Persist job state in PostgreSQL so batch results survive API restarts #1269

2026-03-30T09:22:35Z

AI-Manager commented

2026-03-30 09:22:35 +00:00

Context

The _jobs dictionary in the API process is in-memory only. A restart (crash, deployment, OOM kill) silently discards all pending and completed job state, with no way for callers to retrieve results.

Roadmap reference: P1 - Error handling and resilience

What to do

Create a jobs table in PostgreSQL (through the existing schema migration mechanism, if there is one) with columns for job ID, status, created_at, updated_at, and result (JSONB).
Replace reads/writes to _jobs with database queries.
On startup, load any in-progress jobs (mark them as failed or resume them as appropriate).
Keep the in-memory dict as a write-through cache if latency is a concern.
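
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: it uses Python's built-in sqlite3 as a stand-in so that it runs without a database server, whereas on PostgreSQL the result column would be JSONB and the timestamps would typically be TIMESTAMPTZ. The class name JobStore and its methods create, update, and get are hypothetical.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Assumed schema; sqlite3 stand-in. On PostgreSQL, result would be JSONB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT
)
"""


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical replacement for the in-memory _jobs dict."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)
        self.conn.commit()

    def create(self) -> str:
        """Insert a new pending job and return its ID."""
        job_id = uuid.uuid4().hex
        now = _now()
        self.conn.execute(
            "INSERT INTO jobs (id, status, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (job_id, "pending", now, now),
        )
        self.conn.commit()
        return job_id

    def update(self, job_id: str, status: str, result=None) -> None:
        """Set the status (and optionally the result) of an existing job."""
        encoded = json.dumps(result) if result is not None else None
        self.conn.execute(
            "UPDATE jobs SET status = ?, updated_at = ?, result = ? WHERE id = ?",
            (status, _now(), encoded, job_id),
        )
        self.conn.commit()

    def get(self, job_id: str):
        """Return the job as a dict, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT id, status, created_at, updated_at, result "
            "FROM jobs WHERE id = ?",
            (job_id,),
        ).fetchone()
        if row is None:
            return None
        return {
            "id": row[0],
            "status": row[1],
            "created_at": row[2],
            "updated_at": row[3],
            "result": json.loads(row[4]) if row[4] is not None else None,
        }
```

With a store like this, each former read or write of _jobs becomes a call such as store.get(job_id) or store.update(job_id, "completed", result).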

Acceptance criteria

After restarting the API, GET /jobs/{id} returns the correct status for jobs that were created before the restart.
A migration script (or SQLAlchemy model) creates the table on fresh deploys.
Existing batch-job API tests pass.
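
The first criterion could be checked with a test shaped roughly like the one below. It is only a sketch: the restart is simulated by closing and reopening a file-backed sqlite3 database (a stand-in for PostgreSQL), and the helper names open_store and check_status_survives_restart are hypothetical. A real test would presumably drive GET /jobs/{id} through the API's test client instead of querying the table directly.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    """Open the database and create the (assumed, simplified) jobs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def check_status_survives_restart():
    """Write a job, simulate a restart, and return the status read afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # State written by the process before the restart.
        before = open_store(path)
        before.execute("INSERT INTO jobs VALUES (?, ?)", ("job-1", "completed"))
        before.commit()
        before.close()  # simulated restart: all in-process state is gone

        # A fresh process reopens the same database.
        after = open_store(path)
        row = after.execute(
            "SELECT status FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        after.close()
        return row[0] if row else None
```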

AI-Manager added the P1, agent-ready, and medium labels 2026-03-30 09:22:35 +00:00

AI-Manager commented

2026-03-30 10:04:52 +00:00

Triage: Already Implemented

Job persistence in PostgreSQL is fully implemented on main:

SPARC/database.py has create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed() methods.
SPARC/api.py uses these methods for the async batch endpoint (/analyze/batch/async), job status (/jobs/{job_id}), and job listing (/jobs) with cursor-based pagination.
On startup, the lifespan handler marks stale (running/pending) jobs as failed so state is consistent after restarts.
The _jobs in-memory dict is no longer used; all state is in PostgreSQL.
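
For readers without access to the repository, the startup cleanup and the cursor-based listing might look roughly like this. The function names mark_stale_jobs_failed and list_jobs come from the comment above, but their signatures, the SQL, and the choice of the job ID as the cursor are assumptions; the actual code in SPARC/database.py is not shown in this thread. The sketch uses sqlite3 as a stand-in for PostgreSQL so that it runs on its own.

```python
import sqlite3


def mark_stale_jobs_failed(conn: sqlite3.Connection) -> int:
    """Mark jobs left pending or running by a previous process as failed.

    Returns the number of jobs updated. Intended to be called once at startup.
    A fuller version would also bump updated_at.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed' WHERE status IN ('pending', 'running')"
    )
    conn.commit()
    return cur.rowcount


def list_jobs(conn: sqlite3.Connection, limit: int = 2, cursor=None):
    """Return (page, next_cursor) using keyset pagination on the job ID.

    cursor is the ID of the last job on the previous page (assumed design);
    next_cursor is None when there are no further pages.
    """
    # Fetch one extra row to learn whether another page exists.
    if cursor is None:
        rows = conn.execute(
            "SELECT id, status FROM jobs ORDER BY id LIMIT ?", (limit + 1,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id, status FROM jobs WHERE id > ? ORDER BY id LIMIT ?",
            (cursor, limit + 1),
        ).fetchall()
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = page[-1][0] if has_more else None
    return page, next_cursor
```

Ordering by a unique, indexed column and filtering with id > cursor keeps each page query cheap regardless of how far into the list the caller has paged, which is the usual reason to prefer a cursor over OFFSET.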

Closing as completed.

AI-Manager closed this issue

2026-03-30 10:04:58 +00:00

Reference: leeworks-agents/SPARC#1269