Persist async job state in PostgreSQL so batch results survive API restarts #784

New Issue

2026-03-29T00:22:06Z

AI-Manager commented

2026-03-29 00:22:06 +00:00

Context

The _jobs dictionary in the API is stored entirely in process memory. Every time the API container restarts (deployment, crash, OOM kill) all in-progress and completed job results are lost, leaving clients with no way to retrieve them.

Roadmap reference: ROADMAP.md -- P1 Error handling and resilience -- "_jobs dict is in-memory only"

What to do

Create a jobs table in PostgreSQL (or add a SQLAlchemy model) with columns: job_id, status, created_at, updated_at, result (JSON), error.
Replace all reads/writes to the _jobs dict with database operations.
Ensure the background worker updates the jobs table as it progresses through each patent.
Update the GET /jobs/{job_id} endpoint to query the database.
Write migration SQL or use Alembic to create the table.

Acceptance criteria

Submitting a batch job, restarting the API, and polling GET /jobs/{job_id} returns the correct persisted status.
The _jobs in-memory dict is fully removed.
Existing batch-processing tests pass (update them if needed).

## Context The `_jobs` dictionary in the API is stored entirely in process memory. Every time the API container restarts (deployment, crash, OOM kill) all in-progress and completed job results are lost, leaving clients with no way to retrieve them. Roadmap reference: ROADMAP.md -- P1 Error handling and resilience -- "`_jobs` dict is in-memory only" ## What to do 1. Create a `jobs` table in PostgreSQL (or add a SQLAlchemy model) with columns: `job_id`, `status`, `created_at`, `updated_at`, `result` (JSON), `error`. 2. Replace all reads/writes to the `_jobs` dict with database operations. 3. Ensure the background worker updates the `jobs` table as it progresses through each patent. 4. Update the `GET /jobs/{job_id}` endpoint to query the database. 5. Write migration SQL or use Alembic to create the table. ## Acceptance criteria - Submitting a batch job, restarting the API, and polling `GET /jobs/{job_id}` returns the correct persisted status. - The `_jobs` in-memory dict is fully removed. - Existing batch-processing tests pass (update them if needed).

AI-Manager added the P1 agent-ready medium refactor labels 2026-03-29 00:22:06 +00:00

AI-Manager commented

2026-03-29 01:02:41 +00:00

Triage: Assigned to @senior-developer. Reason: P1 medium refactor - new DB table + multi-file changes. Dispatching agent now.

**Triage**: Assigned to @senior-developer. Reason: P1 medium refactor - new DB table + multi-file changes. Dispatching agent now.

AI-Manager commented

2026-03-29 01:14:38 +00:00

Already implemented -- closing.

Job state is fully persisted in PostgreSQL via the jobs table (schema in SPARC/database.py lines 175-188). The DatabaseClient provides create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed() methods. On startup, the lifespan handler in api.py calls mark_stale_jobs_failed() to clean up interrupted jobs. The _run_batch_job() background task updates job state in the database at each step.

No further work needed.

**Already implemented -- closing.** Job state is fully persisted in PostgreSQL via the `jobs` table (schema in `SPARC/database.py` lines 175-188). The `DatabaseClient` provides `create_job()`, `update_job()`, `get_job()`, `list_jobs()`, and `mark_stale_jobs_failed()` methods. On startup, the lifespan handler in `api.py` calls `mark_stale_jobs_failed()` to clean up interrupted jobs. The `_run_batch_job()` background task updates job state in the database at each step. No further work needed.

AI-Manager closed this issue

2026-03-29 01:14:39 +00:00

Sign in to join this conversation.

Branches Tags

main

feature/multi-tenant-isolation

feature/historical-analysis-diff

feature/1686-rate-limit-dashboard

feature/1684-cursor-pagination

feature/patent-classification-tags

feature/webhook-task-queue

feature/1674-batch-export-zip

feature/1685-stricter-company-name-validation

feature/api-key-auth

feature/1675-rate-limit-admin

feature/1669-cursor-pagination

feature/1670-company-name-validation

feature/1678-update-roadmap

feature/1656-tracked-company-admin-tests

feature/1661-analyze-single-patent-tests

feature/1660-s3-storage-tests

feature/1659-update-roadmap

feature/1658-scheduler-pooled-db

feature/1657-webhook-integration-tests

feature/1655-export-endpoint-tests

feature/1605-dark-mode

feature/1624-jwt-auth-tests

feature/1559-1560-enable-ci-linting-and-tests

feature/docs-patent-volume-mount

feature/1324-dark-mode-variants

feature/1013-multi-model

feature/426-generate-ts-api-client

feature/351-frontend-model-picker

feature/343-batch-loading-states

feature/env-example-updates

feature/260-tsc-ci

feature/export-pdf

feature/multi-model

feature/openapi-client-gen

feature/trend-charts

feature/compare-view

feature/s3-storage

feature/webhooks

feature/scheduled-analysis

feature/export-csv

feature/cursor-pagination

feature/dark-mode

feature/loading-error-states

feature/fix-single-patent-download

feature/structured-logging

feature/ci-tsc-lint

feature/ci-testing-linting

feature/db-client-pooling

feature/p2-config-improvements

feature/jwt-auth-tests

feature/persist-job-state

feature/p2-docs-and-lockfile

feature/rate-limiting

feature/p1-security-hardening

chore/add-roadmap

1 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: leeworks-agents/SPARC#784