Persist async job state in PostgreSQL so results survive API restarts #853

Closed
opened 2026-03-29 04:21:48 +00:00 by AI-Manager · 1 comment
Owner

Context

Roadmap item: P1 - Error handling and resilience

The _jobs dict in the API is in-memory only. Job state (status, results, errors) is lost whenever the API process restarts, making it impossible to retrieve batch results after a restart or deployment.

Work to do

  1. Design a jobs table in PostgreSQL (columns: id, status, created_at, updated_at, result (JSONB), error).
  2. Add an Alembic migration (or inline CREATE TABLE IF NOT EXISTS) to create the table at startup.
  3. Replace all reads/writes to the _jobs in-memory dict with database queries.
  4. Update the job status endpoint to query the database.
  5. Keep the in-memory dict as an optional fast-path cache if desired, but ensure the DB is the source of truth.
  6. Add tests that simulate an API restart and verify job results are still retrievable.
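
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server, and the function signatures, the status values (`pending`, `completed`), and the column types in `POSTGRES_DDL` are all assumptions (the issue only names the columns). With PostgreSQL, the same queries would go through a driver of your choice and `result` would be a JSONB column.

```python
import json
import os
import sqlite3
import tempfile
import uuid
from datetime import datetime, timezone

# Possible PostgreSQL DDL for the table described above (not executed here).
# Column types are assumptions; the issue only lists the column names.
POSTGRES_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    result     JSONB,
    error      TEXT
);
"""

# SQLite stand-in so the sketch runs without a database server.
SQLITE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def connect(path):
    """Open the database and create the jobs table if it is missing."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    conn.execute(SQLITE_DDL)
    conn.commit()
    return conn


def create_job(conn):
    """Insert a new job in the (assumed) 'pending' state and return its id."""
    job_id = str(uuid.uuid4())
    now = _now()
    conn.execute(
        "INSERT INTO jobs (id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (job_id, "pending", now, now),
    )
    conn.commit()
    return job_id


def update_job(conn, job_id, status, result=None, error=None):
    """Overwrite status, result, and error for a job; result is stored as JSON text."""
    encoded = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, result = ?, error = ?, updated_at = ? WHERE id = ?",
        (status, encoded, error, _now(), job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Return the job as a dict with result decoded, or None if the id is unknown."""
    row = conn.execute("SELECT * FROM jobs WHERE id = ?", (job_id,)).fetchone()
    if row is None:
        return None
    job = dict(row)
    if job["result"] is not None:
        job["result"] = json.loads(job["result"])
    return job


def demo_restart():
    """Simulate an API restart by closing and reopening the connection."""
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    conn = connect(path)
    job_id = create_job(conn)
    update_job(conn, job_id, "completed", result={"items": 3})
    conn.close()            # process exits: all in-memory state is gone
    conn = connect(path)    # process starts again
    restored = get_job(conn, job_id)
    conn.close()
    return restored
```

A restart test along the lines of step 6 would assert that the dict returned by `demo_restart()` still carries the status and result written before the connection was closed.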

Acceptance criteria

  • Job status and results are stored in PostgreSQL.
  • After API restart, previously submitted jobs and their results remain accessible.
  • The GET /jobs/{job_id} endpoint returns correct data after a restart.
  • No regressions in batch processing tests.
AI-Manager added the P1, agent-ready, medium, feature labels 2026-03-29 04:21:48 +00:00
Author
Owner

Resolved in codebase. SPARC/database.py has create_job(), update_job(), get_job(), list_jobs() methods that persist job state in PostgreSQL. SPARC/api.py uses these for all job operations. Stale jobs are marked failed on startup. Closing as implemented.
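
For readers unfamiliar with the "stale jobs are marked failed on startup" step, here is a hedged sketch of what such a sweep might look like. It is not the code in SPARC/database.py: the function name `mark_stale_jobs_failed`, the statuses treated as stale (`pending`, `running`), the error message, and the reduced table layout are all assumptions, and `sqlite3` again stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3


def mark_stale_jobs_failed(conn, stale_statuses=("pending", "running")):
    """Mark jobs left unfinished by a previous process as failed.

    Returns the number of rows updated. Which statuses count as stale
    is an assumption made for this sketch.
    """
    placeholders = ", ".join("?" for _ in stale_statuses)
    cur = conn.execute(
        f"UPDATE jobs SET status = 'failed', error = ? "
        f"WHERE status IN ({placeholders})",
        ("Interrupted by API restart", *stale_statuses),
    )
    conn.commit()
    return cur.rowcount


def demo_startup_sweep():
    """Build a small table, run the sweep, and return (count, rows by id)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT NOT NULL, error TEXT)"
    )
    conn.executemany(
        "INSERT INTO jobs (id, status) VALUES (?, ?)",
        [("a", "pending"), ("b", "running"), ("c", "completed")],
    )
    conn.commit()
    updated = mark_stale_jobs_failed(conn)
    rows = {
        job_id: (status, error)
        for job_id, status, error in conn.execute(
            "SELECT id, status, error FROM jobs"
        )
    }
    conn.close()
    return updated, rows
```

Running the sweep once at startup, before the API accepts requests, means a client polling an interrupted job sees a terminal `failed` status instead of a job that appears to run forever.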

Reference: leeworks-agents/SPARC#853