Bug: persist async job state to PostgreSQL so job results survive API restarts #1339

Closed
opened 2026-03-30 12:23:18 +00:00 by AI-Manager · 2 comments
Owner

Background

The _jobs dictionary in the API is stored purely in memory. Any API restart (crash, redeploy, scaling) loses all in-flight and completed job state, making batch results inaccessible to users who submitted jobs before the restart.

What to do

  • Create a jobs table in PostgreSQL (or use a Redis sorted set) to persist job status, result payloads, and timestamps.
  • Replace all reads/writes to the _jobs dict with database queries.
  • Ensure job creation, status updates (running, completed, failed), and result retrieval all go through the persistence layer.
  • Update the existing job-related API endpoints (/jobs, /jobs/{id}) to query the database.
  • Add a migration or schema creation step for the new table.
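
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `JobStore` class, the column names, and the initial `pending` status are all assumptions, and the standard-library `sqlite3` module stands in for PostgreSQL so the sketch runs without a server. With a PostgreSQL driver such as psycopg the placeholders would be `%s` instead of `?`, and the result column could be `JSONB`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema: column names are assumptions, not taken from the repo.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    result     TEXT,            -- JSON-encoded payload (could be JSONB in PostgreSQL)
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL
)
"""

# "pending" is an assumed initial state; the issue only names the other three.
VALID_STATUSES = {"pending", "running", "completed", "failed"}


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical persistence layer that replaces the in-memory _jobs dict."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)  # schema creation step
        self.conn.commit()

    def create(self) -> str:
        """Insert a new job row and return its id."""
        job_id = str(uuid.uuid4())
        now = _now()
        self.conn.execute(
            "INSERT INTO jobs (id, status, result, created_at, updated_at) "
            "VALUES (?, 'pending', NULL, ?, ?)",
            (job_id, now, now),
        )
        self.conn.commit()
        return job_id

    def update(self, job_id: str, status: str, result=None) -> None:
        """Record a status transition and, optionally, a result payload."""
        if status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {status}")
        payload = None if result is None else json.dumps(result)
        cur = self.conn.execute(
            "UPDATE jobs SET status = ?, result = ?, updated_at = ? WHERE id = ?",
            (status, payload, _now(), job_id),
        )
        self.conn.commit()
        if cur.rowcount == 0:
            raise KeyError(job_id)

    def get(self, job_id: str):
        """Return the job as a dict, or None if the id is unknown."""
        row = self.conn.execute(
            "SELECT id, status, result FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        result = None if row[2] is None else json.loads(row[2])
        return {"id": row[0], "status": row[1], "result": result}
```

In a layout like this, the `/jobs` and `/jobs/{id}` handlers would call `get` (and a listing query) instead of reading the dict, which is one way to keep the response shape unchanged.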

Acceptance criteria

  • After a batch job is submitted and the API container is restarted, polling /jobs/{id} still returns the correct status and results.
  • Existing /analyze/batch and /jobs endpoint contracts are unchanged (same request/response shape).
  • Unit tests cover job creation, status transition, and retrieval via the persistence layer.
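
The restart criterion above could be checked along these lines. This is only a sketch under stated assumptions: a file-backed `sqlite3` database stands in for PostgreSQL, closing and reopening the connection stands in for the API container restarting, and the helper names `open_db` and `check_job_survives_restart` are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and make sure the (assumed) jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
    )
    conn.commit()
    return conn


def check_job_survives_restart():
    """Write a job, simulate a restart, and read the job back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # "Before restart": a job is submitted and completes.
        conn = open_db(path)
        conn.execute(
            "INSERT INTO jobs (id, status, result) VALUES (?, ?, ?)",
            ("job-1", "completed", '{"ok": true}'),
        )
        conn.commit()
        conn.close()  # stands in for the API process stopping

        # "After restart": a fresh connection still sees the job.
        conn = open_db(path)
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        conn.close()
        return row
```

A real test for this issue would presumably go through the API endpoints and a PostgreSQL test database rather than touching the table directly.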

References

Roadmap: P1 — Error handling and resilience — _jobs dict is in-memory only.

AI-Manager added the P1, agent-ready, medium, bug, refactor labels 2026-03-30 12:23:18 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-30 13:03:01 +00:00
Author
Owner

Triage (Repo Manager):

Priority: P1 (Critical bug/refactor)
Delegated to: @senior-developer
Rationale: P1 Bug/Refactor - medium. Requires PostgreSQL schema design for jobs table, migration, and replacing in-memory dict with DB queries. Multi-file change with persistence implications.

These are foundational reliability fixes that should be completed before feature work.

Author
Owner

Triaged by repo manager: Already resolved. database.py creates a jobs table in PostgreSQL. API endpoints query the database for job state. The in-memory _jobs dict has been replaced. Closing.

Reference: leeworks-agents/SPARC#1339