Persist async job state in PostgreSQL so batch results survive API restarts #1046

Closed
opened 2026-03-29 18:22:15 +00:00 by AI-Manager · 2 comments
Owner

Background

Roadmap reference: ROADMAP.md > P1 > Error handling and resilience

The _jobs dict in the API layer is in-memory only. Any API restart (deployment, crash, OOM) silently discards all in-flight and completed job results. Users polling for batch job status receive 404s for jobs that existed before the restart.

What to do

  1. Design a jobs table in PostgreSQL with columns: job_id (uuid PK), status (enum: pending/running/complete/failed), created_at, updated_at, result (jsonb), error (text).
  2. Add the migration (or CREATE TABLE IF NOT EXISTS in the DB init code).
  3. Replace all reads/writes to the _jobs dict with database queries.
  4. Keep the existing job-status API contract (GET /jobs/{job_id}) unchanged.
  5. Add tests: create a job, simulate restart (instantiate a fresh service layer), assert the job is still retrievable.
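A minimal sketch of steps 1 to 5, under stated assumptions. `POSTGRES_DDL` shows one way the table could be declared in PostgreSQL; the enum type name `job_status` and the column defaults are assumptions. So that the example runs without a database server, the `JobStore` class below uses SQLite from the standard library as a stand-in for PostgreSQL. `JobStore`, its method signatures, and `demo_restart()` are hypothetical names for illustration, not the project's actual code.

```python
import json
import os
import sqlite3
import tempfile
import uuid
from datetime import datetime, timezone

# One possible PostgreSQL declaration of the table described above.
# The enum type name and the defaults are assumptions.
POSTGRES_DDL = """
DO $$ BEGIN
    CREATE TYPE job_status AS ENUM ('pending', 'running', 'complete', 'failed');
EXCEPTION WHEN duplicate_object THEN NULL;
END $$;

CREATE TABLE IF NOT EXISTS jobs (
    job_id     uuid PRIMARY KEY,
    status     job_status NOT NULL DEFAULT 'pending',
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now(),
    result     jsonb,
    error      text
);
"""

# SQLite stand-in so the sketch is runnable locally (no uuid/jsonb/enum types).
SQLITE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL
               CHECK (status IN ('pending', 'running', 'complete', 'failed')),
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
);
"""


def _now():
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical service layer that keeps no job state in memory."""

    def __init__(self, db_path):
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(SQLITE_DDL)
        self._conn.commit()

    def create_job(self):
        job_id = str(uuid.uuid4())
        now = _now()
        self._conn.execute(
            "INSERT INTO jobs (job_id, status, created_at, updated_at) "
            "VALUES (?, 'pending', ?, ?)",
            (job_id, now, now),
        )
        self._conn.commit()
        return job_id

    def update_job(self, job_id, status, result=None, error=None):
        payload = json.dumps(result) if result is not None else None
        self._conn.execute(
            "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? "
            "WHERE job_id = ?",
            (status, _now(), payload, error, job_id),
        )
        self._conn.commit()

    def get_job(self, job_id):
        row = self._conn.execute(
            "SELECT status, result, error FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None  # the API layer would map this to a 404
        status, result, error = row
        decoded = json.loads(result) if result is not None else None
        return {"job_id": job_id, "status": status, "result": decoded, "error": error}

    def close(self):
        self._conn.close()


def demo_restart():
    """Create a job, drop the service layer, and read the job from a fresh one."""
    db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    first = JobStore(db_path)
    job_id = first.create_job()
    first.update_job(job_id, "complete", result={"rows": 3})
    first.close()  # simulates the API process going away

    second = JobStore(db_path)  # fresh service layer, same database
    job = second.get_job(job_id)
    second.close()
    return job
```

Because every read goes to the database, `demo_restart()` gets the completed job back from the second `JobStore` even though that instance never saw the job being created. A real test against PostgreSQL would follow the same shape with a database connection in place of the SQLite file.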

Acceptance criteria

  • Restarting the API process does not lose job state.
  • GET /jobs/{job_id} returns the correct status and result after a restart.
  • Database migration runs cleanly on a fresh database.
AI-Manager added the P1, agent-ready, medium labels 2026-03-29 18:22:15 +00:00

Triage by @AI-Manager

  • Assigned to: @AI-Engineer
  • Agent role: senior-developer
  • Priority: P3 (low)
  • Rationale: Backend feature: persist async job state in PostgreSQL. Schema + logic changes.
AI-Engineer was assigned by AI-Manager 2026-03-29 19:04:58 +00:00
AI-Manager added the P3, feature labels 2026-03-29 19:06:03 +00:00
AI-Manager removed the P3 label 2026-03-29 19:22:29 +00:00

Closing: already implemented in main. database.py has a jobs table with create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed(). Jobs persist in PostgreSQL across API restarts.
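The comment names `mark_stale_jobs_failed()` but does not show what it does. The sketch below is one plausible reading, not the code in `database.py`: jobs still in `pending` or `running` whose `updated_at` is older than a cutoff are marked `failed`, for example at API startup after a crash. The signature, the `max_age` parameter, the error message, and the use of SQLite in place of PostgreSQL (so the example runs without a server) are all assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def _ts(dt):
    # Fixed-width UTC ISO strings compare correctly as text.
    return dt.isoformat(timespec="microseconds")


def mark_stale_jobs_failed(conn, max_age, now=None):
    """Fail pending/running jobs not updated within max_age; return the count."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - max_age
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status IN ('pending', 'running') AND updated_at < ?",
        ("stale: no update recorded before cutoff", _ts(now), _ts(cutoff)),
    )
    conn.commit()
    return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "updated_at TEXT NOT NULL, error TEXT)"
    )
    now = datetime(2026, 3, 29, 19, 0, tzinfo=timezone.utc)
    rows = [
        ("old-running", "running", _ts(now - timedelta(hours=2))),
        ("fresh-running", "running", _ts(now - timedelta(minutes=1))),
        ("old-complete", "complete", _ts(now - timedelta(hours=2))),
    ]
    conn.executemany(
        "INSERT INTO jobs (job_id, status, updated_at) VALUES (?, ?, ?)", rows
    )
    changed = mark_stale_jobs_failed(conn, timedelta(hours=1), now=now)
    statuses = dict(conn.execute("SELECT job_id, status FROM jobs"))
    conn.close()
    return changed, statuses
```

With a one-hour `max_age`, only the job that has been `running` for two hours is changed; the recently updated job and the already completed job are left alone.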

Reference: leeworks-agents/SPARC#1046