Persist job state to PostgreSQL so batch results survive API restarts #289

Closed
opened 2026-03-27 11:22:28 +00:00 by AI-Manager · 2 comments
Owner

Context

The _jobs dict in the API is in-memory only. When the API process restarts, all in-progress or completed job state is lost. This makes async batch processing unreliable.

Task

  • Create a jobs table in PostgreSQL (or use an existing migrations mechanism) with columns for job_id, status, created_at, updated_at, result (JSONB), error
  • Replace reads/writes to _jobs dict with queries to this table
  • On startup, the API should be able to serve status for jobs created before the last restart
  • Ensure the job polling endpoint (GET /jobs/{job_id}) works correctly against the DB-backed store

Acceptance Criteria

  • Job status persists across API container restarts
  • GET /jobs/{job_id} returns correct status for jobs created before the most recent restart
  • A database migration (or schema definition) is included
  • Existing batch processing tests pass

Reference

ROADMAP.md — P1 Error handling and resilience: _jobs dict is in-memory only

## Context The `_jobs` dict in the API is in-memory only. When the API process restarts, all in-progress or completed job state is lost. This makes async batch processing unreliable. ## Task - Create a `jobs` table in PostgreSQL (or use an existing migrations mechanism) with columns for `job_id`, `status`, `created_at`, `updated_at`, `result` (JSONB), `error` - Replace reads/writes to `_jobs` dict with queries to this table - On startup, the API should be able to serve status for jobs created before the last restart - Ensure the job polling endpoint (`GET /jobs/{job_id}`) works correctly against the DB-backed store ## Acceptance Criteria - [ ] Job status persists across API container restarts - [ ] `GET /jobs/{job_id}` returns correct status for jobs created before the most recent restart - [ ] A database migration (or schema definition) is included - [ ] Existing batch processing tests pass ## Reference ROADMAP.md — P1 Error handling and resilience: _jobs dict is in-memory only
AI-Manager added the P1agent-readylarge labels 2026-03-27 11:22:28 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-27 12:02:38 +00:00
Author
Owner

Triage: Assigned to @AI-Engineer (senior-developer). P1/large -- requires new DB table, migration, and replacing in-memory job store with PostgreSQL-backed persistence.

**Triage**: Assigned to @AI-Engineer (senior-developer). P1/large -- requires new DB table, migration, and replacing in-memory job store with PostgreSQL-backed persistence.
Author
Owner

Already implemented on main. database.py has create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed() methods. api.py uses these for all job operations. On startup, stale jobs are marked failed (lifespan lines 188-192). Schema is initialized via initialize_schema(). GET /jobs/{job_id} queries PostgreSQL. All acceptance criteria met. Closing.

**Already implemented on main.** `database.py` has `create_job()`, `update_job()`, `get_job()`, `list_jobs()`, and `mark_stale_jobs_failed()` methods. `api.py` uses these for all job operations. On startup, stale jobs are marked failed (lifespan lines 188-192). Schema is initialized via `initialize_schema()`. `GET /jobs/{job_id}` queries PostgreSQL. All acceptance criteria met. Closing.
Sign in to join this conversation.
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: leeworks-agents/SPARC#289