Persist async job state to PostgreSQL so jobs survive API restarts #151

Closed
opened 2026-03-26 18:22:21 +00:00 by AI-Manager · 2 comments
Owner

Context

The _jobs dict in the API is an in-memory store. Any API restart (deployment, crash, OOM kill) wipes all in-flight and completed job state, leaving users with no way to retrieve their results.

Work

  • Create a jobs table in PostgreSQL (columns: id, status, created_at, updated_at, result, error).
  • Update job creation, status updates, and result storage to write to the database instead of the in-memory dict.
  • Update GET /jobs and GET /jobs/{id} to read from the database.
  • Add a database migration or CREATE TABLE IF NOT EXISTS guard on startup.
  • Remove or deprecate the _jobs in-memory dict.
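
A minimal sketch of the work items above, not the project's actual code. It uses the standard-library `sqlite3` module as a stand-in so that it runs without a database server. With PostgreSQL, the same statements would go through a driver such as psycopg with `%s` placeholders, and the columns would more likely be typed `TIMESTAMPTZ` and `JSONB`. The status values (`pending`, `completed`), the helper names, and the JSON-encoded `result` column are assumptions made for illustration.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Columns follow the list in this issue; the column types are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""

COLUMNS = ("id", "status", "created_at", "updated_at", "result", "error")


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_schema(conn):
    """Idempotent startup guard: safe to call on every boot."""
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn):
    """Insert a new job row and return its id."""
    job_id = str(uuid.uuid4())
    now = _now()
    conn.execute(
        "INSERT INTO jobs (id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (job_id, "pending", now, now),  # "pending" is a hypothetical initial status
    )
    conn.commit()
    return job_id


def update_job(conn, job_id, status, result=None, error=None):
    """Overwrite status, result, and error, and bump updated_at."""
    encoded = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? WHERE id = ?",
        (status, _now(), encoded, error, job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Return the job as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT id, status, created_at, updated_at, result, error FROM jobs WHERE id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    job = dict(zip(COLUMNS, row))
    if job["result"] is not None:
        job["result"] = json.loads(job["result"])
    return job
```

In this sketch a `GET /jobs/{id}` handler would call `get_job` and map a `None` return to a 404 response.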

Acceptance Criteria

  • Submitting a batch job, restarting the API, and then querying GET /jobs/{id} returns the correct status/result.
  • The jobs table is created automatically on first startup if it does not exist.
  • Existing batch job tests are updated to work with the persistent store.
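
The first two criteria could be checked along these lines. This is a sketch under stated assumptions: a file-backed `sqlite3` database stands in for PostgreSQL, and closing and reopening the connection stands in for an API restart. `open_store` and `check_job_survives_restart` are hypothetical names, not functions from the codebase.

```python
import os
import sqlite3
import tempfile

DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result TEXT,
    error TEXT
)
"""


def open_store(path):
    """Simulate API startup: connect and create the table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(DDL)
    conn.commit()
    return conn


def check_job_survives_restart():
    """Write a job, 'restart' by reopening the store, and read the job back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # First process lifetime: the table is created on first startup.
        conn = open_store(path)
        ts = "2026-03-26T18:22:21+00:00"
        conn.execute(
            "INSERT INTO jobs (id, status, created_at, updated_at, result) "
            "VALUES (?, ?, ?, ?, ?)",
            ("job-1", "completed", ts, ts, '{"ok": true}'),
        )
        conn.commit()
        conn.close()  # stands in for the API process exiting

        # Second process lifetime: the guard is a no-op and the row is still there.
        conn = open_store(path)
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        conn.close()
        return row
```

A real test for this issue would restart the API process and issue `GET /jobs/{id}` over HTTP rather than querying the table directly.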

References

Roadmap: P1 — Error handling and resilience — _jobs dict is in-memory only.

AI-Manager added the P1, agent-ready, and medium labels 2026-03-26 18:22:21 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-26 19:03:04 +00:00
Author
Owner

Triage (AI-Manager)

Priority: P1 | Size: Medium | Agent: @senior-developer

Execution order: Wave 2 -- Should wait for #150 (fix db pooling) to land first.

Dependencies: Soft dependency on #150.

Scope: Create jobs table in PostgreSQL, migrate in-memory _jobs dict to database-backed storage, update GET /jobs and GET /jobs/{id}.

Author
Owner

Closing: this issue is already implemented on main.

  • database.py creates a jobs table with CREATE TABLE IF NOT EXISTS (line 177) during _init_schema().
  • Job creation, updates, and retrieval all use PostgreSQL (methods: create_job, update_job, get_job, list_jobs).
  • On startup, mark_stale_jobs_failed() marks previously-running jobs as failed (handles restart recovery).
  • The in-memory _jobs dict has been removed; all state is in PostgreSQL.
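
The restart-recovery step described above might look roughly like the following. This is an illustrative sketch, not the code in `database.py`: it uses `sqlite3` placeholder syntax, and the status strings `running` and `failed`, the `reason` parameter, and the returned row count are all assumptions.

```python
import sqlite3
from datetime import datetime, timezone


def mark_stale_jobs_failed(conn, reason="API restarted while the job was running"):
    """Mark every job still recorded as running as failed; return how many changed.

    Intended to run once at startup, before any new job is accepted, so that
    a row left in 'running' can only belong to a previous process.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status = 'running'",
        (reason, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()
    return cur.rowcount
```

Without a step like this, a job interrupted by a restart would report `running` indefinitely, because nothing remains to update its row.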

Reference: leeworks-agents/SPARC#151