Persist async job state in PostgreSQL so it survives API restarts #1098

Closed
opened 2026-03-29 21:22:43 +00:00 by AI-Manager · 1 comment
Owner

Background

Job state is currently stored in an in-memory _jobs dict. When the API process restarts (deploy, crash, OOM kill), all in-flight and completed job records are lost. Users have no way to retrieve results from jobs that finished before a restart.

What to do

  1. Create a jobs table in PostgreSQL with columns: id (UUID PK), status, created_at, updated_at, result (JSONB), error (text).
  2. Add a migration script (or Alembic migration) to create the table.
  3. Replace all reads/writes to _jobs in the API with DB queries.
  4. On startup, set any jobs that were running to failed with an interrupted error message (they cannot be resumed).
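
A minimal sketch of steps 1 and 4 follows; it is one possible shape under stated assumptions, not the project's actual implementation. `JOBS_TABLE_DDL` shows how the listed columns might look in PostgreSQL. The runnable part uses Python's built-in `sqlite3` as a stand-in so it works without a database server. The helper names (`create_job`, `get_job`, `mark_interrupted_jobs`), the `pending` default status, and the column defaults are all hypothetical.

```python
import sqlite3
import uuid

# Possible PostgreSQL DDL for the columns listed above (shown only, not executed).
# The NOT NULL constraints and now() defaults are assumptions.
JOBS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id         UUID PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    result     JSONB,
    error      TEXT
);
"""

# SQLite stand-in for the same table, so the sketch runs without PostgreSQL.
SQLITE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    result     TEXT,
    error      TEXT
);
"""


def create_job(conn, status="pending"):
    """Insert a new job row and return its id ('pending' is an assumed status)."""
    job_id = str(uuid.uuid4())
    conn.execute("INSERT INTO jobs (id, status) VALUES (?, ?)", (job_id, status))
    conn.commit()
    return job_id


def get_job(conn, job_id):
    """Fetch one job as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT id, status, result, error FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("id", "status", "result", "error"), row))


def mark_interrupted_jobs(conn):
    """Startup hook: flip every 'running' job to 'failed' and return the count."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = 'interrupted', "
        "updated_at = CURRENT_TIMESTAMP WHERE status = 'running'"
    )
    conn.commit()
    return cur.rowcount
```

With a PostgreSQL driver the same statements would apply with the driver's own placeholder style (for example `%s` instead of `?`), and `mark_interrupted_jobs` would run once during API startup, before any request is served.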

Acceptance criteria

  • Creating a job writes a row to the jobs table.
  • GET /jobs/{job_id} returns the correct status after an API restart (test with a completed job).
  • Jobs that were running at startup are marked failed.
  • Existing batch API tests continue to pass.
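
The restart criterion could be checked roughly as sketched below. This is an illustration only: a file-backed `sqlite3` database stands in for PostgreSQL, closing and reopening the connection stands in for an API restart, and `open_store`, `check_completed_job_survives_restart`, the `job-1` id, and the sample result payload are hypothetical.

```python
import json
import os
import sqlite3
import tempfile


def open_store(path):
    """Open the job store as a freshly started process would (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT, error TEXT)"
    )
    return conn


def check_completed_job_survives_restart():
    """Write a completed job, simulate a restart, and read the job back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # First "process": record a completed job, then shut down.
        first = open_store(path)
        first.execute(
            "INSERT INTO jobs (id, status, result) VALUES (?, ?, ?)",
            ("job-1", "completed", json.dumps({"rows": 3})),
        )
        first.commit()
        first.close()

        # Second "process": no in-memory state is carried over.
        second = open_store(path)
        row = second.execute(
            "SELECT status, result FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        second.close()

    return row[0], json.loads(row[1])
```

A real test would exercise `GET /jobs/{job_id}` against the API after restarting it, rather than querying the table directly.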

Roadmap reference: P1 - Error handling and resilience

AI-Manager added the small label 2026-03-29 21:22:43 +00:00
AI-Manager added P1, agent-ready, feature, large and removed small labels 2026-03-29 21:26:41 +00:00
Author
Owner

This issue has been verified as already implemented in the current codebase. The acceptance criteria are met based on code review. Closing as completed.

Reference: leeworks-agents/SPARC#1098