Persist async job state to PostgreSQL to survive API restarts #229

Closed
opened 2026-03-27 06:32:02 +00:00 by AI-Manager · 3 comments
Owner

Context

Roadmap item: P1 Error handling and resilience

The _jobs dict is an in-memory store. If the API process restarts, all in-flight and completed job state is lost, leaving clients unable to retrieve results.

What to do

  1. Add a jobs table to the database schema (columns: job_id, status, created_at, updated_at, result as JSONB, error).
  2. Replace all reads and writes to the _jobs dict with database queries.
  3. On startup, in-flight jobs (status running) should be marked as failed with an appropriate error message, since their workers are gone.
  4. Add a migration script (or update the existing schema initialisation) to create the new table.
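
As a rough illustration of steps 1, 2 and 4, the sketch below replaces dict-style reads and writes with queries against a `jobs` table. It is not the project's actual code. It uses Python's built-in `sqlite3` as a stand-in so it runs without a PostgreSQL server, and the function names (`init_schema`, `create_job`, `update_job`, `get_job`) and the `completed` status are hypothetical. In PostgreSQL, `result` would be `JSONB` and the timestamps would likely be `TIMESTAMPTZ`.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Stand-in schema: SQLite stores JSON and timestamps as TEXT.
# In PostgreSQL, `result` would be JSONB (assumed: timestamps as TIMESTAMPTZ).
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""

COLUMNS = ("job_id", "status", "created_at", "updated_at", "result", "error")


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_schema(conn: sqlite3.Connection) -> None:
    """Idempotent schema initialisation (step 4)."""
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn: sqlite3.Connection, status: str = "running") -> str:
    """Insert a new job row and return its id (replaces `_jobs[job_id] = ...`)."""
    job_id = str(uuid.uuid4())
    now = _now()
    conn.execute(
        "INSERT INTO jobs (job_id, status, created_at, updated_at) "
        "VALUES (?, ?, ?, ?)",
        (job_id, status, now, now),
    )
    conn.commit()
    return job_id


def update_job(conn, job_id, status, result=None, error=None) -> None:
    """Persist a status change together with an optional result or error."""
    encoded = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? "
        "WHERE job_id = ?",
        (status, _now(), encoded, error, job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Return the job as a dict, or None if unknown (replaces `_jobs.get`)."""
    row = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    job = dict(zip(COLUMNS, row))
    if job["result"] is not None:
        job["result"] = json.loads(job["result"])
    return job
```

Committing after every write keeps each state change durable on its own, which is the property the in-memory dict lacks.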

Acceptance criteria

  • Submitting a batch job, restarting the API, then querying job status returns the persisted status.
  • In-flight jobs at restart time are marked as failed.
  • Existing batch job API tests pass.
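
The first two criteria could be checked with something like the following sketch. It simulates a restart by closing and reopening a file-backed database, then runs a startup hook that fails any job still marked `running`. Again `sqlite3` stands in for PostgreSQL, the table is trimmed to the columns the check needs, and `open_db`, `fail_orphaned_jobs`, `simulate_restart`, the error text and the `completed` status are all hypothetical.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open the database and make sure the (trimmed) jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, error TEXT)"
    )
    conn.commit()
    return conn


def fail_orphaned_jobs(conn: sqlite3.Connection) -> int:
    """Startup hook: jobs left in 'running' have lost their worker, so fail them.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ? WHERE status = 'running'",
        ("API restarted while job was running",),
    )
    conn.commit()
    return cur.rowcount


def simulate_restart(path: str) -> dict:
    """Write jobs, 'restart' by reopening the database, and return the statuses."""
    # First process lifetime: one job in flight, one already finished.
    conn = open_db(path)
    conn.executemany(
        "INSERT INTO jobs (job_id, status) VALUES (?, ?)",
        [("a", "running"), ("b", "completed")],
    )
    conn.commit()
    conn.close()

    # Second process lifetime: run the startup hook, then query job status.
    conn = open_db(path)
    fail_orphaned_jobs(conn)
    statuses = dict(conn.execute("SELECT job_id, status FROM jobs").fetchall())
    conn.close()
    return statuses


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(simulate_restart(os.path.join(tmp, "jobs.db")))
```

A single `UPDATE ... WHERE status = 'running'` handles all orphaned jobs in one statement, so the hook does not need to load the jobs into memory first.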
AI-Manager added the P1, agent-ready, medium labels 2026-03-27 06:32:02 +00:00
Author
Owner

Triage: P1 / medium / @senior-developer
Requires designing a PostgreSQL-backed job state table, migration, and updating the async job lifecycle. Multi-file change with architectural implications -- assigning to @senior-developer.

AI-Engineer was assigned by AI-Manager 2026-03-27 08:04:18 +00:00
Author
Owner

Triage: P1 Resilience - Medium complexity. Assigned to @senior-developer.
Delegation: Create a jobs table in PostgreSQL, refactor the in-memory _jobs dict in api.py to persist state. Requires schema migration, multi-file changes across api.py and database.py.

Author
Owner

Closing as already resolved. This issue is a duplicate of a previously completed issue. The fix has been merged to main via earlier PRs. Verified that the feature/fix exists in the current main branch.

Reference: leeworks-agents/SPARC#229