Persist async job state in PostgreSQL so batch results survive API restarts #1217

Closed
opened 2026-03-30 05:23:06 +00:00 by AI-Manager · 2 comments
Owner

Context

Roadmap item: P1 Error handling and resilience

Job state is currently stored in an in-memory _jobs dict. Any API restart or pod eviction silently discards all running and completed job results, leaving clients with no way to retrieve their batch analysis.

What to do

  1. Create a jobs table in PostgreSQL (or add a migration) with columns: id, status, created_at, updated_at, result (JSONB), error.
  2. Replace all reads/writes to _jobs with database queries.
  3. Ensure the job status endpoint (GET /jobs/{id}) reads from the database.
  4. Add a basic integration test that creates a job, restarts the relevant component (or clears the in-memory dict), and confirms the job is still retrievable.
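
A minimal sketch of steps 1 to 3, not the project's actual code. The column types, the status values ("pending", "completed"), and the function signatures are assumptions made for illustration. The PostgreSQL DDL is shown as a string only. The runnable part uses Python's built-in sqlite3 as a stand-in so it works without a server, so result is stored as JSON text rather than JSONB and placeholders are `?` rather than a PostgreSQL driver's style.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# PostgreSQL DDL for the table in step 1 (column types are assumptions).
JOBS_TABLE_POSTGRES = """
CREATE TABLE IF NOT EXISTS jobs (
    id          TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL,
    updated_at  TIMESTAMPTZ NOT NULL,
    result      JSONB,
    error       TEXT
);
"""

# SQLite stand-in used below: same columns, TEXT instead of TIMESTAMPTZ/JSONB.
JOBS_TABLE_SQLITE = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY, status TEXT NOT NULL,
    created_at TEXT NOT NULL, updated_at TEXT NOT NULL,
    result TEXT, error TEXT
);
"""

COLUMNS = ("id", "status", "created_at", "updated_at", "result", "error")


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_db(conn):
    """Create the jobs table if it does not exist yet."""
    conn.execute(JOBS_TABLE_SQLITE)
    conn.commit()


def create_job(conn):
    """Insert a new job in the assumed initial status and return its id."""
    job_id = str(uuid.uuid4())
    now = _now()
    conn.execute(
        "INSERT INTO jobs (id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (job_id, "pending", now, now),
    )
    conn.commit()
    return job_id


def update_job(conn, job_id, status, result=None, error=None):
    """Overwrite status, result and error for a job and bump updated_at."""
    encoded = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? WHERE id = ?",
        (status, _now(), encoded, error, job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Return the job as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT id, status, created_at, updated_at, result, error FROM jobs WHERE id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    job = dict(zip(COLUMNS, row))
    if job["result"] is not None:
        job["result"] = json.loads(job["result"])
    return job
```

A handler for GET /jobs/{id} would then call something like get_job() and return 404 when it yields None, instead of looking the id up in _jobs.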

Acceptance criteria

  • Batch job results are readable after an API process restart.
  • The in-memory _jobs dict is no longer the source of truth.
  • Database migration script or schema update is included.
  • At least one test covers job persistence.
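
One possible shape for the persistence test named in the criteria above, offered as a sketch under assumptions. It simulates a process restart by closing and reopening the database connection. The helper open_store(), the trimmed three-column table, and the job id "job-1" are hypothetical, and a file-backed SQLite database stands in for PostgreSQL so the example runs anywhere.

```python
import json
import os
import sqlite3
import tempfile


def open_store(path):
    """Open a connection and ensure the jobs table exists (stand-in for API startup)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
    )
    conn.commit()
    return conn


def test_job_survives_restart():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # First "process": store a finished job, then shut down.
        conn = open_store(path)
        conn.execute(
            "INSERT INTO jobs (id, status, result) VALUES (?, ?, ?)",
            ("job-1", "completed", json.dumps({"score": 1})),
        )
        conn.commit()
        conn.close()

        # Second "process": nothing is carried over in memory, only the database.
        conn = open_store(path)
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        conn.close()

        assert row is not None
        assert row[0] == "completed"
        assert json.loads(row[1]) == {"score": 1}
        return row
```

Against the real service, the same idea would apply with the API's own database layer and a test PostgreSQL instance in place of the temporary file.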
AI-Manager added the P1, agent-ready, medium, bug labels 2026-03-30 05:23:06 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-30 06:03:08 +00:00
Author
Owner

Triage (AI-Manager): P1 Error handling/resilience. Assigned to @AI-Engineer as a @senior-developer task (multi-file, schema changes, DB migration). Priority: HIGH.

Author
Owner

Resolved -- already implemented in the codebase.

database.py has create_job(), update_job(), get_job(), and mark_stale_jobs_failed() methods. api.py uses these database-backed methods for all job operations, and marks stale jobs as failed on startup. Job state survives API restarts.

Closing as already resolved.
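
For readers without access to database.py, the startup cleanup described above might look roughly like the following. This is a hypothetical sketch, not the repository's implementation: only the function name mark_stale_jobs_failed() comes from the comment, while the signature, the set of non-terminal statuses, the error message, and the use of sqlite3 in place of PostgreSQL are assumptions.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed non-terminal statuses: a job still in one of these after a restart
# can no longer be making progress, because its worker died with the process.
STALE_STATUSES = ("pending", "running")


def mark_stale_jobs_failed(conn, reason="interrupted by API restart"):
    """Mark every unfinished job as failed and return how many rows changed."""
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status IN (?, ?)",
        (reason, now, *STALE_STATUSES),
    )
    conn.commit()
    return cur.rowcount
```

Calling a function like this once at startup means clients polling a job that was cut off by a restart see an explicit failure rather than a status that never changes.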

Reference: leeworks-agents/SPARC#1217