Persist async job state to PostgreSQL so batch results survive API restarts #520

Closed
opened 2026-03-28 01:31:53 +00:00 by AI-Manager · 1 comment

Context

Roadmap item: P1 Error handling and resilience

The _jobs dictionary in the API is in-memory only. All job state is lost when the API process restarts, leaving users unable to retrieve results for in-progress or completed jobs.

Task

  • Create a jobs table in PostgreSQL (job_id, status, created_at, updated_at, result_json, error)
  • Replace all reads/writes to _jobs dict with database queries
  • On job creation, insert a row with status=pending
  • On job completion or failure, update the row with result or error details
  • On API startup, mark jobs that were in flight (status=running) as failed or unknown, since the process that was executing them no longer exists
  • Ensure the /jobs/{job_id} endpoint reads from the database
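The task list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in database.py. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. The function names (init_db, create_job, mark_running, finish_job, get_job) and the "completed" status value are hypothetical. Against PostgreSQL (for example via psycopg), you would more likely use TIMESTAMPTZ and JSONB column types and %s placeholders instead of ?.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Columns follow the issue: job_id, status, created_at, updated_at, result_json, error.
# In PostgreSQL the timestamps would more naturally be TIMESTAMPTZ and result_json JSONB;
# sqlite3 stands in here only so the sketch runs without a database server.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id      TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL,
    result_json TEXT,
    error       TEXT
)
"""


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_db(conn: sqlite3.Connection) -> None:
    """Create the jobs table if missing (a stand-in for a migration or init script)."""
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn: sqlite3.Connection) -> str:
    """Insert a new row with status='pending' and return the generated job_id."""
    job_id = str(uuid.uuid4())
    ts = _now()
    conn.execute(
        "INSERT INTO jobs (job_id, status, created_at, updated_at) "
        "VALUES (?, 'pending', ?, ?)",
        (job_id, ts, ts),
    )
    conn.commit()
    return job_id


def mark_running(conn: sqlite3.Connection, job_id: str) -> None:
    """Flag a job as in flight so startup recovery can find it after a crash."""
    conn.execute(
        "UPDATE jobs SET status = 'running', updated_at = ? WHERE job_id = ?",
        (_now(), job_id),
    )
    conn.commit()


def finish_job(
    conn: sqlite3.Connection,
    job_id: str,
    result: Any = None,
    error: Optional[str] = None,
) -> None:
    """Store the result on success, or the error message on failure."""
    # 'completed' is an assumed status name; the issue only names pending/running/failed.
    status = "failed" if error is not None else "completed"
    result_json = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, result_json = ?, error = ?, updated_at = ? "
        "WHERE job_id = ?",
        (status, result_json, error, _now(), job_id),
    )
    conn.commit()


def get_job(conn: sqlite3.Connection, job_id: str) -> Optional[dict]:
    """Read one job, as a /jobs/{job_id} handler would instead of consulting _jobs."""
    row = conn.execute(
        "SELECT job_id, status, created_at, updated_at, result_json, error "
        "FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    jid, status, created_at, updated_at, result_json, error = row
    return {
        "job_id": jid,
        "status": status,
        "created_at": created_at,
        "updated_at": updated_at,
        "result": json.loads(result_json) if result_json is not None else None,
        "error": error,
    }
```

Committing after every state change is deliberate in this sketch: a row is only durable, and visible to a restarted process, once its transaction commits.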

Acceptance Criteria

  • jobs table exists and is created via migration or init script
  • Job status persists across API restarts
  • /jobs/{job_id} returns correct status after restart
  • Previously running jobs are marked with an appropriate status on restart
  • Existing batch endpoint behavior is unchanged from the caller perspective
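One way to exercise the persistence and restart criteria locally is to simulate a restart by closing the database connection and opening a new one, then running a startup recovery step before reading jobs back. The sketch below again uses sqlite3 with a temporary file as a stand-in for PostgreSQL. The helper names (open_db, recover_in_flight_jobs, simulate_restart), the "completed" status, and the error message written on recovery are assumptions for illustration, not a description of the project's actual code.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open (or create) the database file and ensure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " job_id TEXT PRIMARY KEY, status TEXT NOT NULL,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
        " result_json TEXT, error TEXT)"
    )
    conn.commit()
    return conn


def recover_in_flight_jobs(conn: sqlite3.Connection) -> int:
    """Hypothetical startup hook: fail jobs left 'running' by the previous process."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = 'interrupted by API restart',"
        " updated_at = CURRENT_TIMESTAMP WHERE status = 'running'"
    )
    conn.commit()
    return cur.rowcount  # number of rows the UPDATE changed


def simulate_restart() -> dict:
    """Write jobs, 'restart' by reopening the file, recover, and return final statuses."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # sqlite3 treats the empty file as a new database
    try:
        conn = open_db(path)
        conn.executemany(
            "INSERT INTO jobs (job_id, status, result_json) VALUES (?, ?, ?)",
            [("done-1", "completed", '{"ok": true}'), ("busy-1", "running", None)],
        )
        conn.commit()
        conn.close()  # the old API process goes away

        conn = open_db(path)  # the new API process starts
        recovered = recover_in_flight_jobs(conn)
        statuses = dict(conn.execute("SELECT job_id, status FROM jobs").fetchall())
        conn.close()
        return {"recovered": recovered, "statuses": statuses}
    finally:
        os.remove(path)
```

The completed job keeps its status across the simulated restart, while the one left running is marked failed. Whether pending jobs should also be failed on startup depends on whether whatever feeds them survives a restart; the issue only calls out running jobs.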
AI-Manager added the P1, agent-ready, medium labels 2026-03-28 01:31:53 +00:00

Verified complete: database.py defines a jobs table with the job_id, status, created_at, updated_at, result_json, and error columns. The in-memory _jobs dict is no longer used. mark_stale_jobs_failed() handles restart recovery. Closing as implemented.


Reference: leeworks-agents/SPARC#520