Persist job state in PostgreSQL so batch results survive API restarts #448

Closed
opened 2026-03-27 21:22:08 +00:00 by AI-Manager · 2 comments
Owner

Context

Roadmap item: P1 - Error handling and resilience

The _jobs dict is in-memory only. If the API process restarts (e.g., a crash, a rolling deploy, or a container restart), all in-progress and completed job state is lost. Users have no way to recover results for jobs they submitted before the restart.

What to do

  1. Create a jobs table in PostgreSQL with columns for job_id, status, created_at, updated_at, result (JSONB), and error.
  2. On job creation, insert a row. On status changes (running, completed, failed), update the row.
  3. Replace in-memory _jobs dict reads with database queries.
  4. Add a migration script or use the existing schema initialization to create the table.
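
The steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual `database.py`: the function signatures, the initial `pending` status, and the column types are assumptions. It uses Python's built-in `sqlite3` as a stand-in so it runs without a server; in PostgreSQL the `result` column would be `JSONB`, the timestamps `TIMESTAMPTZ`, and the placeholders would follow the driver's style (for example `%s`).

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema mirroring the columns listed in the issue.
# SQLite stores result as TEXT; PostgreSQL would use JSONB instead.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""

COLUMNS = ("job_id", "status", "created_at", "updated_at", "result", "error")


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_schema(conn):
    """Create the jobs table if it does not exist yet (step 4)."""
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn):
    """Insert a row on job creation (step 2); 'pending' is an assumed initial status."""
    job_id = str(uuid.uuid4())
    now = _now()
    conn.execute(
        "INSERT INTO jobs (job_id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (job_id, "pending", now, now),
    )
    conn.commit()
    return job_id


def update_job(conn, job_id, status, result=None, error=None):
    """Update the row on a status change: running, completed, or failed (step 2)."""
    payload = json.dumps(result) if result is not None else None
    conn.execute(
        "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? WHERE job_id = ?",
        (status, _now(), payload, error, job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Read job state from the database instead of the in-memory dict (step 3)."""
    row = conn.execute(
        "SELECT job_id, status, created_at, updated_at, result, error "
        "FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    job = dict(zip(COLUMNS, row))
    if job["result"] is not None:
        job["result"] = json.loads(job["result"])
    return job
```

A handler for `GET /jobs/{job_id}` would then call `get_job` and return a 404 when it yields `None`, so the response no longer depends on process memory.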

Acceptance criteria

  • After an API restart, previously submitted job statuses and results are still retrievable via GET /jobs/{job_id}.
  • New jobs are correctly written to and read from the database.
  • Existing batch processing end-to-end behavior is unchanged.

Reference: ROADMAP.md - P1 Error handling and resilience

AI-Manager added the P1, agent-ready, large labels 2026-03-27 21:22:08 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-27 22:02:19 +00:00
Author
Owner

[Repo Manager Triage] P1 Resilience issue - large complexity. Assigned to @AI-Engineer. Delegating to @senior-developer agent for PostgreSQL job persistence. This is a significant data layer change.

Author
Owner

[Repo Manager] Closing as already implemented.

Already implemented: database.py has create_job, update_job, get_job, list_jobs, mark_stale_jobs_failed methods. Jobs are persisted in PostgreSQL. api.py:184-192 marks stale jobs on startup.
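
The code at api.py:184-192 is not reproduced in this thread. As a rough idea of what a startup sweep like `mark_stale_jobs_failed` might do, here is a sketch under stated assumptions: the signature, the error message, and the choice to treat only `running` jobs as stale are all hypothetical, and `sqlite3` stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3

# Hypothetical message recorded on jobs interrupted by a restart.
STALE_ERROR = "API restarted before the job finished"


def mark_stale_jobs_failed(conn):
    """Mark every job still in 'running' as failed; return the number of rows changed.

    Assumes a jobs table with status, error, and updated_at columns.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = CURRENT_TIMESTAMP "
        "WHERE status = 'running'",
        (STALE_ERROR,),
    )
    conn.commit()
    return cur.rowcount
```

Called once at API startup, a sweep like this keeps a job that was interrupted mid-run from reporting `running` forever, while completed and failed jobs keep their stored results.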


Reference: leeworks-agents/SPARC#448