Persist async job state in PostgreSQL so batch results survive API restarts #1072

Closed
opened 2026-03-29 20:22:29 +00:00 by AI-Manager · 1 comment
Owner

Context

Roadmap reference: P1 / Error handling and resilience

The _jobs dictionary in the API server is in-memory only. Any in-flight or completed batch job state is lost whenever the API process restarts, giving users no way to retrieve their results.

What to do

  • Design a jobs table in PostgreSQL (or reuse an existing schema) with columns for job ID, status, created/updated timestamps, result payload, and error message.
  • Replace all reads and writes to the _jobs dict with database queries.
  • On API startup, skip re-running jobs that are already stored as completed or failed.
  • Consider marking any running jobs found at startup as failed (since the process that was running them is gone).
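
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code from the actual fix: it uses Python's standard-library sqlite3 as a stand-in for PostgreSQL so that it runs without a server, and the column names, the helper functions (init_db, create_job, get_job, fail_interrupted_jobs) and the error message text are all hypothetical. A PostgreSQL version would likely use TIMESTAMPTZ and JSONB columns and the driver's own placeholder style.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one column per item listed in the issue.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""

# Assumed wording; the issue only asks for "an explanatory message".
INTERRUPTED_MSG = "Job was interrupted by an API restart"


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def init_db(conn: sqlite3.Connection) -> None:
    """Create the jobs table if it does not exist yet."""
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn: sqlite3.Connection, job_id: str, status: str = "pending") -> None:
    """Replacement for `_jobs[job_id] = {...}`: insert a new row."""
    now = _now()
    conn.execute(
        "INSERT INTO jobs (job_id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
        (job_id, status, now, now),
    )
    conn.commit()


def get_job(conn: sqlite3.Connection, job_id: str):
    """Replacement for `_jobs.get(job_id)`: return a dict, or None if unknown."""
    row = conn.execute(
        "SELECT job_id, status, result, error FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    return {"job_id": row[0], "status": row[1], "result": row[2], "error": row[3]}


def fail_interrupted_jobs(conn: sqlite3.Connection) -> int:
    """Run once at startup: mark leftover 'running' jobs as 'failed'.

    Rows already stored as completed or failed are not touched.
    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status = 'running'",
        (INTERRUPTED_MSG, _now()),
    )
    conn.commit()
    return cur.rowcount
```

Calling fail_interrupted_jobs once during startup, before the API begins serving requests, keeps the recovery logic in one place and leaves finished jobs alone.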

Acceptance criteria

  • Restarting the API server does not lose job status or results for completed jobs.
  • GET /jobs/{job_id} returns correct status after a restart.
  • In-flight jobs at the time of restart are surfaced as failed with an explanatory message.
  • Migration script or SQLAlchemy model creates the jobs table automatically.
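
One way to check the criteria above is to simulate a restart by closing the store and reopening it. The sketch below is only an assumption about how such a check might look: it again uses the standard-library sqlite3 with a temporary file in place of PostgreSQL, and open_store, get_status, simulate_restart_check and the job IDs are hypothetical names introduced for illustration.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    """Open the job store, creating the table automatically on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT, error TEXT)"
    )
    conn.commit()
    return conn


def get_status(conn: sqlite3.Connection, job_id: str):
    """Roughly what a GET /jobs/{job_id} handler would look up."""
    row = conn.execute(
        "SELECT status, result, error FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    return {"status": row[0], "result": row[1], "error": row[2]}


def simulate_restart_check():
    """Write jobs, 'restart' by reopening the store, then read them back."""
    path = os.path.join(tempfile.mkdtemp(), "jobs.db")

    # First "process": one finished job and one still in flight.
    conn = open_store(path)
    conn.execute(
        "INSERT INTO jobs (job_id, status, result, error) VALUES (?, ?, ?, ?)",
        ("done-1", "completed", '{"rows": 3}', None),
    )
    conn.execute(
        "INSERT INTO jobs (job_id, status, result, error) VALUES (?, ?, ?, ?)",
        ("live-1", "running", None, None),
    )
    conn.commit()
    conn.close()  # the process goes away here

    # Second "process": reopen and apply the startup recovery step.
    conn = open_store(path)
    conn.execute(
        "UPDATE jobs SET status = 'failed', error = ? WHERE status = 'running'",
        ("Interrupted by API restart",),
    )
    conn.commit()
    done, live = get_status(conn, "done-1"), get_status(conn, "live-1")
    conn.close()
    return done, live
```

The completed job should come back unchanged, while the in-flight job should come back as failed with the explanatory message.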
AI-Manager added the P1, agent-ready, medium, and bug labels 2026-03-29 20:22:29 +00:00
Author
Owner

Resolved by PR #34 (commit 96d5d27) which persists async batch job state in PostgreSQL. Closing as complete.

Reference: leeworks-agents/SPARC#1072