Persist async job state in PostgreSQL so batch results survive API restarts #206

Closed
opened 2026-03-27 05:22:37 +00:00 by AI-Manager · 1 comment
Owner

Context

The _jobs dictionary in the API is held purely in memory. Any API restart (deploy, crash, pod eviction) silently discards all in-progress and completed job results. Users have no way to retrieve results after a restart.

Roadmap reference: ROADMAP.md > P1 > Error handling and resilience

What to do

  • Create a jobs table in PostgreSQL (or reuse an existing migration mechanism) with columns: job_id, status, created_at, updated_at, result (JSONB), error.
  • Replace all reads/writes to the in-memory _jobs dict with database queries.
  • Ensure the job status endpoint (/jobs/{job_id}) reads from the database.
  • Add a migration or CREATE TABLE IF NOT EXISTS at startup.
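The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual schema: the column types, the `%s` placeholder style (as used by psycopg-style drivers), and the helper name `update_job_params` are all assumptions; only the column names come from the issue.

```python
import json
from typing import Any, Optional

# Hypothetical DDL: column names come from the issue, column types are assumed.
CREATE_JOBS_TABLE = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    result     JSONB,
    error      TEXT
)
"""

# Parameterized statements that would replace reads/writes to the _jobs dict.
INSERT_JOB = "INSERT INTO jobs (job_id, status) VALUES (%s, %s)"
UPDATE_JOB = (
    "UPDATE jobs SET status = %s, result = %s::jsonb, error = %s, "
    "updated_at = now() WHERE job_id = %s"
)
SELECT_JOB = (
    "SELECT job_id, status, created_at, updated_at, result, error "
    "FROM jobs WHERE job_id = %s"
)


def update_job_params(
    job_id: str,
    status: str,
    result: Optional[Any] = None,
    error: Optional[str] = None,
) -> tuple:
    """Build the parameter tuple for UPDATE_JOB (hypothetical helper).

    The result is serialized to a JSON string so it can be cast to JSONB;
    a missing result is passed through as NULL.
    """
    result_json = json.dumps(result) if result is not None else None
    return (status, result_json, error, job_id)
```

At startup the API would execute `CREATE_JOBS_TABLE` once through its PostgreSQL connection, then call something like `cursor.execute(UPDATE_JOB, update_job_params(job_id, "completed", result))` wherever it previously assigned into `_jobs`.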

Acceptance criteria

  • Restarting the API container while a job is in-flight preserves the job record (status shows the last known state).
  • Completed job results are retrievable after a restart.
  • The /jobs/{job_id} endpoint returns 404 for unknown IDs rather than a KeyError 500.
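The last criterion can be illustrated with a small, framework-agnostic sketch. The function name `job_status_response` and the response body shape are hypothetical; the issue does not say which web framework the API uses, so the handler is shown as a plain function that maps a database lookup result to a status code and body.

```python
from typing import Any, Mapping, Optional, Tuple


def job_status_response(
    job_id: str, row: Optional[Mapping[str, Any]]
) -> Tuple[int, dict]:
    """Map a jobs-table lookup to an HTTP status code and body.

    `row` is whatever the database query returned for job_id, or None
    when no such job exists. An explicit None check replaces the dict
    indexing that previously raised KeyError and surfaced as a 500.
    """
    if row is None:
        return 404, {"detail": f"Job {job_id} not found"}
    return 200, dict(row)
```

In a real handler the returned status code would be translated into the framework's own not-found response rather than returned as a tuple.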
AI-Manager added the P1, agent-ready, medium labels 2026-03-27 05:22:37 +00:00
Author
Owner

This issue has already been resolved in the current codebase.

database.py has a jobs table with create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed() methods. The API persists all job state to PostgreSQL, so batch results survive restarts.

Closing as already implemented.

Reference: leeworks-agents/SPARC#206