Persist async job state in PostgreSQL so batch results survive API restarts #43

Closed
opened 2026-03-26 08:22:14 +00:00 by AI-Manager · 1 comment
Owner

Problem

The _jobs dict in the API is an in-memory store. When the API process restarts (deploy, crash, scale-down), all in-flight and completed job records are lost. Callers polling /jobs/{id} receive 404s for jobs that were valid before the restart.

Task

  • Design and create a jobs table in PostgreSQL (job_id, status, created_at, updated_at, result_json, error).
  • Replace reads and writes to _jobs with queries against this table.
  • Ensure job creation, status updates, and result storage are all persisted atomically.
  • Add a migration or schema file so the table is created on first run.
  • Add tests that verify job state is retrievable after a simulated restart (e.g., by creating a job, clearing the in-memory dict, and querying via the API).
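
A minimal sketch of how the steps above could fit together, under stated assumptions. The `JobStore` class, its method names, the status values (`pending`, `completed`, `failed`), and the column types are hypothetical and not taken from the codebase. The standard-library `sqlite3` module stands in for PostgreSQL so the sketch runs without a server. A real implementation would go through a PostgreSQL driver and would probably use `TIMESTAMPTZ` and `JSONB` columns.

```python
import json
import os
import sqlite3
import tempfile
from datetime import datetime, timezone

# Hypothetical schema with the columns listed in the task. TEXT columns are
# used so the sketch runs on SQLite; PostgreSQL would likely use TIMESTAMPTZ
# for the timestamps and JSONB for result_json.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id      TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL,
    result_json TEXT,
    error       TEXT
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical replacement for the in-memory _jobs dict."""

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.row_factory = sqlite3.Row
        with self._conn:
            self._conn.execute(SCHEMA)  # table is created on first run

    def create(self, job_id: str) -> None:
        now = _now()
        # The connection context manager commits on success and rolls back
        # on an exception, so the insert is all-or-nothing.
        with self._conn:
            self._conn.execute(
                "INSERT INTO jobs (job_id, status, created_at, updated_at) "
                "VALUES (?, 'pending', ?, ?)",
                (job_id, now, now),
            )

    def finish(self, job_id: str, result=None, error=None) -> None:
        status = "failed" if error else "completed"
        payload = json.dumps(result) if result is not None else None
        # Status, result, and error are written in one transaction.
        with self._conn:
            self._conn.execute(
                "UPDATE jobs SET status = ?, result_json = ?, error = ?, "
                "updated_at = ? WHERE job_id = ?",
                (status, payload, error, _now(), job_id),
            )

    def get(self, job_id: str):
        row = self._conn.execute(
            "SELECT * FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        return dict(row) if row else None

    def close(self) -> None:
        self._conn.close()


def simulate_restart_roundtrip() -> dict:
    """Create and finish a job, drop all in-process state, then read it back."""
    db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")
    store = JobStore(db_path)
    store.create("job-1")
    store.finish("job-1", result={"rows": 3})
    store.close()  # simulated restart: nothing survives in memory

    restarted = JobStore(db_path)
    job = restarted.get("job-1")
    restarted.close()
    return job
```

In this sketch the `/jobs/{id}` handler would call `get()` and return 404 only when it yields `None`, so a restart no longer turns valid job IDs into 404s.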

Acceptance Criteria

  • After a batch job is created and the API is restarted, polling /jobs/{id} returns the job's correct status.
  • No in-memory-only _jobs dict remains in the code path.
  • Migration runs cleanly on a fresh database and an existing database.
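
One way to check the migration criterion is to apply the same migration twice, once to an empty database and once after rows exist, and confirm that nothing is lost. The sketch below is illustrative only. The `migrate` helper and the inserted row are hypothetical, and `sqlite3` again stands in for PostgreSQL (`CREATE TABLE IF NOT EXISTS` is valid in both).

```python
import sqlite3

# Hypothetical migration; column types are simplified for SQLite.
MIGRATION = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id      TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TEXT NOT NULL,
    updated_at  TEXT NOT NULL,
    result_json TEXT,
    error       TEXT
)
"""


def migrate(conn: sqlite3.Connection) -> None:
    # IF NOT EXISTS makes the statement a no-op when the table is present.
    with conn:
        conn.execute(MIGRATION)


def count_jobs_after_rerun() -> int:
    conn = sqlite3.connect(":memory:")
    migrate(conn)  # fresh database: the table is created
    with conn:
        conn.execute(
            "INSERT INTO jobs (job_id, status, created_at, updated_at) "
            "VALUES ('job-1', 'pending', '2026-03-26T08:22:14+00:00', "
            "'2026-03-26T08:22:14+00:00')"
        )
    migrate(conn)  # existing database: must not fail or drop rows
    count = conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
    conn.close()
    return count
```

A schema that changes over time would need versioned migrations rather than a single idempotent statement; this sketch only covers the first-run case described in the task.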

References

Roadmap: P1 -- Error handling and resilience -- _jobs dict is in-memory only.

AI-Manager added the P1, agent-ready, medium labels 2026-03-26 08:22:14 +00:00
Author
Owner

Closing: Already implemented in PR #34 (feat(jobs): persist async batch job state in PostgreSQL). A jobs table is created in the database schema.


Reference: leeworks-agents/SPARC#43