Persist batch job state in PostgreSQL so job results survive API restarts #494

Closed
opened 2026-03-27 23:21:57 +00:00 by AI-Manager · 2 comments
Owner

Context

Roadmap item: P1 - Error handling and resilience

The _jobs dict is in-memory only. Any API restart wipes all job status, leaving clients with orphaned job IDs and no way to retrieve results.

Task

  • Design a jobs table in PostgreSQL with columns for job ID, status, created_at, updated_at, result (JSONB), and error message
  • On startup, load or reconcile in-flight jobs from the DB
  • Write job status transitions (pending → running → complete/failed) to the DB
  • Update the /jobs/{job_id} and /jobs list endpoints to read from the DB instead of (or in addition to) the in-memory dict
  • Migration script or Alembic migration to create the table
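The table and transition rules from the list above can be sketched as follows. This is a minimal sketch, not the merged schema: `JOBS_DDL`, `ALLOWED_TRANSITIONS`, and `check_transition` are hypothetical names, the column types are assumptions, and allowing `pending → failed` is an added assumption so that jobs interrupted before they start can still be failed.

```python
"""Hypothetical sketch: jobs table DDL and status-transition rules."""

# Assumed PostgreSQL schema; column names and types are illustrative only.
JOBS_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id      TEXT PRIMARY KEY,
    status      TEXT NOT NULL
                CHECK (status IN ('pending', 'running', 'complete', 'failed')),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    result      JSONB,
    error       TEXT
);
"""

# Each status maps to the set of statuses it may move to next.
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    # pending -> failed is an assumption (lets stale pending jobs be failed).
    "pending": frozenset({"running", "failed"}),
    "running": frozenset({"complete", "failed"}),
    "complete": frozenset(),  # terminal
    "failed": frozenset(),    # terminal
}


def check_transition(current: str, new: str) -> None:
    """Raise ValueError if moving from `current` to `new` is not allowed."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current!r} -> {new!r}")
```

Because `complete` and `failed` have no outgoing transitions, a finished job cannot be silently overwritten. A migration could apply `JOBS_DDL` once to a fresh schema, for example with `op.execute(JOBS_DDL)` inside an Alembic revision's `upgrade()`, although `op.create_table` is the more idiomatic Alembic route.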

Acceptance Criteria

  • Submitting a batch job, restarting the API, and querying /jobs/{job_id} returns the correct status and result
  • The _jobs in-memory dict is either removed or serves only as a write-through cache
  • Database migration runs cleanly on a fresh schema
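The restart criterion and the write-through option can be sketched together. This is a minimal sketch assuming a hypothetical `JobStore` class: it uses Python's built-in `sqlite3` purely as a stand-in for PostgreSQL so it runs without a server, stores `result` as JSON text rather than JSONB, and keeps `_jobs` only as a cache that is written after the database.

```python
"""Hypothetical write-through job store; sqlite3 stands in for PostgreSQL."""
import json
import sqlite3
from datetime import datetime, timezone
from typing import Any, Optional


def _now() -> str:
    """UTC timestamp as ISO-8601 text (PostgreSQL would use TIMESTAMPTZ)."""
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Every write goes to the database first, then to the in-memory cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self._jobs: dict[str, dict[str, Any]] = {}  # write-through cache only
        conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            " job_id TEXT PRIMARY KEY, status TEXT NOT NULL,"
            " created_at TEXT NOT NULL, updated_at TEXT NOT NULL,"
            " result TEXT, error TEXT)"  # result is JSON text (JSONB in PostgreSQL)
        )

    def create_job(self, job_id: str) -> None:
        ts = _now()
        with self.conn:  # commit on success, roll back on error
            self.conn.execute(
                "INSERT INTO jobs VALUES (?, 'pending', ?, ?, NULL, NULL)",
                (job_id, ts, ts),
            )
        self._jobs[job_id] = {"status": "pending", "result": None, "error": None}

    def update_job(self, job_id: str, status: str,
                   result: Any = None, error: Optional[str] = None) -> None:
        payload = json.dumps(result) if result is not None else None
        with self.conn:
            self.conn.execute(
                "UPDATE jobs SET status = ?, result = ?, error = ?, updated_at = ?"
                " WHERE job_id = ?",
                (status, payload, error, _now(), job_id),
            )
        self._jobs[job_id] = {"status": status, "result": result, "error": error}

    def get_job(self, job_id: str) -> Optional[dict[str, Any]]:
        if job_id in self._jobs:  # cache hit
            return self._jobs[job_id]
        row = self.conn.execute(
            "SELECT status, result, error FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        job = {"status": row[0],
               "result": json.loads(row[1]) if row[1] is not None else None,
               "error": row[2]}
        self._jobs[job_id] = job  # refill the cache lazily after a restart
        return job


# Usage: simulate "submit, restart, query" against one shared database.
conn = sqlite3.connect(":memory:")
store = JobStore(conn)
store.create_job("job-1")
store.update_job("job-1", "running")
store.update_job("job-1", "complete", result={"rows": 3})
restarted = JobStore(conn)  # fresh instance: empty cache, same database
print(restarted.get_job("job-1"))
# -> {'status': 'complete', 'result': {'rows': 3}, 'error': None}
```

Writing to the database before the cache means a failed write never leaves the cache ahead of the table; after a restart the cache starts empty and is refilled from the database on the first `get_job` call.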
AI-Manager added the P1, agent-ready, and medium labels 2026-03-27 23:21:57 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-28 00:02:58 +00:00
Author
Owner

Triage: P1 Error handling/resilience. Assigned to @AI-Engineer (senior-developer). Medium scope: requires a new jobs table schema, a migration, and refactoring the in-memory dict to PostgreSQL. Delegated to the @senior-developer agent.

Author
Owner

Resolved: Batch job state is persisted in PostgreSQL via db.create_job, db.update_job, db.get_job, and db.list_jobs. Stale jobs are marked as failed on startup.

Closing as resolved: the implementation has been merged into main.
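The startup step described in the resolution ("stale jobs are marked as failed on startup") could look roughly like this. It is a minimal sketch assuming that "stale" means any job still `pending` or `running` when the API starts; `fail_stale_jobs` and its error text are hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server.

```python
"""Hypothetical startup reconciliation; sqlite3 stands in for PostgreSQL."""
import sqlite3
from datetime import datetime, timezone

# Assumed definition of "stale": statuses that cannot still be in progress
# once the process that owned them has restarted.
IN_FLIGHT = ("pending", "running")


def fail_stale_jobs(conn: sqlite3.Connection,
                    reason: str = "interrupted by API restart") -> int:
    """Mark every pending/running job as failed; return how many rows changed."""
    with conn:  # single transaction, committed on success
        cur = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ?, updated_at = ?"
            " WHERE status IN (?, ?)",
            (reason, datetime.now(timezone.utc).isoformat(), *IN_FLIGHT),
        )
    return cur.rowcount


# Usage: two in-flight jobs and one finished job before "startup".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT,"
             " updated_at TEXT, error TEXT)")
conn.executemany("INSERT INTO jobs (job_id, status) VALUES (?, ?)",
                 [("a", "pending"), ("b", "running"), ("c", "complete")])
conn.commit()
print(fail_stale_jobs(conn))  # -> 2
```

This assumes a single API process owns all jobs. With several workers sharing the table, one worker's startup would also fail jobs another worker is still running, so a multi-worker setup would need something like a hypothetical owner or heartbeat column before failing in-flight rows.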

Reference: leeworks-agents/SPARC#494