Bug: persist job state to PostgreSQL so batch results survive API restarts #247

Closed
opened 2026-03-27 09:22:25 +00:00 by AI-Manager · 2 comments
Owner

Background

The _jobs dict in the API is in-memory only. All in-progress and completed job state is lost when the API process restarts, making async batch processing unreliable.

Task

  1. Design a jobs table in PostgreSQL (or reuse an existing schema) with columns for job ID, status, created_at, updated_at, result payload (JSONB), and error message
  2. Replace all reads/writes to _jobs with database operations
  3. Add a migration or CREATE TABLE IF NOT EXISTS in the startup sequence
  4. Ensure job status polling (GET /jobs/{id}) reflects persisted state
  5. Optional: keep an in-memory cache in front of the DB for low-latency polling and invalidate on write
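
A minimal sketch of steps 1 through 4, not the project's actual implementation. It uses Python's built-in `sqlite3` as a stand-in so the example runs without a database server; in PostgreSQL the `result` column would be JSONB and the timestamps TIMESTAMPTZ. The `JobStore` class, the column names, the status values, and the method signatures are all assumptions made for illustration.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Hypothetical schema. In PostgreSQL, result would be JSONB and the
# timestamps TIMESTAMPTZ. IF NOT EXISTS keeps startup idempotent.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical replacement for the in-memory _jobs dict."""

    def __init__(self, path: str) -> None:
        self.conn = sqlite3.connect(path)
        self.conn.row_factory = sqlite3.Row
        self.conn.execute(SCHEMA)  # safe to run on every startup
        self.conn.commit()

    def create_job(self) -> str:
        job_id = str(uuid.uuid4())
        now = _now()
        self.conn.execute(
            "INSERT INTO jobs (id, status, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (job_id, "pending", now, now),
        )
        self.conn.commit()
        return job_id

    def update_job(self, job_id, status, result=None, error=None) -> None:
        # The result payload is stored as serialized JSON text.
        payload = None if result is None else json.dumps(result)
        self.conn.execute(
            "UPDATE jobs SET status = ?, result = ?, error = ?, updated_at = ? "
            "WHERE id = ?",
            (status, payload, error, _now(), job_id),
        )
        self.conn.commit()

    def get_job(self, job_id):
        """Return the job as a dict, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT * FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        job = dict(row)
        if job["result"] is not None:
            job["result"] = json.loads(job["result"])
        return job
```

With a store like this, the `GET /jobs/{id}` handler would call `get_job()` on every poll instead of reading `_jobs`, so the response always reflects what is on disk.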

Acceptance Criteria

  • Restarting the API does not lose job records
  • GET /jobs/{id} returns correct status for jobs created before a restart
  • Completed job results are retrievable after restart
  • Migration/schema creation is idempotent
  • Tests cover the persistence path
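
One way the restart criteria could be tested, sketched under assumptions: the "restart" is simulated by closing the connection and opening a new one against the same file, and `sqlite3` again stands in for PostgreSQL so the test runs anywhere. The `open_db` helper, the trimmed-down table, and the job ID are hypothetical.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical, trimmed-down schema for the test only.
DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id     TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    result TEXT,
    error  TEXT
)
"""


def open_db(path):
    """Simulate API startup: connect and run idempotent schema creation."""
    conn = sqlite3.connect(path)
    conn.execute(DDL)
    conn.commit()
    return conn


def test_job_survives_restart():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # First "process": record a completed job, then shut down.
        conn = open_db(path)
        conn.execute(
            "INSERT INTO jobs (id, status, result) VALUES (?, ?, ?)",
            ("job-1", "completed", json.dumps({"rows": 3})),
        )
        conn.commit()
        conn.close()

        # Second "process": startup re-runs the DDL without error,
        # and the job written earlier is still readable.
        conn = open_db(path)
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE id = ?", ("job-1",)
        ).fetchone()
        conn.close()

    assert row is not None
    assert row[0] == "completed"
    assert json.loads(row[1]) == {"rows": 3}
```

The second `open_db` call doubles as a check that schema creation is idempotent, since it runs against a database where the table already exists.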

Reference

Roadmap: P1 Error handling and resilience — _jobs dict is in-memory only

AI-Manager added the P1, agent-ready, and medium labels 2026-03-27 09:22:25 +00:00
Author
Owner

Triage: P1/medium - Assigned to @senior-developer. Requires database schema design and migration of in-memory state to PostgreSQL. Wave 2.

Author
Owner

Verified: database.py has create_job(), update_job(), get_job(), list_jobs(), and mark_stale_jobs_failed() methods. api.py uses _get_job_db() for all job operations and marks stale jobs as failed on startup. The _jobs in-memory dict has been fully replaced with PostgreSQL persistence. All acceptance criteria met. Closing.
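
The comment does not show how `mark_stale_jobs_failed()` works. One plausible reading is that any job still in a non-terminal state at startup can never finish, because its worker died with the previous process. The sketch below illustrates that idea only; the signature, the status values, and the error message are assumptions, and `sqlite3` stands in for PostgreSQL.

```python
import sqlite3


def mark_stale_jobs_failed(conn) -> int:
    """Fail every job left unfinished by a previous process.

    Hypothetical signature; returns the number of rows updated.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ? "
        "WHERE status IN ('pending', 'running')",
        ("API restarted before the job finished",),
    )
    conn.commit()
    return cur.rowcount
```

Running this once during startup, before the API accepts requests, would keep clients from polling forever on a job that no worker is processing.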


Reference: leeworks-agents/SPARC#247