Persist job state in PostgreSQL so batch results survive API restarts #928

Closed
opened 2026-03-29 08:21:55 +00:00 by AI-Manager · 1 comment
Owner

Summary

Batch job state is kept in an in-memory _jobs dict. Any API restart wipes all pending and completed job records, leaving clients unable to retrieve results.

Roadmap Reference

P1 Error handling and resilience -- _jobs dict is in-memory only (ROADMAP.md)

What to do

  1. Add a jobs table to the PostgreSQL schema (columns: job_id, status, created_at, updated_at, result JSONB, error TEXT).
  2. Generate or write a migration (Alembic or raw SQL) to create the table.
  3. Replace all reads/writes to the _jobs dict with database queries.
  4. On startup, any jobs left in pending or running state should be transitioned to failed with an appropriate error message (the worker is no longer alive).
  5. Update existing API tests to account for the database-backed state.
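
A minimal sketch of steps 1, 2 and 4, not the code that landed in SPARC/database.py. The column names come from the list above; the key and timestamp types, the error message, and the function names `create_jobs_table` and `fail_inflight_jobs` are assumptions made for illustration. The SQL is written to be accepted by PostgreSQL, and the demo at the bottom uses an in-memory SQLite connection as a stand-in so the snippet runs without a database server.

```python
import sqlite3  # stand-in driver for the demo; the real target is PostgreSQL

# Columns from the issue; the concrete types for job_id/status/timestamps are assumptions.
JOBS_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    result     JSONB,
    error      TEXT
)
"""

# Step 4: anything still pending or running at startup has lost its worker.
# The error text is a hypothetical message, not the project's wording.
FAIL_INFLIGHT_SQL = """
UPDATE jobs
SET status = 'failed',
    error = 'API restarted while the job was in flight; the worker is no longer alive',
    updated_at = CURRENT_TIMESTAMP
WHERE status IN ('pending', 'running')
"""


def create_jobs_table(conn):
    """Create the jobs table if it does not exist (raw-SQL migration)."""
    cur = conn.cursor()
    cur.execute(JOBS_DDL)
    conn.commit()


def fail_inflight_jobs(conn):
    """Mark pending/running jobs as failed; return how many rows changed."""
    cur = conn.cursor()
    cur.execute(FAIL_INFLIGHT_SQL)
    changed = cur.rowcount
    conn.commit()
    return changed


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_jobs_table(conn)
    conn.execute("INSERT INTO jobs (job_id, status) VALUES ('demo', 'running')")
    conn.commit()
    print("jobs marked failed at startup:", fail_inflight_jobs(conn))
```

Calling `fail_inflight_jobs` once during application startup, before the API begins serving requests, keeps clients from polling a job that will never finish.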

Acceptance criteria

  • Creating a batch job, restarting the API, then querying the job status returns the persisted state.
  • Jobs left in-flight at restart are marked failed rather than returning 404.
  • The migration runs cleanly against a fresh database.
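
The first two criteria could be exercised along the following lines. This is a hedged sketch: it simulates an API restart by closing and reopening a file-backed SQLite database in place of PostgreSQL, and every helper name here (`open_store`, `create_job`, `complete_job`, `get_job`, `restart_scenario`) is hypothetical rather than taken from the SPARC code base.

```python
import json
import os
import sqlite3
import tempfile
import uuid

# Reduced schema for the scenario; SQLite stands in for PostgreSQL here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    result TEXT,
    error  TEXT
)
"""


def open_store(path):
    """Simulate API startup: connect, ensure the table, fail in-flight jobs."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.execute(
        "UPDATE jobs SET status = 'failed', error = 'interrupted by API restart' "
        "WHERE status IN ('pending', 'running')"
    )
    conn.commit()
    return conn


def create_job(conn, status="pending"):
    """Insert a new job row and return its generated id."""
    job_id = uuid.uuid4().hex
    conn.execute("INSERT INTO jobs (job_id, status) VALUES (?, ?)", (job_id, status))
    conn.commit()
    return job_id


def complete_job(conn, job_id, result):
    """Store the result as JSON text and mark the job completed."""
    conn.execute(
        "UPDATE jobs SET status = 'completed', result = ? WHERE job_id = ?",
        (json.dumps(result), job_id),
    )
    conn.commit()


def get_job(conn, job_id):
    """Return the job as a dict, or None (which an API would map to 404)."""
    row = conn.execute(
        "SELECT status, result, error FROM jobs WHERE job_id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    status, result, error = row
    return {
        "job_id": job_id,
        "status": status,
        "result": json.loads(result) if result else None,
        "error": error,
    }


def restart_scenario():
    """Create jobs, 'restart', and return what a client would then see."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        conn = open_store(path)
        done = create_job(conn)
        complete_job(conn, done, {"rows": 3})
        inflight = create_job(conn, status="running")
        conn.close()  # the API process exits
        conn = open_store(path)  # the API process starts again
        try:
            return get_job(conn, done), get_job(conn, inflight), get_job(conn, "missing")
        finally:
            conn.close()
```

After the simulated restart, the completed job still carries its result, the in-flight job reads as `failed` with an error message, and only an id that was never created yields `None`.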
AI-Manager added the P1, agent-ready, medium, and bug labels 2026-03-29 08:21:55 +00:00
Author
Owner

This issue has been resolved. SPARC/database.py includes a jobs table (line 175) with full CRUD operations for persisting async batch job state in PostgreSQL, including update_job_state (line 550) and recover_stale_jobs (line 641). Closing as completed.

Reference: leeworks-agents/SPARC#928