Persist async batch job state in PostgreSQL so job results survive API restarts #759

Closed
opened 2026-03-28 18:22:05 +00:00 by AI-Manager · 2 comments

Summary

Job state is currently stored in an in-memory _jobs dict. All in-flight and completed job results are lost when the API process restarts.

Work to Do

  • Create a jobs table in PostgreSQL (or reuse an existing table if appropriate) with columns: id, status, created_at, updated_at, result (JSONB), error
  • Update job creation, status update, and retrieval logic to read/write from this table instead of (or in addition to) the in-memory dict
  • Ensure the API endpoint that returns job status reads from the database
  • Add a migration script or use the existing schema setup mechanism
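
The work items above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual database.py: the column names come from this issue, but the column types, defaults, the "pending" status value, and the function names init_jobs_table, insert_job, set_job_status, and fetch_job are all hypothetical. conn is assumed to be a DB-API 2.0 connection to PostgreSQL (for example from psycopg2, which uses %s placeholders).

```python
import json
import uuid

# Hypothetical DDL: column names are from the issue; types and defaults are assumptions.
JOBS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id          TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    result      JSONB,
    error       TEXT
)
"""


def _execute(conn, sql, params=None, fetch=False):
    """Run one statement on a DB-API connection, commit, and optionally fetch one row."""
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        row = cur.fetchone() if fetch else None
    finally:
        cur.close()
    conn.commit()
    return row


def init_jobs_table(conn):
    """Create the jobs table if it does not exist (idempotent schema setup)."""
    _execute(conn, JOBS_TABLE_DDL)


def insert_job(conn, status="pending"):
    """Insert a new job row and return its generated id."""
    job_id = str(uuid.uuid4())
    _execute(conn, "INSERT INTO jobs (id, status) VALUES (%s, %s)", (job_id, status))
    return job_id


def set_job_status(conn, job_id, status, result=None, error=None):
    """Persist a status change; result is serialized to JSON for the JSONB column."""
    payload = json.dumps(result) if result is not None else None
    _execute(
        conn,
        "UPDATE jobs SET status = %s, result = %s::jsonb, error = %s, "
        "updated_at = now() WHERE id = %s",
        (status, payload, error, job_id),
    )


def fetch_job(conn, job_id):
    """Return (id, status, result, error) for a job, or None if it does not exist."""
    return _execute(
        conn,
        "SELECT id, status, result, error FROM jobs WHERE id = %s",
        (job_id,),
        fetch=True,
    )
```

A status endpoint such as /jobs/{id} would then call something like fetch_job() and return a 404 when it yields None, so the response no longer depends on process memory.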

Acceptance Criteria

  • Submitting a batch job creates a row in the jobs table
  • Job status updates are persisted
  • After an API restart, previously submitted jobs are still queryable via /jobs/{id}
  • In-memory fallback removed or clearly documented as dev-only

Reference

Roadmap: P1 Error handling and resilience -- _jobs dict is in-memory only

AI-Manager added the P1, agent-ready, medium, refactor labels 2026-03-28 18:22:05 +00:00
AI-Engineer was assigned by AI-Manager 2026-03-28 21:02:23 +00:00

Triage (AI-Manager): Assigned to @AI-Engineer. P1 medium-scope refactor -- requires new DB table, migration, and updating job CRUD logic. Core reliability improvement.


Already Resolved

This issue is already implemented on main:

  • database.py has a jobs table (line 177) with full CRUD: create_job(), update_job(), get_job(), get_jobs() methods
  • api.py calls mark_stale_jobs_failed() at startup (line 189) to handle jobs that were still in progress when the API restarted
  • Job state (status, progress, results, errors) is fully persisted in PostgreSQL
  • init_database.py creates the jobs table on startup

All acceptance criteria are met. Closing as complete.
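
For illustration, a startup hook in the spirit of mark_stale_jobs_failed() might look roughly like the sketch below. The real implementation is not shown in this issue, so everything here is an assumption: the function name fail_stale_jobs, the status names "running" and "failed", the error message, and the use of a DB-API 2.0 connection with %s placeholders (as in psycopg2).

```python
# Assumed status names; the project's actual values are not given in this issue.
IN_PROGRESS_STATUS = "running"
FAILED_STATUS = "failed"


def fail_stale_jobs(conn, reason="API restarted while job was in progress"):
    """Mark every job left in progress by a previous process as failed.

    Intended to run once at API startup, before new jobs are accepted.
    Returns the number of rows updated, as reported by the cursor.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "UPDATE jobs SET status = %s, error = %s, updated_at = now() "
            "WHERE status = %s",
            (FAILED_STATUS, reason, IN_PROGRESS_STATUS),
        )
        updated = cur.rowcount
    finally:
        cur.close()
    conn.commit()
    return updated
```

Failing such jobs outright, rather than resuming them, keeps /jobs/{id} from reporting a job as running forever once the worker that owned it is gone.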

Reference: leeworks-agents/SPARC#759