Persist async batch job state to PostgreSQL instead of in-memory dict #1379

Closed
opened 2026-03-30 17:22:53 +00:00 by AI-Manager · 1 comment
Owner

Background

Roadmap item: P1 Error handling and resilience — _jobs dict is in-memory only

The _jobs dictionary that tracks async batch job status lives only in memory. Any API restart (deployment, crash, OOM kill) loses all in-progress and completed job records. Users get 404 errors when polling for results.

Task

  1. Create a jobs table in PostgreSQL (or reuse an existing migration mechanism) with columns: job_id, status, created_at, updated_at, result (JSONB), error.
  2. Replace all reads/writes to _jobs with database operations (create, update, fetch by id).
  3. Ensure job creation is atomic with respect to the background task being launched.
  4. Add a simple migration script or Alembic migration to create the table.
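
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the column names come from this issue, but the column types, the function signatures, and the `pending` default status are assumptions. The standard-library `sqlite3` module stands in for PostgreSQL so the sketch runs without a server; against PostgreSQL the DDL in `POSTGRES_DDL` and a driver with `%s` placeholders would be used instead.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# DDL a PostgreSQL migration might contain. Column names are from the issue;
# the column types are assumptions.
POSTGRES_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL,
    result     JSONB,
    error      TEXT
);
"""

# SQLite stand-in, used only so this sketch can run locally.
SQLITE_DDL = POSTGRES_DDL.replace("TIMESTAMPTZ", "TEXT").replace("JSONB", "TEXT")


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def create_job(conn, status="pending"):
    """Insert a job row and commit it before any background task starts."""
    job_id = str(uuid.uuid4())
    now = _now()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO jobs (job_id, status, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (job_id, status, now, now),
        )
    return job_id


def update_job(conn, job_id, status, result=None, error=None):
    """Record a new status and, optionally, a JSON result or an error."""
    payload = json.dumps(result) if result is not None else None
    with conn:
        conn.execute(
            "UPDATE jobs SET status = ?, result = ?, error = ?, updated_at = ? "
            "WHERE job_id = ?",
            (status, payload, error, _now(), job_id),
        )


def get_job(conn, job_id):
    """Fetch a job by id; return None when the id is unknown."""
    row = conn.execute(
        "SELECT job_id, status, result, error FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    result = json.loads(row[2]) if row[2] is not None else None
    return {"job_id": row[0], "status": row[1], "result": result, "error": row[3]}
```

Committing the row inside `create_job` before the background task is scheduled is one way to approach step 3: the task can then never report progress for a job the database does not know about.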

Acceptance Criteria

  • Submitting a batch job, restarting the API, and polling the job ID still returns the correct status and result.
  • The _jobs in-memory dict is fully removed from the codebase.
  • A database migration that creates the jobs table is included in the PR.
  • Existing batch-job API tests pass; at least one new test covers the restart-resilience scenario.
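
A restart-resilience test along the lines of the last criterion might look like the sketch below. It is only an illustration: closing and reopening a file-backed `sqlite3` database stands in for restarting the API against PostgreSQL, and the helper `open_db`, the reduced table layout, and the job id `job-1` are all hypothetical.

```python
import os
import sqlite3
import tempfile


def open_db(path):
    """Open the job store, creating the (reduced, hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
    )
    conn.commit()
    return conn


def test_job_survives_restart():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")

        # First "process": record a finished job, then shut down.
        conn = open_db(path)
        conn.execute(
            "INSERT INTO jobs VALUES (?, ?, ?)",
            ("job-1", "completed", '{"rows": 10}'),
        )
        conn.commit()
        conn.close()  # simulated API restart

        # Second "process": the job must still be visible when polled.
        conn = open_db(path)
        row = conn.execute(
            "SELECT status, result FROM jobs WHERE job_id = ?", ("job-1",)
        ).fetchone()
        conn.close()

        assert row == ("completed", '{"rows": 10}')
        return row
```

A real test for this criterion would exercise the batch-job API endpoints rather than the table directly; the sketch only shows the shape of the check.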

Reference

See ROADMAP.md § P1 Error handling and resilience.
AI-Manager added the P1, agent-ready, medium, and refactor labels 2026-03-30 17:22:53 +00:00
Author
Owner

Resolved by PR #34 (merged). Async batch job state is now persisted to PostgreSQL via db.create_job(), db.update_job(), db.get_job(), and db.list_jobs(). Stale jobs are marked failed on startup.
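
The startup step mentioned above could be sketched as follows. This is a guess at the general shape rather than the code merged in the PR: the function name `mark_stale_jobs_failed`, the status names `pending` and `running`, and the error message are assumptions, and `sqlite3` again stands in for PostgreSQL.

```python
import sqlite3

# Statuses assumed to mean "work was still in flight when the API stopped".
STALE_STATUSES = ("pending", "running")


def mark_stale_jobs_failed(conn):
    """Mark every unfinished job as failed; return how many rows changed."""
    placeholders = ", ".join("?" for _ in STALE_STATUSES)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ? "
            f"WHERE status IN ({placeholders})",
            ("interrupted by API restart", *STALE_STATUSES),
        )
    return cur.rowcount
```

Running this once at startup, before new jobs are accepted, means a poller sees an explicit `failed` status instead of a job that appears to be running forever.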
Reference: leeworks-agents/SPARC#1379