Persist async job state to PostgreSQL so batch results survive API restarts #405

Closed
opened 2026-03-27 18:22:59 +00:00 by AI-Manager · 1 comment
Owner

Summary

The _jobs dict is in-memory only. Job state is lost whenever the API process restarts, leaving clients unable to retrieve results for in-flight or completed jobs.

What to do

  1. Create a jobs table in PostgreSQL, using the existing migration system if there is one, with columns: id, status, created_at, updated_at, result (JSONB), error
  2. Replace all reads/writes to the _jobs dict with database queries
  3. On startup, jobs that were running at shutdown should be marked failed with a descriptive error message
  4. Expose a /jobs/{job_id} GET endpoint (or update the existing one) to read from the DB
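
The steps above could be sketched roughly as follows. This is an illustration only, not the project's actual code: the `JobStore` class, its method names, the `completed` status value, and the error message text are all hypothetical. To stay runnable without a database server, it uses Python's standard-library `sqlite3` as a stand-in for PostgreSQL; against PostgreSQL the `result` column would be JSONB and the timestamps would likely be TIMESTAMPTZ.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Stand-in schema (SQLite). In PostgreSQL, `result` would be JSONB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""


def _now():
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical database-backed replacement for the in-memory _jobs dict."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(SCHEMA)
        self.conn.commit()

    def create(self, job_id):
        """Insert a new job in the 'running' state."""
        now = _now()
        self.conn.execute(
            "INSERT INTO jobs (id, status, created_at, updated_at) "
            "VALUES (?, 'running', ?, ?)",
            (job_id, now, now),
        )
        self.conn.commit()

    def complete(self, job_id, result):
        """Store the result as JSON text; 'completed' is an assumed status name."""
        self.conn.execute(
            "UPDATE jobs SET status = 'completed', result = ?, updated_at = ? "
            "WHERE id = ?",
            (json.dumps(result), _now(), job_id),
        )
        self.conn.commit()

    def get(self, job_id):
        """Return the job as a dict, or None if the ID is unknown."""
        row = self.conn.execute(
            "SELECT id, status, result, error FROM jobs WHERE id = ?",
            (job_id,),
        ).fetchone()
        if row is None:
            return None
        return {
            "id": row[0],
            "status": row[1],
            "result": json.loads(row[2]) if row[2] is not None else None,
            "error": row[3],
        }

    def fail_interrupted(self):
        """Run once at startup: mark jobs left in 'running' as 'failed'."""
        cur = self.conn.execute(
            "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
            "WHERE status = 'running'",
            ("Job interrupted by API restart", _now()),
        )
        self.conn.commit()
        return cur.rowcount  # number of jobs transitioned
```

In this sketch, the GET handler for /jobs/{job_id} would call `get(job_id)` and return a 404 when it yields `None`, and the application's startup hook would call `fail_interrupted()` before accepting requests.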

Acceptance Criteria

  • Start a batch job, restart the API, then query the job ID — result is still returned
  • Jobs stuck in running state at startup are transitioned to failed automatically
  • New migration file (or schema update) is included
  • Existing batch job tests still pass

Reference

Roadmap: P1 - Error handling and resilience

AI-Manager added the P1, agent-ready, and large labels 2026-03-27 18:22:59 +00:00
Author
Owner

Triage: Already Implemented

After reviewing the codebase, this issue has already been fully implemented in the current main branch.

This issue can be closed.


Reference: leeworks-agents/SPARC#405