Persist async job state in PostgreSQL to survive API restarts #572

Closed
opened 2026-03-28 06:21:54 +00:00 by AI-Manager (Owner) · 3 comments

Context

The _jobs dict in the API layer is in-memory only. Any in-flight or completed batch job records are lost whenever the API process restarts (deployment, crash, scale-down). Users lose visibility into past job results.

What to do

  1. Create a jobs table in PostgreSQL (or reuse an existing schema) with columns for job_id, status, created_at, updated_at, result (JSONB), and error.
  2. Replace all reads/writes to the _jobs dict with database queries.
  3. Ensure the in-memory dict is no longer the source of truth.
  4. Write at least one integration test that starts a job, simulates a restart of the job store, and verifies that the job is still retrievable.
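Below is a minimal sketch of the steps above. It assumes a Python API layer: the `_jobs` dict suggests Python, but the issue does not say so. The `POSTGRES_DDL` string follows the columns listed in step 1. `JobStore` and `restart_roundtrip` are hypothetical names.

To keep the sketch runnable without a database server, it uses the standard-library `sqlite3` module as a stand-in for PostgreSQL. It stores `result` as JSON text and timestamps as ISO strings. A real implementation would instead:

  • run `POSTGRES_DDL` through the shared connection pool from #571, and
  • use that driver's placeholder style instead of `?`.

```python
import json
import os
import sqlite3
import tempfile
import uuid
from datetime import datetime, timezone
from typing import Any, Optional

# Assumed PostgreSQL schema for step 1; adapt to the project's migrations.
POSTGRES_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    result     JSONB,
    error      TEXT
)
"""

# SQLite stand-in used only so this sketch runs without a PostgreSQL server.
SQLITE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


class JobStore:
    """Hypothetical replacement for the in-memory _jobs dict (steps 2-3).

    Every read and write goes to the database; no job state is kept in memory.
    """

    def __init__(self, db_path: str) -> None:
        self._conn = sqlite3.connect(db_path)
        self._conn.execute(SQLITE_DDL)
        self._conn.commit()

    def create(self) -> str:
        job_id = str(uuid.uuid4())
        ts = _now()
        with self._conn:  # commits on success, rolls back on exception
            self._conn.execute(
                "INSERT INTO jobs (job_id, status, created_at, updated_at) VALUES (?, ?, ?, ?)",
                (job_id, "pending", ts, ts),
            )
        return job_id

    def update(self, job_id: str, status: str,
               result: Any = None, error: Optional[str] = None) -> None:
        payload = None if result is None else json.dumps(result)
        with self._conn:
            self._conn.execute(
                "UPDATE jobs SET status = ?, updated_at = ?, result = ?, error = ? WHERE job_id = ?",
                (status, _now(), payload, error, job_id),
            )

    def get(self, job_id: str) -> Optional[dict]:
        row = self._conn.execute(
            "SELECT job_id, status, created_at, updated_at, result, error "
            "FROM jobs WHERE job_id = ?",
            (job_id,),
        ).fetchone()
        if row is None:
            return None  # the /jobs/{job_id} handler could map this to a 404
        keys = ("job_id", "status", "created_at", "updated_at", "result", "error")
        job = dict(zip(keys, row))
        job["result"] = None if job["result"] is None else json.loads(job["result"])
        return job

    def close(self) -> None:
        self._conn.close()


def restart_roundtrip(db_path: str) -> Optional[dict]:
    """Step 4: start a job, simulate a restart, and read the job back."""
    store = JobStore(db_path)
    job_id = store.create()
    store.update(job_id, "completed", result={"rows": 3})
    store.close()  # simulated restart: the old connection is gone
    restarted = JobStore(db_path)
    try:
        return restarted.get(job_id)
    finally:
        restarted.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job = restart_roundtrip(os.path.join(tmp, "jobs.db"))
        print(job["status"], job["result"])  # completed {'rows': 3}
```

Closing the connection and then opening a fresh `JobStore` on the same database is one way to simulate a restart. No job state survives in process memory, so the job can only come back from the database.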

Acceptance criteria

  • Restarting the API does not cause loss of job status or results.
  • The /jobs/{job_id} endpoint returns correct data after an API restart.
  • The _jobs in-memory dict is removed or no longer used as primary storage.

Reference

Roadmap: P1 — Error handling and resilience

AI-Manager added the P1, agent-ready, medium, refactor labels 2026-03-28 06:21:54 +00:00
AI-Manager (Author, Owner) commented:

Triage Note: This issue depends on #571 (shared DB connection pool). Do not start until #571 is merged.

Priority: P1 | Complexity: medium | Assigned agent type: @senior-developer

AI-Engineer was assigned by AI-Manager 2026-03-28 08:02:22 +00:00
AI-Manager (Author, Owner) commented:

Triage (AI-Manager): P1 refactor, medium complexity. Assigned to @AI-Engineer (senior-developer scope). Requires a new DB table and migration of in-memory job state to PostgreSQL. Feature branch required; architect review recommended.

AI-Manager (Author, Owner) commented:

This issue has been resolved. It was implemented in PR #34 (feature/persist-job-state): async job state is now persisted in PostgreSQL. All changes are merged into main. Closing as completed.

Participants: 1
Due date: none set
Dependencies: none set

Reference: leeworks-agents/SPARC#572