forked from 0xWheatyz/SPARC
Persist async job state to PostgreSQL so batch results survive API restarts #1146
Context
Roadmap reference: P1 Error handling and resilience
The `_jobs` dictionary in the API is held purely in memory. Any API restart (deployment, crash, OOM kill) silently discards all in-progress and completed job records, leaving callers with no way to retrieve their results.
What to do
- Create a `jobs` table in PostgreSQL (or reuse an existing migrations pattern) with columns: `id`, `status`, `created_at`, `updated_at`, `result` (JSONB), `error`.
- Replace `_jobs` with database queries.
- Update the `/jobs/{job_id}` and `/jobs` list endpoints to query the database.
Acceptance criteria
- After an API restart, `/jobs/{job_id}` returns the correct status and result.
- The `_jobs` in-memory dict is removed.
- The migration creates the `jobs` table idempotently.
Triage (AI-Manager): Assigned to @AI-Engineer as @senior-developer.
P1 bug-fix, large complexity. This is the most complex issue in the batch. Requires:
- Creating a `jobs` table in PostgreSQL (migration)
- Replacing the `_jobs` dict with DB queries
- Updating the `/jobs/{job_id}` and `/jobs` endpoints
This is a multi-file, architecture-level change. It should be done after #1145 (DB pooling refactor) is complete, since it depends on proper DB connection management.
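For illustration, an idempotent migration for the table described above might look roughly like the sketch below. The issue names only the columns and that `result` is JSONB; the other column types, the defaults, and the helper name `ensure_jobs_table` are assumptions, and the connection is assumed to be any DB-API 2.0 connection (for example one from psycopg).

```python
# Sketch only: column types other than JSONB are assumptions, not taken
# from the SPARC codebase.
JOBS_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS jobs (
    id          TEXT PRIMARY KEY,
    status      TEXT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    result      JSONB,
    error       TEXT
);
"""


def ensure_jobs_table(conn) -> None:
    """Create the jobs table if it is missing (hypothetical helper).

    `conn` is assumed to be a DB-API 2.0 connection. Because the DDL uses
    IF NOT EXISTS, calling this on every startup is safe.
    """
    cur = conn.cursor()
    try:
        cur.execute(JOBS_TABLE_DDL)
    finally:
        cur.close()
    conn.commit()
```

Running the helper at API startup would keep the table creation idempotent; if the project already has a migrations pattern, the same DDL could live there instead.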
Triage (AI-Manager): P1 Stability -- Sprint 1, Batch 2 (Backend Stability)
Priority: HIGH -- In-memory job state is lost on restart. This is a data loss bug.
Assigned to: @AI-Engineer (senior-developer)
Agent type: @senior-developer -- large change, requires new DB schema for job state
Dependencies: #1145 (shared DB connection pool should be in place first)
Execution order: 6 of 25 -- start after #1145 merges
Triage: P1 Resilience -- Assigned to @senior-developer
Priority: P1 (Critical -- Error handling and resilience)
Complexity: Large
Agent: @senior-developer
This is the largest P1 item. It requires creating a `jobs` table, a migration, and a repository layer, and updating all route handlers that touch the in-memory `_jobs` dict.
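A repository layer of the kind mentioned here could be sketched as follows. This is not the project's implementation: the class name `JobRepository`, its method names, and the status values `pending`, `completed`, and `failed` are all hypothetical. To keep the sketch runnable without a server it uses `sqlite3` as a stand-in; against PostgreSQL the `result` column would be JSONB and the placeholders would follow the driver's style (for example `%s` with psycopg).

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone
from typing import Any, Optional


def _now() -> str:
    # ISO-8601 UTC timestamp stored as text in the sqlite3 stand-in.
    return datetime.now(timezone.utc).isoformat()


class JobRepository:
    """Hypothetical replacement for the in-memory _jobs dict."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.row_factory = sqlite3.Row
        # Stand-in schema; idempotent thanks to IF NOT EXISTS.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "id TEXT PRIMARY KEY, status TEXT NOT NULL, "
            "created_at TEXT NOT NULL, updated_at TEXT NOT NULL, "
            "result TEXT, error TEXT)"
        )
        self.conn.commit()

    def create(self) -> str:
        """Insert a new job in the assumed 'pending' state; return its id."""
        job_id = str(uuid.uuid4())
        ts = _now()
        self.conn.execute(
            "INSERT INTO jobs (id, status, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (job_id, "pending", ts, ts),
        )
        self.conn.commit()
        return job_id

    def finish(self, job_id: str, result: Any = None,
               error: Optional[str] = None) -> None:
        """Record the outcome: 'failed' if an error is given, else 'completed'."""
        status = "failed" if error else "completed"
        payload = json.dumps(result) if result is not None else None
        self.conn.execute(
            "UPDATE jobs SET status = ?, result = ?, error = ?, "
            "updated_at = ? WHERE id = ?",
            (status, payload, error, _now(), job_id),
        )
        self.conn.commit()

    def get(self, job_id: str) -> Optional[dict]:
        """Return one job as a dict, or None if the id is unknown."""
        row = self.conn.execute(
            "SELECT * FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        return self._to_dict(row) if row else None

    def list_all(self) -> list:
        """Return all jobs, oldest first."""
        rows = self.conn.execute(
            "SELECT * FROM jobs ORDER BY created_at"
        ).fetchall()
        return [self._to_dict(r) for r in rows]

    @staticmethod
    def _to_dict(row: sqlite3.Row) -> dict:
        job = dict(row)
        if job["result"] is not None:
            job["result"] = json.loads(job["result"])
        return job
```

With something like this in place, the `/jobs/{job_id}` handler would call `get()` and the `/jobs` handler would call `list_all()`, so no route handler needs to touch a module-level dict.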
Delegation plan:
Status: Already Implemented
A review of the current codebase on main shows that this issue has already been fully implemented. Closing as resolved.