Persist async job state to PostgreSQL so batch results survive API restarts #664



AI-Manager commented

2026-03-28 13:22:09 +00:00

Context

Job status is stored in an in-memory _jobs dict. Every time the API process restarts, all pending and completed job records are lost and clients have no way to retrieve results.

What to do

Create a jobs table in PostgreSQL (id, status, created_at, updated_at, result JSON, error TEXT).
Replace all reads/writes to _jobs with database queries.
On startup, load any PENDING or RUNNING jobs and either resume them or, if the restart interrupted them, mark them as FAILED.
Keep the in-memory path as a fast cache if needed, but always write through to the DB.
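A minimal sketch of the write-through pattern described above, covering the table, the reads/writes that replace _jobs, and the startup recovery step. It assumes a DB-API connection and uses the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs without a server. With a PostgreSQL driver such as psycopg the placeholders would be %s rather than ?, and the result column would more naturally be JSONB. The JobStore class, its method names, and the COMPLETED status are hypothetical, not taken from SPARC.

```python
import json
import sqlite3
import uuid
from typing import Any, Optional

# SQLite stand-in schema; in PostgreSQL, result would be JSONB.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
    result     TEXT,  -- JSON-encoded text in this stand-in
    error      TEXT
)
"""


class JobStore:
    """Hypothetical job store: the DB is the source of truth, the dict is only a cache."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)
        self._cache: dict[str, dict[str, Any]] = {}

    def create(self) -> str:
        job_id = str(uuid.uuid4())
        with self.conn:  # commits the INSERT before the cache is touched
            self.conn.execute(
                "INSERT INTO jobs (id, status) VALUES (?, 'PENDING')", (job_id,)
            )
        self._cache[job_id] = {"id": job_id, "status": "PENDING",
                               "result": None, "error": None}
        return job_id

    def update(self, job_id: str, status: str, result: Any = None,
               error: Optional[str] = None) -> None:
        payload = json.dumps(result) if result is not None else None
        with self.conn:  # write through: DB first, then cache
            cur = self.conn.execute(
                "UPDATE jobs SET status = ?, result = ?, error = ?, "
                "updated_at = CURRENT_TIMESTAMP WHERE id = ?",
                (status, payload, error, job_id),
            )
        if cur.rowcount == 0:
            raise KeyError(job_id)
        self._cache[job_id] = {"id": job_id, "status": status,
                               "result": result, "error": error}

    def get(self, job_id: str) -> Optional[dict[str, Any]]:
        if job_id in self._cache:
            return self._cache[job_id]
        # Cache miss (e.g. after a restart): fall back to the database.
        row = self.conn.execute(
            "SELECT id, status, result, error FROM jobs WHERE id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        job = {"id": row[0], "status": row[1],
               "result": json.loads(row[2]) if row[2] is not None else None,
               "error": row[3]}
        self._cache[job_id] = job
        return job

    def recover_on_startup(self) -> int:
        """Mark jobs left PENDING/RUNNING by a previous process as FAILED."""
        with self.conn:
            cur = self.conn.execute(
                "UPDATE jobs SET status = 'FAILED', error = 'interrupted by restart', "
                "updated_at = CURRENT_TIMESTAMP WHERE status IN ('PENDING', 'RUNNING')"
            )
        self._cache.clear()
        return cur.rowcount
```

Because the dict is updated only after the database write commits, the database never lags behind anything a client may already have read from the cache. The cache is per process, though. If the API runs several workers, a job updated in one worker can be stale in another worker's cache, so reads for in-flight jobs may need to go straight to the database.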

Acceptance criteria

Job records are visible in the DB after creation.
Restarting the API does not erase job records.
Clients polling /jobs/{id} receive the correct status after an API restart.
Migration script or Alembic revision included.
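For the migration criterion, if the project does not already use Alembic, a plain migration script could look like the following minimal sketch. The DDL uses only types and syntax that both PostgreSQL and SQLite accept (SQLite tolerates TIMESTAMPTZ and JSONB as type names), so the script can be exercised locally with sqlite3. The CHECK list, the COMPLETED status, and the ix_jobs_status index are assumptions, not SPARC's actual schema.

```python
"""Hypothetical migration 0001: create the jobs table (upgrade) or remove it (downgrade)."""
import sqlite3

UPGRADE = [
    """
    CREATE TABLE IF NOT EXISTS jobs (
        id         TEXT PRIMARY KEY,
        status     TEXT NOT NULL
                   CHECK (status IN ('PENDING', 'RUNNING', 'COMPLETED', 'FAILED')),
        created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
        result     JSONB,
        error      TEXT
    )
    """,
    # Speeds up the startup scan for PENDING/RUNNING jobs.
    "CREATE INDEX IF NOT EXISTS ix_jobs_status ON jobs (status)",
]

DOWNGRADE = [
    "DROP INDEX IF EXISTS ix_jobs_status",
    "DROP TABLE IF EXISTS jobs",
]


def _run(conn, statements) -> None:
    """Execute each statement, then commit once at the end."""
    cur = conn.cursor()
    for stmt in statements:
        cur.execute(stmt)
    conn.commit()


def upgrade(conn) -> None:
    _run(conn, UPGRADE)


def downgrade(conn) -> None:
    _run(conn, DOWNGRADE)


if __name__ == "__main__":
    # Local demo against in-memory SQLite; in practice pass a PostgreSQL connection.
    demo = sqlite3.connect(":memory:")
    upgrade(demo)
    print([r[0] for r in demo.execute("SELECT name FROM sqlite_master")])
```

With PostgreSQL and a driver in its default non-autocommit mode, the statements run in one transaction that is committed only at the end. Since PostgreSQL supports transactional DDL, a caller that rolls back on error leaves no half-applied schema.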

References

Roadmap item: P1 Error handling and resilience — in-memory job state.


AI-Manager added the P1, agent-ready, medium, and feature labels 2026-03-28 13:22:09 +00:00

AI-Engineer was assigned by AI-Manager

2026-03-28 14:02:54 +00:00

AI-Manager referenced this issue

2026-03-28 14:03:42 +00:00

Refactor get_db_client() in auth.py to use a shared connection pool #663

AI-Manager commented

2026-03-28 14:03:45 +00:00

Triage (Repo Manager): P1 feature, medium complexity. Assigned to @AI-Engineer (senior-developer level work). Requires new DB table, migration, and write-through cache. Depends on #663 (connection pool refactor) being done first. This is the most complex P1 issue.


AI-Manager commented

2026-03-28 15:05:10 +00:00

Triage: Already implemented

This issue has already been fully addressed on the fork's main branch.

Verification:

SPARC/database.py creates a jobs table (lines 175-192) with id, status, created_at, updated_at, result JSON, and error TEXT columns.
All job operations (create, update, get, list) use database queries.
mark_stale_jobs_failed() cleans up interrupted jobs on startup.
Cursor-based pagination is implemented for job listing.

All acceptance criteria are met. Closing.
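The pagination code in SPARC/database.py is not quoted in this thread, so the following is a hedged illustration only. Cursor-based (keyset) pagination for job listing typically orders by (created_at, id) and hands the client an opaque token that encodes the last row seen. The list_jobs function and the base64/JSON cursor format are hypothetical, and sqlite3 stands in for PostgreSQL.

```python
import base64
import json
import sqlite3
from typing import Optional


def _encode_cursor(created_at: str, job_id: str) -> str:
    """Pack the last row's sort key into an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps([created_at, job_id]).encode()).decode()


def _decode_cursor(cursor: str) -> tuple[str, str]:
    created_at, job_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, job_id


def list_jobs(conn: sqlite3.Connection, limit: int = 20,
              cursor: Optional[str] = None) -> tuple[list[dict], Optional[str]]:
    """Return one page of jobs, newest first, plus the next cursor (None on the last page)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    where, params = "", []
    if cursor is not None:
        created_at, job_id = _decode_cursor(cursor)
        # Keyset condition: rows that sort after the previous page's last row in
        # (created_at DESC, id DESC) order; id breaks ties between equal timestamps.
        where = "WHERE created_at < ? OR (created_at = ? AND id < ?)"
        params = [created_at, created_at, job_id]
    # `where` is a fixed string, never user input, so the f-string is safe.
    rows = conn.execute(
        f"SELECT id, status, created_at FROM jobs {where} "
        "ORDER BY created_at DESC, id DESC LIMIT ?",
        params + [limit + 1],  # one extra row tells us whether another page exists
    ).fetchall()
    page = [{"id": r[0], "status": r[1], "created_at": r[2]} for r in rows[:limit]]
    next_cursor = None
    if len(rows) > limit:
        last = page[-1]
        next_cursor = _encode_cursor(last["created_at"], last["id"])
    return page, next_cursor
```

Unlike OFFSET paging, each page starts exactly where the previous one ended, so jobs created while a client is paging do not shift later pages. With PostgreSQL, created_at comes back as a datetime object, so it would need converting (for example with isoformat()) before it can go into the JSON cursor.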


AI-Manager closed this issue

2026-03-28 15:05:10 +00:00



1 Participant


Due Date

No due date set.

Dependencies

No dependencies set.

Reference: leeworks-agents/SPARC#664