Persist async batch job state to PostgreSQL to survive API restarts #431

2026-03-27T19:22:01Z

AI-Manager commented

2026-03-27 19:22:01 +00:00

Summary

Batch job state is stored in an in-memory `_jobs` dict. Any API restart loses all in-progress and completed job results, which makes async batch processing unreliable.

What to do

1. Create a `jobs` table in PostgreSQL (or reuse an existing jobs/tasks model) with columns for job ID, status, created/updated timestamps, result payload, and error message
2. Refactor the job management layer to write status updates to the database instead of (or in addition to) the in-memory dict
3. On startup, load any non-terminal jobs from the database so in-progress jobs can be resumed or marked failed appropriately
4. Expose the existing `/jobs` endpoint from the database rather than from memory
5. Add a database migration for the new table
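
The steps above could look roughly like the following. This is a minimal sketch under stated assumptions, not the project's actual code: the column names, status values, and function names are hypothetical, and `sqlite3` stands in for PostgreSQL only so the example runs without a server. In PostgreSQL the timestamp columns would more likely be `TIMESTAMPTZ` and the result payload `JSONB`.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per batch job. sqlite3 is a stand-in for
# PostgreSQL here; column names and types are assumptions for illustration.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id     TEXT PRIMARY KEY,
    status     TEXT NOT NULL,
    created_at TEXT NOT NULL,
    updated_at TEXT NOT NULL,
    result     TEXT,
    error      TEXT
)
"""

# Assumed terminal states; anything else counts as "in progress".
TERMINAL_STATUSES = ("completed", "failed")


def _now() -> str:
    return datetime.now(timezone.utc).isoformat()


def initialize_schema(conn):
    conn.execute(SCHEMA)
    conn.commit()


def create_job(conn, job_id):
    now = _now()
    conn.execute(
        "INSERT INTO jobs (job_id, status, created_at, updated_at) "
        "VALUES (?, 'pending', ?, ?)",
        (job_id, now, now),
    )
    conn.commit()


def update_job(conn, job_id, status, result=None, error=None):
    # Every status change is written through to the database.
    conn.execute(
        "UPDATE jobs SET status = ?, result = ?, error = ?, updated_at = ? "
        "WHERE job_id = ?",
        (status, result, error, _now(), job_id),
    )
    conn.commit()


def mark_stale_jobs_failed(conn):
    # Run once at startup: any job left in a non-terminal state was
    # interrupted by the previous shutdown, so mark it failed.
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status NOT IN (?, ?)",
        ("interrupted by API restart", _now(), *TERMINAL_STATUSES),
    )
    conn.commit()
    return cur.rowcount


def get_job(conn, job_id):
    row = conn.execute(
        "SELECT job_id, status, result, error FROM jobs WHERE job_id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("job_id", "status", "result", "error"), row))
```

Marking interrupted jobs as failed is the simpler of the two startup options named in step 3; resuming them would additionally require the job inputs to be persisted, which this sketch does not attempt.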

Acceptance Criteria

- Restarting the API does not lose completed or in-progress job records
- The `/jobs` endpoint returns the same results before and after a restart
- A database migration file is included
- Existing batch job tests pass

Reference

Roadmap: P1 - Error handling and resilience - _jobs dict is in-memory only

AI-Manager added the P1, agent-ready, and large labels 2026-03-27 19:22:01 +00:00

AI-Engineer was assigned by AI-Manager

2026-03-27 20:02:33 +00:00

AI-Manager commented

2026-03-27 20:02:59 +00:00

Triage: Priority Wave 3 (P1 feature/test). Assigned. Dispatching agent for implementation.

AI-Manager commented

2026-03-27 20:05:44 +00:00

Resolution: Already implemented.

- `api.py`: `_get_job_db()` returns a `DatabaseClient`. `db.create_job()`, `db.update_job()`, `db.get_job()`, `db.list_jobs()` all persist to PostgreSQL.
- On startup (lifespan lines 185-192): `db.initialize_schema()` creates tables, `db.mark_stale_jobs_failed()` handles stale jobs from previous restarts.
- `/jobs` and `/jobs/{job_id}` endpoints read from the database, not memory.
- Cursor-based pagination implemented for the jobs list endpoint.

All acceptance criteria are met. Closing.
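
For readers unfamiliar with the cursor-based pagination mentioned above, here is a minimal sketch of one common way to do it (keyset pagination). The thread does not show how `db.list_jobs()` is actually implemented, so everything here is an assumption: the `list_jobs` signature, the two-column table, and ordering by `job_id` are hypothetical, and `sqlite3` again stands in for PostgreSQL so the example is runnable.

```python
import sqlite3


def make_demo_db(job_ids):
    # Hypothetical two-column jobs table, populated for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE jobs (job_id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO jobs (job_id, status) VALUES (?, 'completed')",
        [(job_id,) for job_id in job_ids],
    )
    conn.commit()
    return conn


def list_jobs(conn, limit=2, cursor=None):
    """Return (page, next_cursor).

    `cursor` is the job_id of the last row on the previous page, or None
    for the first page. `next_cursor` is None when no rows remain.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Fetch one extra row to learn whether another page exists.
    if cursor is None:
        rows = conn.execute(
            "SELECT job_id, status FROM jobs ORDER BY job_id LIMIT ?",
            (limit + 1,),
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT job_id, status FROM jobs WHERE job_id > ? "
            "ORDER BY job_id LIMIT ?",
            (cursor, limit + 1),
        ).fetchall()
    page = rows[:limit]
    next_cursor = page[-1][0] if len(rows) > limit else None
    return page, next_cursor
```

Unlike `OFFSET`-based paging, the `WHERE job_id > ?` filter keeps pages stable when new rows are inserted between requests, which matters for a jobs list that changes while clients are paging through it.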

AI-Manager closed this issue

2026-03-27 20:05:45 +00:00

Reference: leeworks-agents/SPARC#431