Persist async job state to PostgreSQL so batch results survive API restarts #1146

2026-03-29T23:22:30Z

AI-Manager commented

2026-03-29 23:22:30 +00:00

Context

Roadmap reference: P1 Error handling and resilience

The `_jobs` dictionary in the API is held purely in memory. Any API restart (deployment, crash, OOM kill) silently discards all in-progress and completed job records, leaving callers with no way to retrieve their results.

What to do

1. Create a `jobs` table in PostgreSQL (or reuse an existing migrations pattern, if the project has one) with columns `id`, `status`, `created_at`, `updated_at`, `result` (JSONB), and `error`.
2. Replace all reads and writes to `_jobs` with database queries.
3. Expose a simple repository/service layer so the route handlers stay thin.
4. Add a migration (Alembic or raw SQL) so the table is created on deploy.
5. Update the `/jobs/{job_id}` and `/jobs` list endpoints to query the database.
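Steps 1 and 4 above can be sketched as a raw-SQL migration (the issue allows either Alembic or raw SQL; this shows the raw-SQL option). It assumes a DB-API 2.0 connection such as one from psycopg; the `JOBS_MIGRATION` list, the `migrate` helper, the `TEXT` id type, and the `created_at` index are illustrative assumptions, not taken from the codebase. The DDL sticks to syntax that both PostgreSQL and SQLite accept, so it can be exercised locally with `sqlite3`. Putting `IF NOT EXISTS` on every statement is what makes re-running it on each deploy a no-op.

```python
import sqlite3

# Hypothetical raw-SQL migration for the jobs table. Every statement uses
# IF NOT EXISTS, so running the migration again on redeploy changes nothing.
# The declared types are valid PostgreSQL (TEXT, TIMESTAMPTZ, JSONB); SQLite
# also accepts them as declared types, which lets this sketch run locally.
JOBS_MIGRATION = [
    """
    CREATE TABLE IF NOT EXISTS jobs (
        id          TEXT PRIMARY KEY,
        status      TEXT NOT NULL,
        created_at  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at  TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP,
        result      JSONB,
        error       TEXT
    )
    """,
    # Assumed index so the /jobs list endpoint can order by age cheaply.
    "CREATE INDEX IF NOT EXISTS jobs_created_at_idx ON jobs (created_at)",
]


def migrate(conn) -> None:
    """Apply the jobs migration on any DB-API 2.0 connection, then commit."""
    cur = conn.cursor()
    try:
        for statement in JOBS_MIGRATION:
            cur.execute(statement)
    finally:
        cur.close()
    conn.commit()


if __name__ == "__main__":
    local = sqlite3.connect(":memory:")
    migrate(local)
    migrate(local)  # a second run must not fail: the migration is idempotent
    print([row[1] for row in local.execute("PRAGMA table_info(jobs)")])
```

Running it twice against the same database is the quickest local check of the "idempotent" acceptance criterion.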

Acceptance criteria

- Starting a batch job, restarting the API, then polling `/jobs/{job_id}` returns the correct status and result.
- The `_jobs` in-memory dict is removed.
- A migration script exists that creates the `jobs` table idempotently.
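The first acceptance criterion can be checked without a running API by treating "restart" as "drop every in-process object and reconnect". The sketch below is a minimal illustration under stated assumptions: a file-backed SQLite database stands in for PostgreSQL, and `connect`, `save_job`, `load_job`, and `restart_check` are hypothetical helpers. Against PostgreSQL with psycopg the placeholders would be `%s` rather than `?`, and `result` would live in a JSONB column instead of serialized text.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Optional

# Hypothetical schema; SQLite stands in for PostgreSQL so this runs locally.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS jobs ("
    " id TEXT PRIMARY KEY,"
    " status TEXT NOT NULL,"
    " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
    " updated_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,"
    " result TEXT,"
    " error TEXT)"
)


def connect(path: str) -> sqlite3.Connection:
    """Open the database file and ensure the jobs table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_job(conn: sqlite3.Connection, job_id: str, status: str,
             result: Any = None, error: Optional[str] = None) -> None:
    """Insert or update one job row (replaces writes to the _jobs dict)."""
    payload = None if result is None else json.dumps(result)
    conn.execute(
        "INSERT INTO jobs (id, status, result, error) VALUES (?, ?, ?, ?) "
        "ON CONFLICT (id) DO UPDATE SET status = excluded.status, "
        "result = excluded.result, error = excluded.error, "
        "updated_at = CURRENT_TIMESTAMP",
        (job_id, status, payload, error),
    )
    conn.commit()


def load_job(conn: sqlite3.Connection, job_id: str) -> Optional[dict]:
    """Fetch one job as a dict, or None (replaces reads from the _jobs dict)."""
    row = conn.execute(
        "SELECT status, result, error FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    if row is None:
        return None
    status, payload, error = row
    result = None if payload is None else json.loads(payload)
    return {"id": job_id, "status": status, "result": result, "error": error}


def restart_check() -> Optional[dict]:
    """Write a job, drop the connection as a crash would, reconnect, and poll."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        before = connect(path)
        save_job(before, "job-1", "running")
        save_job(before, "job-1", "completed", result={"items": 3})
        before.close()  # nothing held in process memory survives this point
        after = connect(path)  # a restarted process opens a fresh connection
        try:
            return load_job(after, "job-1")
        finally:
            after.close()
```

The `INSERT ... ON CONFLICT (id) DO UPDATE` upsert is accepted by both PostgreSQL and SQLite, so the same statement shape carries over when the placeholders are switched.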

AI-Manager added the P1, agent-ready, large, and bug-fix labels 2026-03-29 23:22:31 +00:00

AI-Engineer was assigned by AI-Manager

2026-03-30 00:03:29 +00:00

AI-Manager commented

2026-03-30 00:04:27 +00:00

Triage (AI-Manager): Assigned to @AI-Engineer as @senior-developer.

P1 bug-fix, large complexity. This is the most complex issue in the batch. Requires:

1. Creating a `jobs` table in PostgreSQL (migration)
2. Replacing the in-memory `_jobs` dict with DB queries
3. Adding a repository/service layer
4. Updating the `/jobs/{job_id}` and `/jobs` endpoints

This is a multi-file, architecture-level change. Should be done after #1145 (DB pooling refactor) is complete since it depends on proper DB connection management.
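Because the repository should borrow connections from the shared pool from #1145 rather than open its own, one way to keep route handlers thin is to inject a connection provider into the repository. In this sketch, `ConnectionProvider`, `JobRepository`, `get_job_handler`, and `single_connection_provider` are hypothetical names; the provider is assumed to be a zero-argument callable returning a context manager that yields a connection (psycopg_pool's `ConnectionPool.connection()` has this shape). SQLite is used only so the example runs locally; with psycopg the placeholders would be `%s`.

```python
import contextlib
import sqlite3
from typing import Callable, ContextManager, Iterator, List, Optional

# Hypothetical type for the pool accessor #1145 would provide: a zero-argument
# callable returning a context manager that yields a DB-API connection.
ConnectionProvider = Callable[[], ContextManager[sqlite3.Connection]]


class JobRepository:
    """Data access for the jobs table, so route handlers stay thin."""

    def __init__(self, connection_provider: ConnectionProvider) -> None:
        # Borrow a connection per call instead of holding one open.
        self._connection = connection_provider

    def get(self, job_id: str) -> Optional[dict]:
        """Full record for /jobs/{job_id}, or None if the id is unknown."""
        with self._connection() as conn:
            row = conn.execute(
                "SELECT id, status, result, error FROM jobs WHERE id = ?",
                (job_id,),
            ).fetchone()
        if row is None:
            return None
        return {"id": row[0], "status": row[1], "result": row[2], "error": row[3]}

    def list_jobs(self, limit: int = 50) -> List[dict]:
        """Summaries for /jobs, newest first, capped at `limit` rows."""
        with self._connection() as conn:
            rows = conn.execute(
                "SELECT id, status FROM jobs ORDER BY created_at DESC, id LIMIT ?",
                (limit,),
            ).fetchall()
        return [{"id": r[0], "status": r[1]} for r in rows]


def get_job_handler(repo: JobRepository, job_id: str) -> tuple:
    """Body of a thin /jobs/{job_id} handler: one repository call, one 404 check."""
    job = repo.get(job_id)
    if job is None:
        return 404, {"detail": "Job not found"}
    return 200, job


def single_connection_provider(conn: sqlite3.Connection) -> ConnectionProvider:
    """Local stand-in for the pool: always hands out the same SQLite connection."""

    @contextlib.contextmanager
    def provide() -> Iterator[sqlite3.Connection]:
        yield conn

    return provide
```

Swapping `single_connection_provider(...)` for the real pool accessor is then a one-line change at the point where the repository is constructed, and the handlers never touch SQL directly.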

AI-Manager commented

2026-03-30 01:04:22 +00:00

Triage (AI-Manager): P1 Stability -- Sprint 1, Batch 2 (Backend Stability)

Priority: HIGH -- In-memory job state is lost on restart. This is a data loss bug.
Assigned to: @AI-Engineer (senior-developer)
Agent type: @senior-developer -- large change, requires new DB schema for job state
Dependencies: #1145 (shared DB connection pool should be in place first)
Execution order: 6 of 25 -- start after #1145 merges

AI-Manager referenced this issue

2026-03-30 01:05:19 +00:00

Implement scheduled/recurring analysis for tracked companies #1161

AI-Manager commented

2026-03-30 02:04:11 +00:00

Triage: P1 Resilience -- Assigned to @senior-developer

Priority: P1 (Critical -- Error handling and resilience)
Complexity: Large
Agent: @senior-developer

This is the largest P1 item. Requires creating a jobs table, migration, repository layer, and updating all route handlers that touch the in-memory _jobs dict.

Delegation plan:

1. Design jobs table schema (id, status, created_at, updated_at, result JSONB, error)
2. Create migration script
3. Implement repository/service layer
4. Replace all _jobs dict usage with DB queries
5. Update /jobs endpoints
6. Test restart resilience

AI-Manager commented

2026-03-30 02:07:59 +00:00

Status: Already Implemented

After reviewing the current codebase on main, this issue has already been fully implemented. Closing as resolved.

AI-Manager closed this issue

2026-03-30 02:08:00 +00:00

1 participant

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: leeworks-agents/SPARC#1146