agent-company 7c6eed8d72 Update ROADMAP.md to mark completed P1 and P2 items as done

Move seven completed items from the P1 and P2 sections into the
Completed section: in-memory jobs persistence, export endpoint tests,
tracked company admin tests, webhook integration tests, S3 storage
tests, auto-download path tests, and scheduler DatabaseClient refactor.

The P2 section now only lists the two genuinely open items: cursor-based
pagination (Issue #1669) and request validation (Issue #1670).

Closes leeworks-agents/SPARC#1678

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-18 21:29:14 +00:00

SPARC Roadmap

Semiconductor Patent & Analytics Report Core -- development priorities.

Current State

SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds and testing. Core features include patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter (multi-model: Claude, GPT-4o, Gemini, Llama), batch processing, JWT authentication, analytics dashboard with patent trend charts, scheduled recurring analysis with alerting, webhook notifications (Slack/Discord), CSV and PDF export, S3/MinIO storage, side-by-side company comparison, and dark mode.

Completed

Items that have been implemented and merged into main.

Security hardening

~~Rotate default JWT secret.~~ Startup check refuses to start with the default secret in non-development environments.
~~CORS allow-origins are hardcoded.~~ Allowed origins are now configurable via environment variable.
~~Database credentials in docker-compose.yml.~~ Compose references .env for sensitive values.
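
As an illustration of the first two items, a startup guard and an origin parser might look roughly like the sketch below. The variable names `ENVIRONMENT`, `JWT_SECRET`, the placeholder default value, and both function names are assumptions made for this example; the roadmap does not state the names SPARC actually uses.

```python
import os

# Hypothetical placeholder; the real default secret is not given in this roadmap.
DEFAULT_JWT_SECRET = "change-me"


def check_jwt_secret(env) -> None:
    """Raise if the default secret is still in use outside development.

    Assumption: an unset ENVIRONMENT is treated as "development".
    """
    environment = env.get("ENVIRONMENT", "development")
    secret = env.get("JWT_SECRET", DEFAULT_JWT_SECRET)
    if environment != "development" and secret == DEFAULT_JWT_SECRET:
        raise RuntimeError(
            "Refusing to start: JWT_SECRET is still the default value "
            f"in the '{environment}' environment."
        )


def parse_allowed_origins(raw: str) -> list:
    """Split a comma-separated list of origins, dropping empty entries."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Calling `check_jwt_secret(os.environ)` before the application begins serving requests would give the "refuses to start" behaviour described above.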

Error handling and resilience

~~get_db_client() creates a new DatabaseClient on every call.~~ Refactored to a shared pooled singleton initialized at startup.
~~No rate limiting on auth endpoints.~~ Rate limiting middleware added to /auth/login and /auth/register.
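
The shared-singleton pattern behind the `get_db_client()` change can be sketched as follows. The `DatabaseClient` class here is a stand-in that only counts its instances, and `init_db_client()` is a hypothetical name for the startup hook; the real client's constructor and pooling details are not described in this roadmap.

```python
from typing import Optional


class DatabaseClient:
    """Stand-in for the real client; it only counts how many instances exist."""

    instances = 0

    def __init__(self, dsn: str) -> None:
        DatabaseClient.instances += 1
        self.dsn = dsn


# Module-level holder for the single shared client.
_client: Optional[DatabaseClient] = None


def init_db_client(dsn: str) -> DatabaseClient:
    """Create the shared client once, at application startup (hypothetical hook)."""
    global _client
    if _client is None:
        _client = DatabaseClient(dsn)
    return _client


def get_db_client() -> DatabaseClient:
    """Return the shared client instead of constructing a new one per call."""
    if _client is None:
        raise RuntimeError("init_db_client() must be called at startup")
    return _client
```

Every caller, including a scheduler, then receives the same object, so connections are drawn from one pool rather than opened per call.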

Test coverage

~~API tests bypass authentication.~~ JWT auth integration tests added (33 cases covering registration, login, protected routes, token refresh, and admin-only endpoints).
~~No test stage in CI.~~ Gitea Actions workflow now runs pytest and gates the build.
~~No linting or type checking in CI.~~ ruff (Python) and tsc --noEmit (TypeScript) added to CI pipeline.

Backend

~~Add structured logging.~~ Python logging module used throughout.
~~Make LLM model configurable.~~ MODEL environment variable accepted; multi-model support with per-analysis selection (GPT-4o, Gemini, Claude, Llama).
~~SERP cache TTL hardcoded.~~ SERP_CACHE_TTL_HOURS exposed as env var.
~~Patent PDF storage.~~ S3/MinIO object storage backend added alongside local filesystem. Volume mount requirement documented.
~~analyze_single_patent assumes local file.~~ Auto-download from cached metadata link integrated.
~~Patent.patent_id typed as int.~~ Fixed to str.

Frontend

~~No loading/error states.~~ Skeleton loaders and error states added to Batch and Analytics pages.
~~No dark mode.~~ Full dark mode support with theme-aware chart colors.
~~Missing lockfile.~~ package-lock.json committed.

Features (formerly P3)

~~Export analysis reports.~~ CSV and PDF export endpoints implemented.
~~Comparison view.~~ Side-by-side company patent portfolio comparison added.
~~Scheduled/recurring analysis.~~ APScheduler-based periodic re-analysis with configurable interval and change-threshold alerting.
~~Webhook/notification support.~~ Slack, Discord, and generic HTTP POST webhooks with retry logic.
~~Multi-model support.~~ Model picker in Analysis and Batch pages; backend allow-list validation.
~~Patent trend charts.~~ Filing frequency and category distribution visualizations added to Analytics page.
~~OpenAPI client generation.~~ TypeScript API client auto-generated from FastAPI spec with CI freshness check.
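
The roadmap does not say how the webhook retry logic is implemented. One common approach, shown here purely as an assumption, is a bounded number of attempts with exponential backoff. The `send` callable is a hypothetical stand-in for whatever performs the HTTP POST and returns a status code.

```python
import time
from typing import Callable


def deliver_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send() until it returns a 2xx status or attempts run out.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            status = send()
            if 200 <= status < 300:
                return True
        except OSError:
            # Assumption: network-level errors are worth retrying.
            pass
        if attempt < max_attempts:
            sleep(base_delay * 2 ** (attempt - 1))
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.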

Resilience

~~_jobs dict is in-memory only.~~ Database-backed job persistence implemented using db.list_jobs() and mark_stale_jobs_failed(). The in-memory _jobs dict has been removed.

Test coverage (P1/P2)

~~Export endpoint tests.~~ Tests added for CSV and PDF export endpoints.
~~Tracked company admin endpoint tests.~~ Tests added for /admin/tracked CRUD endpoints and scheduler integration.
~~Webhook integration tests.~~ Tests added for retry logic, Slack/Discord payload format, and multi-URL dispatch.
~~S3/MinIO storage backend tests.~~ Unit tests added for the S3 backend (read, write, exists, delete, error handling).
~~analyze_single_patent auto-download path tests.~~ Tests added for the auto-download fallback (cache lookup, PDF download, FileNotFoundError).

Code quality

~~Scheduler creates its own DatabaseClient.~~ Refactored to use the application-level pooled get_db_client().

P1 -- High Priority

No outstanding P1 items. All previously listed items have been completed and moved to the Completed section above.

P2 -- Medium Priority

Improvements to the API surface.

API improvements

API pagination. The /analyze/batch endpoint needs cursor-based pagination for large result sets. The /jobs endpoint already has cursor pagination. (Issue #1669)
Request validation improvements. Add stricter input validation for company names (disallow special characters, enforce length limits). (Issue #1670)
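
A minimal sketch of what the two open items could involve, using only the standard library. The allowed character set, the 100-character limit, the opaque base64 cursor format, and all function names are assumptions for illustration; Issues #1669 and #1670 may settle on different rules.

```python
import base64
import json
import re
from typing import Optional

# Assumed rule: ASCII letters, digits, spaces and a few punctuation marks, 1-100 chars.
COMPANY_NAME_RE = re.compile(r"[A-Za-z0-9 .,&'-]{1,100}")


def validate_company_name(name: str) -> str:
    """Return the trimmed name, or raise ValueError if it breaks the assumed rule."""
    cleaned = name.strip()
    if not COMPANY_NAME_RE.fullmatch(cleaned):
        raise ValueError("invalid company name")
    return cleaned


def encode_cursor(last_id: int) -> str:
    """Pack the last seen id into an opaque, URL-safe token."""
    payload = json.dumps({"last_id": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    """Unpack a token; no cursor means start from the beginning (ids assumed > 0)."""
    if cursor is None:
        return 0
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"]


def paginate(rows: list, cursor: Optional[str], limit: int) -> tuple:
    """Return one page and the next cursor.

    rows is a list of dicts with an integer "id", sorted ascending.
    """
    last_id = decode_cursor(cursor)
    page = [row for row in rows if row["id"] > last_id][:limit]
    has_more = bool(page) and any(row["id"] > page[-1]["id"] for row in rows)
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Against a real database the filter would become a `WHERE id > :last_id ORDER BY id LIMIT :limit` query. Unlike offset pagination, the cursor keeps pages stable when new rows are inserted between requests.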

P3 -- Nice to Have

Lower-urgency enhancements and future features.

Historical analysis diffing. Show what changed between two analysis runs for the same company, highlighting new patents and score shifts.
Patent classification tagging. Automatically tag patents by technology domain (AI, semiconductors, materials science) using LLM classification.
User-level API keys. Allow users to generate personal API keys for programmatic access without JWT token refresh.
Batch export. Export analysis results for multiple companies at once as a ZIP archive.
Rate limiting dashboard. Surface rate limit status and usage statistics in the admin panel.
Async webhook delivery. Move webhook delivery to a background task queue (e.g., Celery, arq) to avoid blocking the scheduler.
Multi-tenant support. Scope analysis results and tracked companies per user or organization.

Infrastructure and Deployment

Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.
