Move all completed items (security hardening, structured logging, dark mode, export, webhooks, scheduled analysis, multi-model, trend charts, CI, etc.) into a new Completed section. Reorganize remaining P1/P2/P3 items to reflect current priorities. Add new next-horizon items: historical diffing, patent classification tagging, user API keys, batch export, and multi-tenant support.

Closes leeworks-agents/SPARC#1659

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SPARC Roadmap
Semiconductor Patent & Analytics Report Core -- development priorities.
Current State
SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds and testing. Core features include patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter (multi-model: Claude, GPT-4o, Gemini, Llama), batch processing, JWT authentication, analytics dashboard with patent trend charts, scheduled recurring analysis with alerting, webhook notifications (Slack/Discord), CSV and PDF export, S3/MinIO storage, side-by-side company comparison, and dark mode.
Completed
Items that have been implemented and merged into main.
Security hardening
- Rotate default JWT secret. Startup check refuses to start with the default secret in non-development environments.
- CORS allow-origins are hardcoded. Allowed origins are now configurable via environment variable.
- Database credentials in docker-compose.yml. Compose references .env for sensitive values.
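A startup check of the kind described for the JWT secret might look like the minimal sketch below. The variable names `JWT_SECRET` and `ENVIRONMENT`, the placeholder value `change-me`, and the function `check_jwt_secret` are assumptions made for illustration and are not taken from the SPARC codebase.

```python
import os

# Hypothetical placeholder; the actual default secret in SPARC may differ.
DEFAULT_JWT_SECRET = "change-me"


def check_jwt_secret(env=None):
    """Return the configured secret, or raise if the default is used
    outside a development environment."""
    env = os.environ if env is None else env
    secret = env.get("JWT_SECRET", DEFAULT_JWT_SECRET)
    # Assumption: an unset ENVIRONMENT is treated as development.
    environment = env.get("ENVIRONMENT", "development")
    if environment != "development" and secret == DEFAULT_JWT_SECRET:
        raise RuntimeError(
            "Refusing to start: JWT_SECRET is still the default value "
            f"in the {environment!r} environment."
        )
    return secret
```

Calling this once during application startup, before any routes are registered, makes a misconfigured deployment fail fast rather than issue tokens signed with a known secret.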
Error handling and resilience
- get_db_client() creates a new DatabaseClient on every call. Refactored to a shared pooled singleton initialized at startup.
- No rate limiting on auth endpoints. Rate limiting middleware added to /auth/login and /auth/register.
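The roadmap does not say which algorithm the rate limiting middleware uses. As one possible illustration, the sketch below implements an in-memory sliding-window limiter keyed by client identifier. The class name `SlidingWindowLimiter` and its parameters are hypothetical.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A middleware would call `allow()` with the client IP (or username) for requests to `/auth/login` and `/auth/register` and respond with HTTP 429 when it returns `False`. State held in process memory is not shared across workers, so a multi-worker deployment would need a shared store instead.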
Test coverage
- API tests bypass authentication. JWT auth integration tests added (33 cases covering registration, login, protected routes, token refresh, and admin-only endpoints).
- No test stage in CI. Gitea Actions workflow now runs pytest and gates the build.
- No linting or type checking in CI. ruff (Python) and tsc --noEmit (TypeScript) added to CI pipeline.
Backend
- Add structured logging. Python logging module used throughout.
- Make LLM model configurable. MODEL environment variable accepted; multi-model support with per-analysis selection (GPT-4o, Gemini, Claude, Llama).
- SERP cache TTL hardcoded. SERP_CACHE_TTL_HOURS exposed as env var.
- Patent PDF storage. S3/MinIO object storage backend added alongside local filesystem. Volume mount requirement documented.
- analyze_single_patent assumes local file. Auto-download from cached metadata link integrated.
- Patent.patent_id typed as int. Fixed to str.
Frontend
- No loading/error states. Skeleton loaders and error states added to Batch and Analytics pages.
- No dark mode. Full dark mode support with theme-aware chart colors.
- Missing lockfile. package-lock.json committed.
Features (formerly P3)
- Export analysis reports. CSV and PDF export endpoints implemented.
- Comparison view. Side-by-side company patent portfolio comparison added.
- Scheduled/recurring analysis. APScheduler-based periodic re-analysis with configurable interval and change-threshold alerting.
- Webhook/notification support. Slack, Discord, and generic HTTP POST webhooks with retry logic.
- Multi-model support. Model picker in Analysis and Batch pages; backend allow-list validation.
- Patent trend charts. Filing frequency and category distribution visualizations added to Analytics page.
- OpenAPI client generation. TypeScript API client auto-generated from FastAPI spec with CI freshness check.
P1 -- High Priority
These items address correctness, reliability, and coverage gaps that should be resolved before broader production use.
Resilience
- _jobs dict is in-memory only. Job state is lost on API restart. Persist job status in PostgreSQL or Redis so async batch results survive restarts.
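One way to replace the in-memory dict is a small store backed by a database table. The sketch below uses the standard-library `sqlite3` module purely as a runnable stand-in for PostgreSQL. The `JobStore` class, the `jobs` table, and its columns are hypothetical and do not reflect the existing schema.

```python
import json
import sqlite3


class JobStore:
    """Persist job status in a table instead of a process-local dict."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS jobs ("
            "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
        )
        self.conn.commit()

    def save(self, job_id, status, result=None):
        # Upsert: insert a new job or overwrite the existing row.
        self.conn.execute(
            "INSERT INTO jobs (job_id, status, result) VALUES (?, ?, ?) "
            "ON CONFLICT(job_id) DO UPDATE SET "
            "status = excluded.status, result = excluded.result",
            (job_id, status, json.dumps(result)),
        )
        self.conn.commit()

    def get(self, job_id):
        row = self.conn.execute(
            "SELECT status, result FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        return {"status": row[0], "result": json.loads(row[1])}
```

PostgreSQL accepts a similar `INSERT ... ON CONFLICT ... DO UPDATE` statement, though the placeholder syntax depends on the driver in use.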
Test coverage gaps
- Export endpoint tests. The CSV and PDF export endpoints (/export/) lack test coverage. Add tests covering auth, success, 404, and edge cases. (Issue #1655)
- Tracked company admin endpoint tests. The /admin/tracked CRUD endpoints and scheduler integration lack test coverage. (Issue #1656)
P2 -- Medium Priority
Improvements to reliability, test coverage, and code quality.
Test coverage
- Webhook integration tests. The retry logic, Slack/Discord payload format, and multi-URL dispatch in webhooks.py need test coverage. (Issue #1657)
- S3/MinIO storage backend tests. storage.py has local filesystem tests but no unit tests for the S3 backend (read, write, exists, delete, error handling). (Issue #1660)
- analyze_single_patent auto-download path tests. The auto-download fallback (cache lookup, PDF download, FileNotFoundError) in analyzer.py lacks test coverage. (Issue #1661)
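As an illustration of how retry logic can be tested without network access, the sketch below uses `unittest.mock.Mock` to simulate transient failures. The function `deliver_with_retry` is a hypothetical stand-in written for this example. The real helper in webhooks.py may have a different name, signature, and backoff policy.

```python
from unittest.mock import Mock


def deliver_with_retry(send, payload, attempts=3, sleep=lambda s: None):
    """Hypothetical stand-in for the retry helper in webhooks.py."""
    for attempt in range(1, attempts + 1):
        try:
            send(payload)
            return attempt  # number of attempts that were needed
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(2 ** (attempt - 1))  # exponential backoff: 1s, 2s, ...


def test_retries_until_success():
    # Fail twice, then succeed on the third call.
    send = Mock(side_effect=[ConnectionError(), ConnectionError(), None])
    sleep = Mock()
    assert deliver_with_retry(send, {"text": "hi"}, sleep=sleep) == 3
    assert send.call_count == 3
    assert [c.args[0] for c in sleep.call_args_list] == [1, 2]


def test_gives_up_after_max_attempts():
    send = Mock(side_effect=ConnectionError())
    try:
        deliver_with_retry(send, {}, attempts=2)
    except ConnectionError:
        pass
    else:
        raise AssertionError("expected ConnectionError")
    assert send.call_count == 2
```

Injecting `send` and `sleep` keeps the tests fast and deterministic. The same pattern could cover the Slack/Discord payload format by asserting on the arguments passed to `send`.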
Code quality
- Scheduler creates its own DatabaseClient. scheduler.py bypasses the application-level pooled client, creating a new connection on every tick. Refactor to use get_db_client(). (Issue #1658)
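The intended refactor can be sketched as follows. `DatabaseClient` here is a hypothetical stub that only counts instances, and `scheduler_tick` is an illustrative name. Unlike the startup-initialized singleton described under Completed, this sketch creates the client lazily on first use to stay self-contained.

```python
class DatabaseClient:
    """Hypothetical stub; counts instances to make reuse visible."""

    instances = 0

    def __init__(self):
        DatabaseClient.instances += 1


_client = None


def get_db_client():
    """Return the shared client, creating it on first use."""
    global _client
    if _client is None:
        _client = DatabaseClient()
    return _client


def scheduler_tick(db=None):
    # Reuse the application-level client instead of constructing a new one.
    # The optional parameter lets tests inject a fake client.
    if db is None:
        db = get_db_client()
    return db
```

Running several ticks then yields a single `DatabaseClient` instance. The lazy initialization shown here is not thread-safe, which is one reason to initialize the client once at startup instead.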
API improvements
- API pagination. The /analyze/batch and /jobs endpoints could benefit from cursor-based pagination for large result sets.
- Request validation improvements. Add stricter input validation for company names (disallow special characters, enforce length limits).
P3 -- Nice to Have
Lower-urgency enhancements and future features.
- Historical analysis diffing. Show what changed between two analysis runs for the same company, highlighting new patents and score shifts.
- Patent classification tagging. Automatically tag patents by technology domain (AI, semiconductors, materials science) using LLM classification.
- User-level API keys. Allow users to generate personal API keys for programmatic access without JWT token refresh.
- Batch export. Export analysis results for multiple companies at once as a ZIP archive.
- Rate limiting dashboard. Surface rate limit status and usage statistics in the admin panel.
- Async webhook delivery. Move webhook delivery to a background task queue (e.g., Celery, arq) to avoid blocking the scheduler.
- Multi-tenant support. Scope analysis results and tracked companies per user or organization.
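For the historical diffing item, the core comparison could be as simple as the sketch below. It assumes each analysis run can be reduced to a mapping from patent ID (a string) to a numeric score. That data shape and the function name `diff_runs` are hypothetical.

```python
def diff_runs(old, new, min_shift=0.0):
    """Compare two runs given as {patent_id: score} mappings."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    # Patents present in both runs whose score moved by more than min_shift.
    shifted = {
        pid: (old[pid], new[pid])
        for pid in sorted(set(old) & set(new))
        if abs(new[pid] - old[pid]) > min_shift
    }
    return {"new": added, "removed": removed, "score_shifts": shifted}
```

The `min_shift` threshold lets callers ignore small score fluctuations between runs of the same company.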
Infrastructure and Deployment
Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.