SPARC Roadmap
Semiconductor Patent & Analytics Report Core -- development priorities.
Current State
SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds and testing. Core features include patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter (multi-model: Claude, GPT-4o, Gemini, Llama), batch processing, JWT authentication, analytics dashboard with patent trend charts, scheduled recurring analysis with alerting, webhook notifications (Slack/Discord), CSV and PDF export, S3/MinIO storage, side-by-side company comparison, and dark mode.
Completed
Items that have been implemented and merged into main.
Security hardening
- Rotate default JWT secret. Startup check refuses to start with the default secret in non-development environments.
- CORS allow-origins are hardcoded. Allowed origins are now configurable via environment variable.
- Database credentials in docker-compose.yml. Compose references .env for sensitive values.
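The startup check and the configurable origin list described above can be sketched roughly as follows. This is a minimal illustration, not SPARC's actual code: the variable names `ENVIRONMENT`, `JWT_SECRET`, and `CORS_ALLOWED_ORIGINS`, the placeholder default secret, and both function names are assumptions made for the example.

```python
from typing import List, Mapping

# Hypothetical placeholder; the real default secret is not stated in this roadmap.
DEFAULT_JWT_SECRET = "change-me"


def check_jwt_secret(env: Mapping[str, str]) -> None:
    """Raise if a non-development environment still uses the default secret."""
    # Assumption: an unset ENVIRONMENT is treated as production (fail closed).
    environment = env.get("ENVIRONMENT", "production").lower()
    secret = env.get("JWT_SECRET", DEFAULT_JWT_SECRET)
    if environment != "development" and secret == DEFAULT_JWT_SECRET:
        raise RuntimeError("Refusing to start: JWT_SECRET is still the default value")


def parse_allowed_origins(env: Mapping[str, str]) -> List[str]:
    """Split a comma-separated origin list, trimming whitespace and dropping blanks."""
    raw = env.get("CORS_ALLOWED_ORIGINS", "")
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

At application startup both functions would be called with `os.environ`; the parsed list is what a CORS middleware would receive as its allowed origins.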
Error handling and resilience
- get_db_client() creates a new DatabaseClient on every call. Refactored to a shared pooled singleton initialized at startup.
- No rate limiting on auth endpoints. Rate limiting middleware added to /auth/login and /auth/register.
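The roadmap does not say which algorithm the rate limiting middleware uses. As one plausible approach, a per-client sliding-window limiter can be sketched with the standard library alone; the class name, the limit, and the window below are illustrative assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str, now: Optional[float] = None) -> bool:
        """Record a hit for `key` and report whether it is within the limit."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A middleware in front of /auth/login and /auth/register would call `allow()` with the client address as the key and return HTTP 429 when it yields `False`. This in-process version only works for a single worker; several workers would need shared state.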
Test coverage
- API tests bypass authentication. JWT auth integration tests added (33 cases covering registration, login, protected routes, token refresh, and admin-only endpoints).
- No test stage in CI. Gitea Actions workflow now runs pytest and gates the build.
- No linting or type checking in CI. ruff (Python) and tsc --noEmit (TypeScript) added to CI pipeline.
Backend
- Add structured logging. Python logging module used throughout.
- Make LLM model configurable. MODEL environment variable accepted; multi-model support with per-analysis selection (GPT-4o, Gemini, Claude, Llama).
- SERP cache TTL hardcoded. SERP_CACHE_TTL_HOURS exposed as env var.
- Patent PDF storage. S3/MinIO object storage backend added alongside local filesystem. Volume mount requirement documented.
- analyze_single_patent assumes local file. Auto-download from cached metadata link integrated.
- Patent.patent_id typed as int. Fixed to str.
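To illustrate the SERP_CACHE_TTL_HOURS item, the sketch below reads the variable and decides whether a cached entry is still fresh. Only the variable name comes from this roadmap; the 24-hour fallback and the function names are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional

# Assumed fallback; the roadmap does not state the previous hardcoded value.
DEFAULT_TTL_HOURS = 24.0


def cache_ttl(env: Mapping[str, str]) -> timedelta:
    """Read SERP_CACHE_TTL_HOURS and return it as a timedelta."""
    hours = float(env.get("SERP_CACHE_TTL_HOURS", DEFAULT_TTL_HOURS))
    if hours <= 0:
        raise ValueError("SERP_CACHE_TTL_HOURS must be positive")
    return timedelta(hours=hours)


def is_fresh(cached_at: datetime, ttl: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True while the cached entry is younger than the TTL."""
    now = now or datetime.now(timezone.utc)
    return now - cached_at < ttl
```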
Frontend
- No loading/error states. Skeleton loaders and error states added to Batch and Analytics pages.
- No dark mode. Full dark mode support with theme-aware chart colors.
- Missing lockfile. package-lock.json committed.
Features (formerly P3)
- Export analysis reports. CSV and PDF export endpoints implemented.
- Comparison view. Side-by-side company patent portfolio comparison added.
- Scheduled/recurring analysis. APScheduler-based periodic re-analysis with configurable interval and change-threshold alerting.
- Webhook/notification support. Slack, Discord, and generic HTTP POST webhooks with retry logic.
- Multi-model support. Model picker in Analysis and Batch pages; backend allow-list validation.
- Patent trend charts. Filing frequency and category distribution visualizations added to Analytics page.
- OpenAPI client generation. TypeScript API client auto-generated from FastAPI spec with CI freshness check.
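The webhook retry logic mentioned above is not specified further in this roadmap. One common shape for it is a bounded retry loop with exponential backoff, sketched here with the standard library only. The function names, the attempt count, and the delays are assumptions, not SPARC's actual settings.

```python
import json
import time
import urllib.request
from typing import Callable


def post_json(url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST a JSON body and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def deliver_with_retry(
    url: str,
    payload: dict,
    send: Callable[[str, dict], int] = post_json,
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try delivery up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            if 200 <= send(url, payload) < 300:
                return True
        except OSError:
            # urllib's URLError/HTTPError and socket timeouts all derive from OSError.
            pass
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return False
```

The `send` and `sleep` parameters exist so the loop can be tested without network access or real waiting. Slack incoming webhooks expect a body such as `{"text": "..."}`, while Discord webhooks expect `{"content": "..."}`.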
Resilience
- _jobs dict is in-memory only. Database-backed job persistence implemented using db.list_jobs() and mark_stale_jobs_failed(). The in-memory _jobs dict has been removed.
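As a rough picture of what database-backed job persistence involves, the sketch below uses the standard library's sqlite3 in place of PostgreSQL so that it stays self-contained. The table schema, the status values, and the function bodies are assumptions; only the names list_jobs and mark_stale_jobs_failed come from this roadmap.

```python
import sqlite3
from typing import Dict, List


def create_jobs_table(conn: sqlite3.Connection) -> None:
    """Create a minimal, hypothetical jobs table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL,"
        " error TEXT)"
    )


def mark_stale_jobs_failed(conn: sqlite3.Connection) -> int:
    """Fail jobs left unfinished by a previous process; return how many changed."""
    cursor = conn.execute(
        "UPDATE jobs SET status = 'failed', error = 'interrupted by restart' "
        "WHERE status IN ('pending', 'running')"
    )
    conn.commit()
    return cursor.rowcount


def list_jobs(conn: sqlite3.Connection) -> List[Dict[str, str]]:
    """Return all jobs ordered by id."""
    rows = conn.execute("SELECT id, status FROM jobs ORDER BY id").fetchall()
    return [{"id": row[0], "status": row[1]} for row in rows]
```

Running the stale-job sweep once at startup means a job interrupted by a restart is reported as failed rather than appearing to run forever, which an in-memory dict cannot provide because its contents vanish with the process.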
Test coverage (P1/P2)
- Export endpoint tests. Tests added for CSV and PDF export endpoints.
- Tracked company admin endpoint tests. Tests added for /admin/tracked CRUD endpoints and scheduler integration.
- Webhook integration tests. Tests added for retry logic, Slack/Discord payload format, and multi-URL dispatch.
- S3/MinIO storage backend tests. Unit tests added for the S3 backend (read, write, exists, delete, error handling).
- analyze_single_patent auto-download path tests. Tests added for the auto-download fallback (cache lookup, PDF download, FileNotFoundError).
Code quality
- Scheduler creates its own DatabaseClient. Refactored to use the application-level pooled get_db_client().
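The shared-client pattern behind this refactor can be sketched as a module-level singleton that is created once at startup and handed to every caller, including the scheduler. The DatabaseClient below is a stand-in with no real pooling, and init_db_client is a hypothetical name; only get_db_client appears in this roadmap.

```python
import threading
from typing import Optional


class DatabaseClient:
    """Stand-in for the real client, which would own a connection pool."""

    def __init__(self, dsn: str) -> None:
        self.dsn = dsn


_client: Optional[DatabaseClient] = None
_lock = threading.Lock()


def init_db_client(dsn: str) -> DatabaseClient:
    """Create the shared client once; later calls return the same instance."""
    global _client
    with _lock:
        if _client is None:
            _client = DatabaseClient(dsn)
        return _client


def get_db_client() -> DatabaseClient:
    """Return the shared client; fail loudly if startup never initialized it."""
    if _client is None:
        raise RuntimeError("init_db_client() must be called at startup")
    return _client
```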
P1 -- High Priority
No outstanding P1 items. All previously listed items have been completed and moved to the Completed section above.
P2 -- Medium Priority
Improvements to the API surface.
API improvements
- API pagination. The /analyze/batch endpoint needs cursor-based pagination for large result sets. The /jobs endpoint already has cursor pagination. (Issue #1669)
- Request validation improvements. Add stricter input validation for company names (disallow special characters, enforce length limits). (Issue #1670)
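Both open items can be illustrated with a short sketch. The cursor here is an opaque base64 token wrapping the last seen id, and the validator uses an allow-list pattern with a 100-character cap. The field names, the allowed character set, and the length limit are assumptions for illustration; the issues may settle on different rules.

```python
import base64
import json
import re
from typing import Dict, List, Optional


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen id in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"last_id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    """Recover the last seen id from a token produced by encode_cursor."""
    return int(json.loads(base64.urlsafe_b64decode(cursor.encode()))["last_id"])


def paginate(results: List[Dict], cursor: Optional[str] = None, limit: int = 50) -> Dict:
    """Return one page of `results`, which must be sorted by ascending 'id'."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    last_id = decode_cursor(cursor) if cursor else None
    remaining = [r for r in results if last_id is None or r["id"] > last_id]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    return {"items": page, "next_cursor": encode_cursor(page[-1]["id"]) if has_more else None}


# Assumed rule: starts with a letter or digit, then letters, digits, spaces,
# and a few punctuation marks common in company names; 100 characters at most.
COMPANY_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 .,&'\-]{0,99}")


def validate_company_name(name: str) -> str:
    """Return the trimmed name, or raise ValueError if it breaks the rule."""
    cleaned = name.strip()
    if not COMPANY_NAME_RE.fullmatch(cleaned):
        raise ValueError("invalid company name")
    return cleaned
```

Unlike offset pagination, a cursor keyed on the last id keeps pages stable when rows are inserted between requests. In a real endpoint the filter would be a `WHERE id > :last_id` clause rather than an in-memory list comprehension.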
P3 -- Nice to Have
Lower-urgency enhancements and future features.
- Historical analysis diffing. Show what changed between two analysis runs for the same company, highlighting new patents and score shifts.
- Patent classification tagging. Automatically tag patents by technology domain (AI, semiconductors, materials science) using LLM classification.
- User-level API keys. Allow users to generate personal API keys for programmatic access without JWT token refresh.
- Batch export. Export analysis results for multiple companies at once as a ZIP archive.
- Rate limiting dashboard. Surface rate limit status and usage statistics in the admin panel.
- Async webhook delivery. Move webhook delivery to a background task queue (e.g., Celery, arq) to avoid blocking the scheduler.
- Multi-tenant support. Scope analysis results and tracked companies per user or organization.
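As a starting point for the historical analysis diffing item in the list above, comparing two runs reduces to set differences plus a per-patent score delta. The data shape (patent id mapped to a numeric score) and all names below are assumptions, since the feature is not yet designed.

```python
from typing import Dict


def diff_runs(
    previous: Dict[str, float],
    current: Dict[str, float],
    min_shift: float = 0.0,
) -> Dict:
    """Compare two runs, each mapping patent id to score."""
    prev_ids, curr_ids = set(previous), set(current)
    # Keep only patents present in both runs whose score moved by more than min_shift.
    shifts = {
        pid: current[pid] - previous[pid]
        for pid in sorted(prev_ids & curr_ids)
        if abs(current[pid] - previous[pid]) > min_shift
    }
    return {
        "new_patents": sorted(curr_ids - prev_ids),
        "removed_patents": sorted(prev_ids - curr_ids),
        "score_shifts": shifts,
    }
```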
Infrastructure and Deployment
Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.