SPARC Roadmap
Semiconductor Patent & Analytics Report Core -- development priorities.
Current State
SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds. Core features (patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter/Claude, batch processing, JWT authentication, analytics dashboard) are all implemented and functional.
P1 -- High Priority
These items address correctness, security, and reliability gaps that should be resolved before broader production use.
Security hardening
- Rotate default JWT secret. auth.py ships a fallback sparc-secret-key-change-in-production that will be used if JWT_SECRET is unset. Add a startup check that refuses to start with the default secret in non-development environments.
- CORS allow-origins are hardcoded. api.py only permits localhost:3000 and localhost:5173. Make the allowed origins configurable via environment variable so the dashboard works when deployed behind a real domain.
- Database credentials in docker-compose.yml. The compose file embeds postgres:postgres in plain text. Reference a .env file or Docker secrets instead.
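The first two items could be approached roughly as follows. This is a minimal sketch, not the project's code: the function names, the SPARC_ENV and CORS_ALLOW_ORIGINS variable names, and the http:// scheme on the fallback origins are all assumptions made for illustration. Only the default secret string, JWT_SECRET, and the two localhost ports come from the items above.

```python
import os

# Fallback secret named in the roadmap item above.
DEFAULT_JWT_SECRET = "sparc-secret-key-change-in-production"


def check_jwt_secret(secret, environment):
    """Return the effective secret, or raise if the fallback would be used outside development."""
    effective = secret or DEFAULT_JWT_SECRET
    if effective == DEFAULT_JWT_SECRET and environment != "development":
        raise RuntimeError(
            "JWT_SECRET is unset or still the default; refusing to start "
            f"in environment {environment!r}"
        )
    return effective


def parse_allowed_origins(raw):
    """Split a comma-separated origin list, falling back to the local dev origins."""
    origins = [item.strip() for item in (raw or "").split(",") if item.strip()]
    # Assumed scheme: the roadmap only names the host:port pairs.
    return origins or ["http://localhost:3000", "http://localhost:5173"]


if __name__ == "__main__":
    # SPARC_ENV and CORS_ALLOW_ORIGINS are hypothetical variable names.
    env = os.environ.get("SPARC_ENV", "development")
    check_jwt_secret(os.environ.get("JWT_SECRET"), env)
    print(parse_allowed_origins(os.environ.get("CORS_ALLOW_ORIGINS")))
```

Calling the secret check once at application startup, before any route is registered, would make a misconfigured deployment fail immediately rather than silently sign tokens with a publicly known key.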
Error handling and resilience
- get_db_client() in auth.py creates a new DatabaseClient on every call. This bypasses the connection pool and can exhaust database connections under load. Refactor to share a single pooled client.
- _jobs dict is in-memory only. Job state is lost on API restart. Persist job status in PostgreSQL or Redis so async batch results survive restarts.
- No rate limiting on auth endpoints. /auth/login and /auth/register are unprotected against brute-force or abuse. Add rate limiting middleware.
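For the rate-limiting item, one common approach is a sliding-window counter keyed by client address. The sketch below is framework-independent and uses only the standard library. The class name, the limit and window values, and the choice of a sliding window are illustrative assumptions, not something the codebase prescribes.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per key within `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A middleware wrapping /auth/login and /auth/register could call allow() with the client IP and respond with HTTP 429 when it returns False. Note that this state lives in process memory, so it has the same restart and multi-worker limitation as the _jobs dict; a shared store such as Redis would be needed for more than one API process.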
Test coverage for auth and admin
- The existing API tests (tests/test_api.py) bypass authentication entirely. Add tests that exercise the JWT flow: registration, login, protected-route access, token refresh, and admin-only endpoints.
P2 -- Medium Priority
Improvements to usability, performance, and developer experience.
Backend
- Add structured logging. Replace print() calls throughout analyzer.py, serp_api.py, and llm.py with Python logging so log levels and formatting are consistent.
- Make LLM model configurable. llm.py hardcodes anthropic/claude-3.5-sonnet. Accept a MODEL environment variable to allow switching models without code changes.
- SERP cache TTL is hardcoded to 24 hours. Expose SERP_CACHE_TTL_HOURS as an environment variable in config.py.
- Patent PDF storage. PDFs are saved to a local patents/ directory. For containerized deployments, consider object storage (S3/MinIO) or at minimum document the volume mount requirement more prominently.
- analyze_single_patent assumes local file path. The method constructs patents/{patent_id}.pdf and reads from disk, but does not download the PDF first. Either integrate the download step or document the prerequisite.
- Patent.patent_id typed as int in types.py but used as str everywhere. Fix the type annotation to str.
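The model and cache-TTL items both amount to reading a value from the environment with the current hardcoded value as the default. A minimal sketch of what that might look like in config.py follows. The Settings dataclass, the load_settings function, and the logger name are hypothetical; only MODEL, SERP_CACHE_TTL_HOURS, and the two default values are taken from the items above.

```python
import logging
import os
from dataclasses import dataclass

logger = logging.getLogger("sparc.config")  # logger name is illustrative


@dataclass(frozen=True)
class Settings:
    model: str
    serp_cache_ttl_hours: int


def load_settings(env=None):
    """Build Settings from an environment mapping, defaulting to today's hardcoded values."""
    env = os.environ if env is None else env
    model = env.get("MODEL", "anthropic/claude-3.5-sonnet")
    raw_ttl = env.get("SERP_CACHE_TTL_HOURS", "24")
    try:
        ttl = int(raw_ttl)
    except ValueError:
        raise ValueError(
            f"SERP_CACHE_TTL_HOURS must be an integer, got {raw_ttl!r}"
        ) from None
    if ttl <= 0:
        raise ValueError("SERP_CACHE_TTL_HOURS must be positive")
    # logging call instead of print(), per the structured-logging item.
    logger.info("using model %s with SERP cache TTL %dh", model, ttl)
    return Settings(model=model, serp_cache_ttl_hours=ttl)
```

Validating the TTL at load time means a bad value surfaces at startup rather than on the first cache lookup. Passing the mapping as a parameter keeps the function easy to test without touching the real environment.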
Frontend
- No loading/error states on several pages. The Batch and Analytics pages would benefit from skeleton loaders and user-friendly error messages.
- No dark mode. Tailwind is configured but no dark variant is applied.
- Missing package-lock.json or pnpm-lock.yaml. The frontend has no lockfile committed, leading to non-reproducible builds.
CI/CD
- No test stage in the Gitea Actions workflow. build.yaml builds and pushes images but never runs pytest. Add a test job that gates the build.
- No linting or type checking. Add ruff (Python) and tsc --noEmit (TypeScript) to CI.
P3 -- Nice to Have
Lower-urgency enhancements and future features.
- Export analysis reports. Allow users to download analysis results as PDF or CSV from the dashboard.
- Comparison view. Side-by-side comparison of two companies' patent portfolios.
- Scheduled/recurring analysis. Periodically re-analyze tracked companies and alert on significant changes.
- Webhook/notification support. Send alerts (Slack, Discord, email) when batch jobs complete or when a company's innovation score changes significantly.
- Multi-model support. Let users choose between LLM providers per analysis (e.g., GPT-4o, Gemini, Claude) and compare outputs.
- Patent trend charts. Visualize patent filing frequency and technology category distribution over time in the Analytics page.
- API pagination. The /analyze/batch and /jobs endpoints could benefit from cursor-based pagination for large result sets.
- OpenAPI client generation. Auto-generate the TypeScript API client from the FastAPI OpenAPI spec to keep frontend types in sync.
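To illustrate the cursor-based pagination item: the idea is that each response carries an opaque token encoding the last item returned, and the next request resumes strictly after it. The sketch below works over an in-memory list sorted by an ascending id field. The function names, the id key, and the base64-JSON cursor format are assumptions for illustration; the actual /jobs response shape is not specified here.

```python
import base64
import json


def encode_cursor(last_id):
    """Pack the last-seen id into an opaque, URL-safe token."""
    payload = json.dumps({"after": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(cursor):
    """Recover the last-seen id from a token produced by encode_cursor."""
    payload = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(payload)["after"]


def paginate(items, limit, cursor=None):
    """Return (page, next_cursor) for items sorted by ascending 'id'."""
    after = decode_cursor(cursor) if cursor else None
    remaining = [item for item in items if after is None or item["id"] > after]
    page = remaining[:limit]
    # Only hand out a cursor when there is something left to fetch.
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Against PostgreSQL, the list filter would typically become a WHERE id > :after ORDER BY id LIMIT :limit + 1 query, with the extra row used only to decide whether a next cursor exists. Unlike offset pagination, this stays stable when rows are inserted between requests.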
Infrastructure and Deployment
Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.