SPARC/ROADMAP.md
agent-company e8cdc089fa chore: add ROADMAP.md for SPARC application development
- Document current project state and architecture
- Identify P1 priorities: security hardening, error handling, test coverage
- Identify P2 priorities: structured logging, configurable LLM, frontend polish, CI tests
- Identify P3 priorities: export, comparison, scheduled analysis, notifications
- Reference Talos repo for infrastructure/deployment concerns

Closes leeworks-agents/SPARC#2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-26 00:06:56 +00:00

SPARC Roadmap

Semiconductor Patent & Analytics Report Core -- development priorities.

Current State

SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds. Core features (patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter/Claude, batch processing, JWT authentication, analytics dashboard) are all implemented and functional.


P1 -- High Priority

These items address correctness, security, and reliability gaps that should be resolved before broader production use.

Security hardening

  • Rotate default JWT secret. auth.py ships a fallback sparc-secret-key-change-in-production that will be used if JWT_SECRET is unset. Add a startup check that refuses to start with the default secret in non-development environments.
  • CORS allow-origins are hardcoded. api.py only permits localhost:3000 and localhost:5173. Make the allowed origins configurable via environment variable so the dashboard works when deployed behind a real domain.
  • Database credentials in docker-compose.yml. The compose file embeds postgres:postgres in plain text. Reference a .env file or Docker secrets instead.
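
The first two items above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the APP_ENV and CORS_ALLOW_ORIGINS variable names, both function names, and the http:// scheme on the localhost defaults are assumptions made for the example.

```python
import os

# Fallback value quoted in the roadmap item above.
DEFAULT_JWT_SECRET = "sparc-secret-key-change-in-production"


def check_jwt_secret(env=None):
    """Return the JWT secret, refusing the default outside development."""
    env = os.environ if env is None else env
    secret = env.get("JWT_SECRET", DEFAULT_JWT_SECRET)
    # APP_ENV is a hypothetical variable name; an unset value is treated
    # as production so that the check fails closed.
    app_env = env.get("APP_ENV", "production").lower()
    if secret == DEFAULT_JWT_SECRET and app_env != "development":
        raise RuntimeError("JWT_SECRET must be set to a non-default value")
    return secret


def parse_cors_origins(env=None):
    """Read allowed origins from a comma-separated environment variable."""
    env = os.environ if env is None else env
    # CORS_ALLOW_ORIGINS is a hypothetical name; the default mirrors the
    # two localhost origins mentioned above (scheme assumed to be http).
    raw = env.get(
        "CORS_ALLOW_ORIGINS",
        "http://localhost:3000,http://localhost:5173",
    )
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

Calling check_jwt_secret() once during application startup would make a misconfigured deployment fail immediately rather than silently sign tokens with a known key. The list returned by parse_cors_origins() could then be passed to whatever CORS configuration api.py already uses.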

Error handling and resilience

  • get_db_client() in auth.py creates a new DatabaseClient on every call. This bypasses the connection pool and can exhaust database connections under load. Refactor to share a single pooled client.
  • _jobs dict is in-memory only. Job state is lost on API restart. Persist job status in PostgreSQL or Redis so async batch results survive restarts.
  • No rate limiting on auth endpoints. /auth/login and /auth/register are unprotected against brute-force or abuse. Add rate limiting middleware.
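
As a rough sketch of two of the items above, the snippet below shows one way to reuse a single client and one way to throttle repeated attempts. The DatabaseClient class here is only a stand-in (the real constructor in SPARC may take arguments), and SlidingWindowLimiter is a hypothetical helper, not an existing middleware.

```python
import time
from collections import defaultdict, deque
from functools import lru_cache


class DatabaseClient:
    """Stand-in for the project's DatabaseClient; the real class may differ."""

    instances = 0  # counts constructions, for illustration only

    def __init__(self):
        DatabaseClient.instances += 1


@lru_cache(maxsize=1)
def get_db_client():
    """Build the client on first use and return the same instance afterwards."""
    return DatabaseClient()


class SlidingWindowLimiter:
    """Allow at most `limit` attempts per `window` seconds for each key."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop attempts that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A handler for /auth/login could call allow() with the client IP or username as the key and reject the request when it returns False. Because the limiter keeps its state in process memory, it has the same restart and multi-worker limitations as the _jobs dict; a shared store would be needed for anything beyond a single process.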

Test coverage for auth and admin

  • The existing API tests (tests/test_api.py) bypass authentication entirely. Add tests that exercise the JWT flow: registration, login, protected-route access, token refresh, and admin-only endpoints.

P2 -- Medium Priority

Improvements to usability, performance, and developer experience.

Backend

  • Add structured logging. Replace print() calls throughout analyzer.py, serp_api.py, and llm.py with Python logging so log levels and formatting are consistent.
  • Make LLM model configurable. llm.py hardcodes anthropic/claude-3.5-sonnet. Accept a MODEL environment variable to allow switching models without code changes.
  • SERP cache TTL is hardcoded to 24 hours. Expose SERP_CACHE_TTL_HOURS as an environment variable in config.py.
  • Patent PDF storage. PDFs are saved to a local patents/ directory. For containerized deployments, consider object storage (S3/MinIO) or at minimum document the volume mount requirement more prominently.
  • analyze_single_patent assumes a local file path. The method constructs patents/{patent_id}.pdf and reads it from disk without downloading the PDF first, so it fails when the file is not already present. Either integrate the download step or document the prerequisite.
  • Patent.patent_id is typed as int in types.py but used as str everywhere. Change the type annotation to str.
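
The logging and configuration items above might look something like this. It is a sketch under assumptions: the logger name, configure_logging, and load_settings are hypothetical, while MODEL, SERP_CACHE_TTL_HOURS, and the default values come from the items themselves.

```python
import logging
import os

# Hypothetical logger name; each module would normally use its own.
logger = logging.getLogger("sparc.llm")


def configure_logging(level="INFO"):
    """Set one format and level for every module that logs via `logging`."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def load_settings(env=None):
    """Read the model name and SERP cache TTL, keeping the current defaults."""
    env = os.environ if env is None else env
    return {
        "model": env.get("MODEL", "anthropic/claude-3.5-sonnet"),
        "serp_cache_ttl_hours": int(env.get("SERP_CACHE_TTL_HOURS", "24")),
    }
```

With this in place, a call such as print("analysis failed") would become logger.error("analysis failed"), and the verbosity could be changed in one place instead of by editing each module.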

Frontend

  • No loading/error states on several pages. The Batch and Analytics pages would benefit from skeleton loaders and user-friendly error messages.
  • No dark mode. Tailwind is configured but no dark variant is applied.
  • Missing package-lock.json or pnpm-lock.yaml. The frontend has no lockfile committed, leading to non-reproducible builds.

CI/CD

  • No test stage in the Gitea Actions workflow. build.yaml builds and pushes images but never runs pytest. Add a test job that gates the build.
  • No linting or type checking. Add ruff (Python) and tsc --noEmit (TypeScript) to CI.

P3 -- Nice to Have

Lower-urgency enhancements and future features.

  • Export analysis reports. Allow users to download analysis results as PDF or CSV from the dashboard.
  • Comparison view. Side-by-side comparison of two companies' patent portfolios.
  • Scheduled/recurring analysis. Periodically re-analyze tracked companies and alert on significant changes.
  • Webhook/notification support. Send alerts (Slack, Discord, email) when batch jobs complete or when a company's innovation score changes significantly.
  • Multi-model support. Let users choose between LLM providers per analysis (e.g., GPT-4o, Gemini, Claude) and compare outputs.
  • Patent trend charts. Visualize patent filing frequency and technology category distribution over time in the Analytics page.
  • API pagination. The /analyze/batch and /jobs endpoints could benefit from cursor-based pagination for large result sets.
  • OpenAPI client generation. Auto-generate the TypeScript API client from the FastAPI OpenAPI spec to keep frontend types in sync.
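
For the pagination item, a cursor-based scheme could be sketched as below. The function names and the assumption that each job has a sortable id field are illustrative only; the real /jobs response shape is not described here.

```python
import base64
import json


def encode_cursor(last_id):
    """Pack the last seen id into an opaque, URL-safe token."""
    payload = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()


def decode_cursor(cursor):
    """Recover the last seen id from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def paginate(items, limit, cursor=None):
    """Return one page plus the cursor for the next page (None at the end).

    `items` is assumed to be sorted by ascending "id".
    """
    after = decode_cursor(cursor) if cursor else None
    remaining = [it for it in items if after is None or it["id"] > after]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```

In a database-backed version the filtering would usually move into the query (for example, selecting rows with an id greater than the cursor value, ordered by id, with a limit) rather than being done over an in-memory list.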

Infrastructure and Deployment

Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.