

agent-company 4cb1a6ed21 Update ROADMAP.md to reflect completed work and add next-horizon items

Move all completed items (security hardening, structured logging, dark mode,
export, webhooks, scheduled analysis, multi-model, trend charts, CI, etc.)
into a new Completed section. Reorganize remaining P1/P2/P3 items to reflect
current priorities. Add new next-horizon items: historical diffing, patent
classification tagging, user API keys, batch export, and multi-tenant support.

Closes leeworks-agents/SPARC#1659

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-20 19:18:22 +00:00

6.3 KiB


SPARC Roadmap

Semiconductor Patent & Analytics Report Core -- development priorities.

Current State

SPARC is a patent analysis platform with a working end-to-end pipeline: Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence and caching, Docker Compose for local development, and Gitea Actions CI/CD for image builds and testing. Core features include patent retrieval via SerpAPI, PDF parsing, LLM analysis via OpenRouter (multi-model: Claude, GPT-4o, Gemini, Llama), batch processing, JWT authentication, an analytics dashboard with patent trend charts, scheduled recurring analysis with alerting, webhook notifications (Slack/Discord), CSV and PDF export, S3/MinIO storage, side-by-side company comparison, and dark mode.

Completed

Items that have been implemented and merged into main.

Security hardening

~~Rotate default JWT secret.~~ Startup check refuses to start with the default secret in non-development environments.
~~CORS allow-origins are hardcoded.~~ Allowed origins are now configurable via environment variable.
~~Database credentials in docker-compose.yml.~~ Compose references .env for sensitive values.
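
A minimal sketch of the two startup configuration checks described above. The environment variable names (APP_ENV, JWT_SECRET, CORS_ALLOW_ORIGINS) and the placeholder default secret are hypothetical, and SPARC's actual names and values may differ.

```python
from collections.abc import Mapping

# Hypothetical placeholder value shipped in sample config; SPARC's real default may differ.
DEFAULT_JWT_SECRET = "change-me"


def check_jwt_secret(env: Mapping[str, str]) -> str:
    """Return the JWT secret, refusing the default outside development."""
    app_env = env.get("APP_ENV", "development")  # hypothetical variable name
    secret = env.get("JWT_SECRET", DEFAULT_JWT_SECRET)
    if app_env != "development" and secret == DEFAULT_JWT_SECRET:
        raise RuntimeError("Refusing to start: JWT_SECRET is still the default value")
    return secret


def parse_cors_origins(env: Mapping[str, str]) -> list[str]:
    """Split a comma-separated CORS_ALLOW_ORIGINS value into a clean list."""
    raw = env.get("CORS_ALLOW_ORIGINS", "")  # hypothetical variable name
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```

At startup, both functions would be called with os.environ. The resulting list could then be passed to FastAPI's CORSMiddleware through its allow_origins parameter.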

Error handling and resilience

~~get_db_client() creates a new DatabaseClient on every call.~~ Refactored to a shared pooled singleton initialized at startup.
~~No rate limiting on auth endpoints.~~ Rate limiting middleware added to /auth/login and /auth/register.
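
One way to rate-limit the auth endpoints is an in-memory sliding-window log keyed by client address. The sketch below is an illustration under that assumption, not SPARC's actual middleware, and the class name and limits are hypothetical. Because the state lives in process memory, each API replica counts requests independently.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the current window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

In FastAPI, a dependency attached to /auth/login and /auth/register could call limiter.allow(request.client.host) and raise HTTPException(status_code=429) when it returns False.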

Test coverage

~~API tests bypass authentication.~~ JWT auth integration tests added (33 cases covering registration, login, protected routes, token refresh, and admin-only endpoints).
~~No test stage in CI.~~ Gitea Actions workflow now runs pytest and gates the build.
~~No linting or type checking in CI.~~ ruff (Python) and tsc --noEmit (TypeScript) added to CI pipeline.

Backend

~~Add structured logging.~~ Python logging module used throughout.
~~Make LLM model configurable.~~ MODEL environment variable accepted; multi-model support with per-analysis selection (GPT-4o, Gemini, Claude, Llama).
~~SERP cache TTL hardcoded.~~ SERP_CACHE_TTL_HOURS exposed as env var.
~~Patent PDF storage.~~ S3/MinIO object storage backend added alongside local filesystem. Volume mount requirement documented.
~~analyze_single_patent assumes local file.~~ Auto-download from cached metadata link integrated.
~~Patent.patent_id typed as int.~~ Fixed to str.
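
A sketch of how SERP_CACHE_TTL_HOURS might drive cache freshness checks. It assumes each cached SerpAPI response is stored with a timezone-aware UTC timestamp. The 24-hour fallback is a hypothetical default, not a documented SPARC value.

```python
from collections.abc import Mapping
from datetime import datetime, timedelta, timezone
from typing import Optional


def serp_cache_ttl(env: Mapping[str, str]) -> timedelta:
    # Hypothetical fallback of 24 hours when SERP_CACHE_TTL_HOURS is unset.
    return timedelta(hours=float(env.get("SERP_CACHE_TTL_HOURS", "24")))


def is_cache_fresh(cached_at: datetime, ttl: timedelta,
                   now: Optional[datetime] = None) -> bool:
    """Return True if a cached response is still within its TTL."""
    now = now or datetime.now(timezone.utc)
    return now - cached_at < ttl
```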

Frontend

~~No loading/error states.~~ Skeleton loaders and error states added to Batch and Analytics pages.
~~No dark mode.~~ Full dark mode support with theme-aware chart colors.
~~Missing lockfile.~~ package-lock.json committed.

Features (formerly P3)

~~Export analysis reports.~~ CSV and PDF export endpoints implemented.
~~Comparison view.~~ Side-by-side company patent portfolio comparison added.
~~Scheduled/recurring analysis.~~ APScheduler-based periodic re-analysis with configurable interval and change-threshold alerting.
~~Webhook/notification support.~~ Slack, Discord, and generic HTTP POST webhooks with retry logic.
~~Multi-model support.~~ Model picker in Analysis and Batch pages; backend allow-list validation.
~~Patent trend charts.~~ Filing frequency and category distribution visualizations added to Analytics page.
~~OpenAPI client generation.~~ TypeScript API client auto-generated from FastAPI spec with CI freshness check.
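
The webhook item combines per-service payload shapes with retry logic. In the sketch below, the Slack and Discord fields reflect how those services' incoming webhooks accept JSON (a "text" field and a "content" field, respectively). The generic payload shape, attempt count, and backoff delays are hypothetical, and SPARC's webhooks.py may differ.

```python
import json
import time
import urllib.request
from typing import Callable


def build_payload(kind: str, message: str) -> dict:
    """Shape the JSON body for each webhook flavour."""
    if kind == "slack":
        return {"text": message}      # Slack incoming webhooks read "text"
    if kind == "discord":
        return {"content": message}   # Discord webhooks read "content"
    return {"message": message}       # hypothetical generic HTTP POST shape


def post_json(url: str, payload: dict) -> None:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass  # any non-2xx status raises HTTPError (an OSError subclass)


def deliver(url: str, payload: dict, *, attempts: int = 3,
            base_delay: float = 1.0,
            send: Callable[[str, dict], None] = post_json,
            sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to send, doubling the delay after each failed attempt."""
    for attempt in range(attempts):
        try:
            send(url, payload)
            return True
        except OSError:
            if attempt == attempts - 1:
                return False
            sleep(base_delay * 2 ** attempt)
    return False
```

The send and sleep parameters are injectable, so the retry behaviour can be tested without network access or real delays.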

P1 -- High Priority

These items address correctness, reliability, and coverage gaps that should be resolved before broader production use.

Resilience

_jobs dict is in-memory only. Job state is lost on API restart. Persist job status in PostgreSQL or Redis so async batch results survive restarts.
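
A minimal sketch of persisting job state instead of holding it in the _jobs dict. It uses sqlite3 only to stay self-contained. PostgreSQL accepts the same INSERT ... ON CONFLICT ... DO UPDATE upsert, although drivers such as psycopg use %s placeholders rather than ?. The table layout and class name are hypothetical.

```python
import json
import sqlite3
from typing import Any, Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    job_id  TEXT PRIMARY KEY,
    status  TEXT NOT NULL,
    result  TEXT
)
"""


class JobStore:
    """Persist batch job state so it survives an API restart."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(SCHEMA)

    def set_status(self, job_id: str, status: str,
                   result: Optional[Any] = None) -> None:
        # Upsert: insert a new row or overwrite the existing one.
        self.conn.execute(
            "INSERT INTO jobs (job_id, status, result) VALUES (?, ?, ?) "
            "ON CONFLICT(job_id) DO UPDATE SET status = excluded.status, "
            "result = excluded.result",
            (job_id, status, json.dumps(result)),
        )
        self.conn.commit()

    def get(self, job_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT status, result FROM jobs WHERE job_id = ?", (job_id,)
        ).fetchone()
        if row is None:
            return None
        return {"status": row[0], "result": json.loads(row[1])}
```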

Test coverage gaps

Export endpoint tests. The CSV and PDF export endpoints (/export/) lack test coverage. Add tests covering auth, success, 404, and edge cases. (Issue #1655)
Tracked company admin endpoint tests. The /admin/tracked CRUD endpoints and scheduler integration lack test coverage. (Issue #1656)

P2 -- Medium Priority

Improvements to reliability, test coverage, and code quality.

Test coverage

Webhook integration tests. The retry logic, Slack/Discord payload format, and multi-URL dispatch in webhooks.py need test coverage. (Issue #1657)
S3/MinIO storage backend tests. storage.py has local filesystem tests but no unit tests for the S3 backend (read, write, exists, delete, error handling). (Issue #1660)
analyze_single_patent auto-download path tests. The auto-download fallback (cache lookup, PDF download, FileNotFoundError) in analyzer.py lacks test coverage. (Issue #1661)

Code quality

Scheduler creates its own DatabaseClient. scheduler.py bypasses the application-level pooled client, creating a new connection on every tick. Refactor to use get_db_client(). (Issue #1658)
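
A sketch of the intended refactor. The scheduler tick fetches the shared client rather than constructing one. DatabaseClient and the DSN here are hypothetical stand-ins. The sketch also builds the client lazily on first call with functools.lru_cache, whereas SPARC's get_db_client() is described as initialized at startup.

```python
import functools


class DatabaseClient:
    """Hypothetical stand-in for SPARC's pooled database client."""

    instances = 0  # counts constructions, to show the client is shared

    def __init__(self, dsn: str) -> None:
        DatabaseClient.instances += 1
        self.dsn = dsn


@functools.lru_cache(maxsize=1)
def get_db_client() -> DatabaseClient:
    # Built once on first call; later calls return the same object.
    return DatabaseClient("postgresql://localhost/sparc")  # hypothetical DSN


def scheduler_tick() -> DatabaseClient:
    # Reuse the shared client instead of calling DatabaseClient(...) each tick.
    client = get_db_client()
    return client
```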

API improvements

API pagination. Add cursor-based pagination to the /analyze/batch and /jobs endpoints so that large result sets can be fetched page by page.
Request validation improvements. Add stricter input validation for company names (disallow special characters, enforce length limits).
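
A sketch of both API improvements, assuming rows are ordered by an integer id. The cursor is an opaque base64 encoding of the last id returned. In SQL this corresponds to WHERE id > :after ORDER BY id LIMIT :limit + 1. The allowed characters and the 100-character limit for company names are hypothetical choices, and a real rule may need to accept non-ASCII names.

```python
import base64
import re
from typing import Optional, Sequence

# Hypothetical rule: ASCII letters, digits, spaces and . , & ' -, 1 to 100 chars.
COMPANY_NAME_RE = re.compile(r"[A-Za-z0-9 .,&'\-]{1,100}")


def validate_company_name(name: str) -> str:
    name = name.strip()
    if not COMPANY_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid company name: {name!r}")
    return name


def encode_cursor(last_id: int) -> str:
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())


def paginate(rows: Sequence[dict], limit: int,
             cursor: Optional[str] = None) -> tuple[list[dict], Optional[str]]:
    """Return one page of rows (sorted by ascending "id") and the next cursor."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after = decode_cursor(cursor) if cursor else None
    remaining = [r for r in rows if after is None or r["id"] > after]
    page = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(page[-1]["id"]) if has_more else None
    return page, next_cursor
```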

P3 -- Nice to Have

Lower-urgency enhancements and future features.

Historical analysis diffing. Show what changed between two analysis runs for the same company, highlighting new patents and score shifts.
Patent classification tagging. Automatically tag patents by technology domain (AI, semiconductors, materials science) using LLM classification.
User-level API keys. Allow users to generate personal API keys for programmatic access without JWT token refresh.
Batch export. Export analysis results for multiple companies at once as a ZIP archive.
Rate limiting dashboard. Surface rate limit status and usage statistics in the admin panel.
Async webhook delivery. Move webhook delivery to a background task queue (e.g., Celery, arq) to avoid blocking the scheduler.
Multi-tenant support. Scope analysis results and tracked companies per user or organization.
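
A sketch of the historical diffing idea. It assumes each analysis run can be reduced to a mapping from patent_id (a str) to a numeric score. The score field, the RunDiff shape, and the min_shift threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunDiff:
    new_patents: list[str]
    removed_patents: list[str]
    score_shifts: dict[str, float]  # patent_id -> new score minus old score


def diff_runs(old: dict[str, float], new: dict[str, float],
              min_shift: float = 0.0) -> RunDiff:
    """Compare two runs for the same company, each mapping patent_id -> score."""
    shared = old.keys() & new.keys()
    shifts = {
        pid: new[pid] - old[pid]
        for pid in sorted(shared)
        if abs(new[pid] - old[pid]) > min_shift  # ignore negligible changes
    }
    return RunDiff(
        new_patents=sorted(new.keys() - old.keys()),
        removed_patents=sorted(old.keys() - new.keys()),
        score_shifts=shifts,
    )
```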

Infrastructure and Deployment

Kubernetes manifests, Helm charts, and cluster-level concerns (MetalLB, storage, FluxCD sync) are tracked in the Talos repository. File infrastructure-related issues there, not here.
