From 4cb1a6ed218e8511385fac88e539d5646ede778a Mon Sep 17 00:00:00 2001
From: agent-company
Date: Mon, 20 Apr 2026 19:18:22 +0000
Subject: [PATCH] Update ROADMAP.md to reflect completed work and add next-horizon items

Move all completed items (security hardening, structured logging, dark
mode, export, webhooks, scheduled analysis, multi-model, trend charts,
CI, etc.) into a new Completed section. Reorganize remaining P1/P2/P3
items to reflect current priorities. Add new next-horizon items:
historical diffing, patent classification tagging, user API keys, batch
export, and multi-tenant support.

Closes leeworks-agents/SPARC#1659

Co-Authored-By: Claude Opus 4.6 (1M context)
---
 ROADMAP.md | 194 ++++++++++++++++++++++++++++++++---------------------
 1 file changed, 118 insertions(+), 76 deletions(-)

diff --git a/ROADMAP.md b/ROADMAP.md
index 42b571a..5b177d9 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -7,86 +7,131 @@ Semiconductor Patent & Analytics Report Core -- development priorities.
 SPARC is a patent analysis platform with a working end-to-end pipeline:
 Python/FastAPI backend, React/TypeScript frontend, PostgreSQL for persistence
 and caching, Docker Compose for local development, and Gitea Actions CI/CD for
-image builds. Core features (patent retrieval via SerpAPI, PDF parsing, LLM
-analysis via OpenRouter/Claude, batch processing, JWT authentication, analytics
-dashboard) are all implemented and functional.
+image builds and testing. Core features include patent retrieval via SerpAPI,
+PDF parsing, LLM analysis via OpenRouter (multi-model: Claude, GPT-4o, Gemini,
+Llama), batch processing, JWT authentication, analytics dashboard with patent
+trend charts, scheduled recurring analysis with alerting, webhook notifications
+(Slack/Discord), CSV and PDF export, S3/MinIO storage, side-by-side company
+comparison, and dark mode.
+
+---
+
+## Completed
+
+Items that have been implemented and merged into main.
+
+### Security hardening
+
+- ~~Rotate default JWT secret.~~ Startup check refuses to start with the
+  default secret in non-development environments.
+- ~~CORS allow-origins are hardcoded.~~ Allowed origins are now configurable
+  via environment variable.
+- ~~Database credentials in docker-compose.yml.~~ Compose references `.env`
+  for sensitive values.
+
+### Error handling and resilience
+
+- ~~`get_db_client()` creates a new `DatabaseClient` on every call.~~ Refactored
+  to a shared pooled singleton initialized at startup.
+- ~~No rate limiting on auth endpoints.~~ Rate limiting middleware added to
+  `/auth/login` and `/auth/register`.
+
+### Test coverage
+
+- ~~API tests bypass authentication.~~ JWT auth integration tests added (33
+  cases covering registration, login, protected routes, token refresh, and
+  admin-only endpoints).
+- ~~No test stage in CI.~~ Gitea Actions workflow now runs `pytest` and gates
+  the build.
+- ~~No linting or type checking in CI.~~ `ruff` (Python) and `tsc --noEmit`
+  (TypeScript) added to CI pipeline.
+
+### Backend
+
+- ~~Add structured logging.~~ Python `logging` module used throughout.
+- ~~Make LLM model configurable.~~ `MODEL` environment variable accepted;
+  multi-model support with per-analysis selection (GPT-4o, Gemini, Claude,
+  Llama).
+- ~~SERP cache TTL hardcoded.~~ `SERP_CACHE_TTL_HOURS` exposed as env var.
+- ~~Patent PDF storage.~~ S3/MinIO object storage backend added alongside
+  local filesystem. Volume mount requirement documented.
+- ~~`analyze_single_patent` assumes local file.~~ Auto-download from cached
+  metadata link integrated.
+- ~~`Patent.patent_id` typed as `int`.~~ Fixed to `str`.
+
+### Frontend
+
+- ~~No loading/error states.~~ Skeleton loaders and error states added to
+  Batch and Analytics pages.
+- ~~No dark mode.~~ Full dark mode support with theme-aware chart colors.
+- ~~Missing lockfile.~~ `package-lock.json` committed.
+
+### Features (formerly P3)
+
+- ~~Export analysis reports.~~ CSV and PDF export endpoints implemented.
+- ~~Comparison view.~~ Side-by-side company patent portfolio comparison added.
+- ~~Scheduled/recurring analysis.~~ APScheduler-based periodic re-analysis
+  with configurable interval and change-threshold alerting.
+- ~~Webhook/notification support.~~ Slack, Discord, and generic HTTP POST
+  webhooks with retry logic.
+- ~~Multi-model support.~~ Model picker in Analysis and Batch pages; backend
+  allow-list validation.
+- ~~Patent trend charts.~~ Filing frequency and category distribution
+  visualizations added to Analytics page.
+- ~~OpenAPI client generation.~~ TypeScript API client auto-generated from
+  FastAPI spec with CI freshness check.

 ---

 ## P1 -- High Priority

-These items address correctness, security, and reliability gaps that should be
+These items address correctness, reliability, and coverage gaps that should be
 resolved before broader production use.

-### Security hardening
+### Resilience

-- **Rotate default JWT secret.** `auth.py` ships a fallback
-  `sparc-secret-key-change-in-production` that will be used if `JWT_SECRET` is
-  unset. Add a startup check that refuses to start with the default secret in
-  non-development environments.
-- **CORS allow-origins are hardcoded.** `api.py` only permits
-  `localhost:3000` and `localhost:5173`. Make the allowed origins configurable
-  via environment variable so the dashboard works when deployed behind a real
-  domain.
-- **Database credentials in docker-compose.yml.** The compose file embeds
-  `postgres:postgres` in plain text. Reference a `.env` file or Docker secrets
-  instead.
+- **`_jobs` dict is in-memory only.** Job state is lost on API restart.
+  Persist job status in PostgreSQL or Redis so async batch results survive
+  restarts.

-### Error handling and resilience
+### Test coverage gaps

-- **`get_db_client()` in `auth.py` creates a new `DatabaseClient` on every
-  call.** This bypasses the connection pool and can exhaust database
-  connections under load. Refactor to share a single pooled client.
-- **`_jobs` dict is in-memory only.** Job state is lost on API restart. Persist
-  job status in PostgreSQL or Redis so async batch results survive restarts.
-- **No rate limiting on auth endpoints.** `/auth/login` and `/auth/register`
-  are unprotected against brute-force or abuse. Add rate limiting middleware.
-
-### Test coverage for auth and admin
-
-- The existing API tests (`tests/test_api.py`) bypass authentication entirely.
-  Add tests that exercise the JWT flow: registration, login, protected-route
-  access, token refresh, and admin-only endpoints.
+- **Export endpoint tests.** The CSV and PDF export endpoints (`/export/`)
+  lack test coverage. Add tests covering auth, success, 404, and edge cases.
+  *(Issue #1655)*
+- **Tracked company admin endpoint tests.** The `/admin/tracked` CRUD
+  endpoints and scheduler integration lack test coverage. *(Issue #1656)*

 ---

 ## P2 -- Medium Priority

-Improvements to usability, performance, and developer experience.
+Improvements to reliability, test coverage, and code quality.

-### Backend
+### Test coverage

-- **Add structured logging.** Replace `print()` calls throughout `analyzer.py`,
-  `serp_api.py`, and `llm.py` with Python `logging` so log levels and
-  formatting are consistent.
-- **Make LLM model configurable.** `llm.py` hardcodes
-  `anthropic/claude-3.5-sonnet`. Accept a `MODEL` environment variable to allow
-  switching models without code changes.
-- **SERP cache TTL is hardcoded to 24 hours.** Expose `SERP_CACHE_TTL_HOURS`
-  as an environment variable in `config.py`.
-- **Patent PDF storage.** PDFs are saved to a local `patents/` directory. For
-  containerized deployments, consider object storage (S3/MinIO) or at minimum
-  document the volume mount requirement more prominently.
-- **`analyze_single_patent` assumes local file path.** The method constructs
-  `patents/{patent_id}.pdf` and reads from disk, but does not download the PDF
-  first. Either integrate the download step or document the prerequisite.
-- **`Patent.patent_id` typed as `int` in `types.py` but used as `str`
-  everywhere.** Fix the type annotation to `str`.
+- **Webhook integration tests.** The retry logic, Slack/Discord payload
+  format, and multi-URL dispatch in `webhooks.py` need test coverage.
+  *(Issue #1657)*
+- **S3/MinIO storage backend tests.** `storage.py` has local filesystem tests
+  but no unit tests for the S3 backend (read, write, exists, delete,
+  error handling). *(Issue #1660)*
+- **`analyze_single_patent` auto-download path tests.** The auto-download
+  fallback (cache lookup, PDF download, FileNotFoundError) in
+  `analyzer.py` lacks test coverage. *(Issue #1661)*

-### Frontend
+### Code quality

-- **No loading/error states on several pages.** The Batch and Analytics pages
-  would benefit from skeleton loaders and user-friendly error messages.
-- **No dark mode.** Tailwind is configured but no dark variant is applied.
-- **Missing `package-lock.json` or `pnpm-lock.yaml`.** The frontend has no
-  lockfile committed, leading to non-reproducible builds.
+- **Scheduler creates its own DatabaseClient.** `scheduler.py` bypasses the
+  application-level pooled client, creating a new connection on every tick.
+  Refactor to use `get_db_client()`. *(Issue #1658)*

-### CI/CD
+### API improvements

-- **No test stage in the Gitea Actions workflow.** `build.yaml` builds and
-  pushes images but never runs `pytest`. Add a test job that gates the build.
-- **No linting or type checking.** Add `ruff` (Python) and `tsc --noEmit`
-  (TypeScript) to CI.
+- **API pagination.** The `/analyze/batch` and `/jobs` endpoints could benefit
+  from cursor-based pagination for large result sets.
+- **Request validation improvements.** Add stricter input validation for
+  company names (disallow special characters, enforce length limits).

 ---

@@ -94,23 +139,20 @@ Improvements to usability, performance, and developer experience.

 Lower-urgency enhancements and future features.

-- **Export analysis reports.** Allow users to download analysis results as PDF
-  or CSV from the dashboard.
-- **Comparison view.** Side-by-side comparison of two companies' patent
-  portfolios.
-- **Scheduled/recurring analysis.** Periodically re-analyze tracked companies
-  and alert on significant changes.
-- **Webhook/notification support.** Send alerts (Slack, Discord, email) when
-  batch jobs complete or when a company's innovation score changes
-  significantly.
-- **Multi-model support.** Let users choose between LLM providers per analysis
-  (e.g., GPT-4o, Gemini, Claude) and compare outputs.
-- **Patent trend charts.** Visualize patent filing frequency and technology
-  category distribution over time in the Analytics page.
-- **API pagination.** The `/analyze/batch` and `/jobs` endpoints could benefit
-  from cursor-based pagination for large result sets.
-- **OpenAPI client generation.** Auto-generate the TypeScript API client from
-  the FastAPI OpenAPI spec to keep frontend types in sync.
+- **Historical analysis diffing.** Show what changed between two analysis runs
+  for the same company, highlighting new patents and score shifts.
+- **Patent classification tagging.** Automatically tag patents by technology
+  domain (AI, semiconductors, materials science) using LLM classification.
+- **User-level API keys.** Allow users to generate personal API keys for
+  programmatic access without JWT token refresh.
+- **Batch export.** Export analysis results for multiple companies at once as
+  a ZIP archive.
+- **Rate limiting dashboard.** Surface rate limit status and usage statistics
+  in the admin panel.
+- **Async webhook delivery.** Move webhook delivery to a background task queue
+  (e.g., Celery, arq) to avoid blocking the scheduler.
+- **Multi-tenant support.** Scope analysis results and tracked companies per
+  user or organization.

 ---
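
Note on the "Historical analysis diffing" item in the P3 list: the idea of showing new patents and score shifts between two runs can be sketched roughly as below. This is a minimal illustration under stated assumptions, not SPARC's implementation. The names `RunDiff` and `diff_runs` are hypothetical, and representing a run as a mapping from `patent_id` (a `str`) to a numeric score is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class RunDiff:
    """Hypothetical result type: what changed between two analysis runs."""
    new_patents: list[str] = field(default_factory=list)
    dropped_patents: list[str] = field(default_factory=list)
    score_shifts: dict[str, float] = field(default_factory=dict)


def diff_runs(old: dict[str, float], new: dict[str, float],
              threshold: float = 0.0) -> RunDiff:
    """Compare two runs, each assumed to map patent_id -> score.

    Patents present only in `new` are reported as new, patents present only
    in `old` as dropped, and patents in both are reported when their score
    moved by more than `threshold`.
    """
    diff = RunDiff()
    diff.new_patents = sorted(new.keys() - old.keys())
    diff.dropped_patents = sorted(old.keys() - new.keys())
    for patent_id in sorted(old.keys() & new.keys()):
        delta = new[patent_id] - old[patent_id]
        if abs(delta) > threshold:
            diff.score_shifts[patent_id] = delta
    return diff
```

Keying both runs by patent ID keeps the comparison to a few set operations, and the `threshold` parameter (also an assumption) would let small score fluctuations be ignored rather than reported.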