# SPARC

**Semiconductor Patent & Analytics Report Core**

A patent analysis system that estimates a company's performance by analyzing its patent portfolio using LLM-powered insights.

## Overview

SPARC automatically collects, parses, and analyzes a company's patents to produce performance estimates. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content.

## Features

- **Patent Retrieval**: Automated collection via SerpAPI's Google Patents engine
- **Intelligent Parsing**: Extracts key sections (abstract, claims, summary) from patent PDFs
- **Content Minimization**: Removes verbose descriptions to reduce LLM token usage
- **AI Analysis**: Uses Claude 3.5 Sonnet via OpenRouter to analyze innovation quality and market potential
- **Portfolio Analysis**: Evaluates multiple patents holistically for comprehensive insights
- **Batch Processing**: Analyze multiple companies concurrently with progress tracking
- **REST API**: FastAPI web service with async job support
- **Dashboard**: React TypeScript web dashboard with authentication
- **Robust Testing**: 40 tests covering all major functionality

## Architecture

```
SPARC/
├── serp_api.py     # Patent retrieval and PDF parsing
├── llm.py          # Claude AI integration via OpenRouter
├── analyzer.py     # High-level orchestration
├── api.py          # FastAPI web service with auth endpoints
├── auth.py         # JWT authentication module
├── database.py     # PostgreSQL storage with caching
├── types.py        # Data models
└── config.py       # Environment configuration
```

## Installation

### Docker (Recommended)

```bash
# Clone and configure
git clone <repository-url>
cd SPARC
cp .env.example .env
# Edit .env with your API keys

# Start all services (API, Dashboard, PostgreSQL)
docker-compose up -d

# Access the services
# - API: http://localhost:8000
# - Dashboard: http://localhost:8080
# - API Docs: http://localhost:8000/docs
```

#### Patent PDF Storage

The API stores downloaded patent PDFs in a `patents/` directory. In Docker, this directory is bind-mounted (`./patents:/app/patents`) so that PDFs persist across container restarts.

If you deploy to a different environment, ensure the `patents/` directory is a persistent volume. Without it, PDFs will be re-downloaded on every analysis.

```yaml
# docker-compose.yml excerpt
volumes:
  - ./patents:/app/patents
```

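A deployment can check up front that a patent's PDF is actually present in the mounted directory. The following is a minimal sketch, not SPARC's own code: the helper names `pdf_path_for` and `require_pdf` and the `<patent_id>.pdf` file naming are assumptions made for illustration.

```python
from pathlib import Path


def pdf_path_for(patent_id: str, patents_dir: str = "patents") -> Path:
    """Return the expected PDF location (the file naming is an assumption)."""
    return Path(patents_dir) / f"{patent_id}.pdf"


def require_pdf(patent_id: str, patents_dir: str = "patents") -> Path:
    """Return the PDF path, or raise a descriptive error if the file is missing."""
    path = pdf_path_for(patent_id, patents_dir)
    if not path.is_file():
        raise FileNotFoundError(
            f"No PDF for {patent_id} at {path}. "
            "Check that the patents directory is mounted as a persistent volume."
        )
    return path
```

Failing early with a message that names the expected path makes a missing volume mount easier to diagnose than a parser error later in the pipeline.
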
### NixOS

```bash
nix develop
```

This automatically creates a virtual environment and installs all dependencies.

### Manual Installation

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

## Configuration

Create a `.env` file in the project root:

```bash
# SerpAPI key for patent search
API_KEY=your_serpapi_key_here

# OpenRouter API key for Claude AI analysis
OPENROUTER_API_KEY=your_openrouter_key_here
```

Get your API keys:
- SerpAPI: https://serpapi.com/
- OpenRouter: https://openrouter.ai/

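To catch a missing key before any request is made, the two variables can be validated at startup. This is a hedged sketch rather than the contents of `config.py`: `load_api_keys` and `MissingConfigError` are hypothetical names, and the code assumes the variables are already exported into the process environment (for example by Docker Compose or a dotenv loader); it does not read the `.env` file itself.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is unset or empty."""


def load_api_keys(env=os.environ) -> dict:
    """Read the two keys named in .env and fail early if either is missing."""
    required = ("API_KEY", "OPENROUTER_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise MissingConfigError(
            f"Missing environment variables: {', '.join(missing)}"
        )
    return {name: env[name] for name in required}
```
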
## Usage

### Basic Usage

```python
from SPARC.analyzer import CompanyAnalyzer

# Initialize the analyzer
analyzer = CompanyAnalyzer()

# Analyze a company's patent portfolio
analysis = analyzer.analyze_company("nvidia")
print(analysis)
```

### Run the Example

```bash
python main.py
```

This will:
1. Retrieve recent NVIDIA patents
2. Parse and minimize content
3. Analyze with Claude AI
4. Print a comprehensive performance assessment

### Single Patent Analysis

```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()

# Analyze a specific patent
result = analyzer.analyze_single_patent(
    patent_id="US11322171B1",
    company_name="nvidia"
)
```

### Multi-Company Batch Analysis

```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()

# Analyze multiple companies concurrently (default 3 workers)
batch_result = analyzer.analyze_companies(
    ["nvidia", "amd", "intel", "qualcomm"],
    max_workers=3
)

# Access results
print(f"Analyzed: {batch_result.total_companies}")
print(f"Successful: {batch_result.successful}")
print(f"Failed: {batch_result.failed}")

for result in batch_result.results:
    if result.success:
        print(f"{result.company_name}: {result.patent_count} patents")
        print(result.analysis)

# Or use sequential processing (safer for rate limits)
batch_result = analyzer.analyze_companies_sequential(["nvidia", "amd"])
```

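For readers curious how a worker-limited batch can isolate per-company failures, here is an illustrative sketch built on the standard library's thread pool. It is not SPARC's implementation: `run_batch` and `CompanyOutcome` are hypothetical names, and the `analyze` callable stands in for whatever performs the real analysis.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompanyOutcome:
    company_name: str
    success: bool
    analysis: Optional[str] = None
    error: Optional[str] = None


def run_batch(companies, analyze: Callable[[str], str], max_workers: int = 3):
    """Run `analyze` per company in a thread pool; one failure does not stop the rest."""
    outcomes = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, name): name for name in companies}
        for future in as_completed(futures):
            name = futures[future]
            try:
                outcomes.append(CompanyOutcome(name, True, analysis=future.result()))
            except Exception as exc:  # record the failure and keep going
                outcomes.append(CompanyOutcome(name, False, error=str(exc)))
    return outcomes
```

Results arrive in completion order rather than input order, so callers that need a stable order should sort or index the outcomes by `company_name`.
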
### REST API

Start the FastAPI server:

```bash
uvicorn SPARC.api:app --reload
```

API endpoints:

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/analyze/{company}` | GET | Analyze single company |
| `/analyze/batch` | POST | Analyze multiple companies |
| `/analyze/batch/async` | POST | Start async batch job |
| `/jobs/{job_id}` | GET | Get job status |
| `/jobs` | GET | List all jobs |

Interactive docs are available at `http://localhost:8000/docs`.

Example API usage:

```bash
# Single company
curl http://localhost:8000/analyze/nvidia

# Batch analysis
curl -X POST http://localhost:8000/analyze/batch \
  -H "Content-Type: application/json" \
  -d '{"companies": ["nvidia", "amd", "intel"]}'

# Async batch (for long-running jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
  -H "Content-Type: application/json" \
  -d '{"companies": ["nvidia", "amd", "intel", "qualcomm"]}'
```

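The same calls can be made from Python. The sketch below only builds the requests, using the endpoints and JSON body shown in the curl examples; the helper names `single_company_request` and `batch_request` are hypothetical, and the response schema is not documented here, so check `/docs` before relying on any field names.

```python
import json
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"


def single_company_request(company: str, base_url: str = BASE_URL) -> Request:
    """Build GET /analyze/{company}, URL-encoding the company name."""
    return Request(f"{base_url}/analyze/{quote(company, safe='')}", method="GET")


def batch_request(companies, asynchronous: bool = False, base_url: str = BASE_URL) -> Request:
    """Build POST /analyze/batch or /analyze/batch/async with a JSON body."""
    path = "/analyze/batch/async" if asynchronous else "/analyze/batch"
    body = json.dumps({"companies": list(companies)}).encode("utf-8")
    return Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send a request, pass it to `urllib.request.urlopen(...)` while the server is running and decode the response body with `json.loads`.
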
### Web Dashboard

The React dashboard is included in Docker Compose:

```bash
docker-compose up -d
```

Dashboard features:
- **Authentication**: User registration, login, and JWT-based sessions
- **Company Analysis**: Analyze individual companies with real-time results
- **Batch Analysis**: Process multiple companies with progress tracking
- **Analytics**: View historical analysis data and trends
- **Admin Panel**: User management for administrators

The dashboard runs at `http://localhost:8080` when using Docker Compose.

## Running Tests

```bash
# Run all tests
pytest tests/ -v

# Run specific test modules
pytest tests/test_analyzer.py -v
pytest tests/test_llm.py -v
pytest tests/test_serp_api.py -v

# Run with coverage
pytest tests/ --cov=SPARC --cov-report=term-missing
```

## How It Works

1. **Patent Collection**: Queries SerpAPI for company patents
2. **PDF Download**: Retrieves patent PDF files
3. **Section Extraction**: Parses abstract, claims, summary, and description
4. **Content Minimization**: Keeps essential sections, removes bloated descriptions
5. **LLM Analysis**: Sends minimized content to Claude for analysis
6. **Performance Estimation**: Returns insights on innovation quality and outlook

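The content minimization step (step 4 above) can be sketched as a filter over the extracted sections. This is an illustrative example under assumptions, not the project's parser: the `sections` dictionary shape, the `minimize_patent` name, and the output layout are all hypothetical.

```python
KEPT_SECTIONS = ("abstract", "claims", "summary")


def minimize_patent(sections: dict) -> str:
    """Keep the short, information-dense sections and drop the long description."""
    parts = []
    for name in KEPT_SECTIONS:
        text = (sections.get(name) or "").strip()
        if text:  # skip sections that are missing or empty
            parts.append(f"{name.upper()}:\n{text}")
    return "\n\n".join(parts)
```

Because the description is usually the longest part of a patent, leaving it out is where most of the token savings would come from in a scheme like this.
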
## Roadmap

- [X] Retrieve `publicationID` from SERP API
- [X] Parse patents from PDFs (no need for Google Patent API)
- [X] Extract and minimize patent content
- [X] LLM integration for analysis
- [X] Company performance estimation
- [X] Multi-company batch processing
- [X] FastAPI web service wrapper
- [X] Docker containerization
- [X] Results persistence (database)
- [X] Visualization dashboard

## Development

### Code Style

- Type hints throughout
- Comprehensive docstrings
- Small, testable functions
- Conventional commits

### Testing Philosophy

- Unit tests for core logic
- Integration tests for orchestration
- Mock external APIs
- Aim for high coverage

### Making Changes

1. Write tests first
2. Implement feature
3. Verify all tests pass
4. Commit with conventional format: `type: description`

Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`

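The commit format can be checked mechanically, for instance from a commit-msg hook. The following is a small sketch assuming only the `type: description` shape and the six types listed above; `is_conventional` is a hypothetical helper, not something the repository is known to ship.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")
# One of the listed types, a colon and a space, then a non-empty description.
COMMIT_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)}): \S.*$")


def is_conventional(subject: str) -> bool:
    """True if the subject line looks like `type: description` with a listed type."""
    return COMMIT_PATTERN.match(subject) is not None
```
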
## Documentation

Additional documentation is available in the `docs/` directory:

- **[Deployment Guide](docs/DEPLOYMENT.md)** - Complete deployment instructions for Docker, database setup, and production configuration
- **[Database Mode](docs/DATABASE_MODE.md)** - Database storage for prompts, responses, and analytics
- **[Container Registry](docs/CONTAINER_REGISTRY.md)** - CI/CD and container registry setup with Gitea Actions

## License

For open source projects, say how it is licensed.

## Project Status

Core functionality is complete. Ready for production use once API keys are configured.

All major features are implemented: REST API, React dashboard with authentication, Docker containerization, database storage with caching, and multi-company batch processing.