0xWheatyz 490850d7a6
docs: reorganize documentation into docs/ directory
- Move CONTAINER_REGISTRY.md and DATABASE_MODE.md to docs/
- Add comprehensive DEPLOYMENT.md with full deployment instructions
- Update README.md with documentation section linking to docs/
- Keep README.md at root for GitHub visibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-12 23:51:32 -04:00


# SPARC
**Semiconductor Patent & Analytics Report Core**
A patent analysis system that estimates a company's performance by analyzing its patent portfolio with an LLM.
## Overview
SPARC automatically collects, parses, and analyzes patents from companies to provide performance estimations. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content.
## Features
- **Patent Retrieval**: Automated collection via SerpAPI's Google Patents engine
- **Intelligent Parsing**: Extracts key sections (abstract, claims, summary) from patent PDFs
- **Content Minimization**: Removes verbose descriptions to reduce LLM token usage
- **AI Analysis**: Uses Claude 3.5 Sonnet via OpenRouter to analyze innovation quality and market potential
- **Portfolio Analysis**: Evaluates multiple patents holistically for comprehensive insights
- **Batch Processing**: Analyze multiple companies concurrently with progress tracking
- **REST API**: FastAPI web service with async job support
- **Dashboard**: Interactive Streamlit visualization dashboard
- **Robust Testing**: 40 tests covering all major functionality
## Architecture
```
SPARC/
├── serp_api.py # Patent retrieval and PDF parsing
├── llm.py # Claude AI integration via OpenRouter
├── analyzer.py # High-level orchestration
├── api.py # FastAPI web service
├── types.py # Data models
└── config.py # Environment configuration
```
## Installation
### NixOS (Recommended)
```bash
nix develop
```
This automatically creates a virtual environment and installs all dependencies.
### Manual Installation
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configuration
Create a `.env` file in the project root:
```bash
# SerpAPI key for patent search
API_KEY=your_serpapi_key_here
# OpenRouter API key for Claude AI analysis
OPENROUTER_API_KEY=your_openrouter_key_here
```
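As a rough illustration of how these two keys might be read and validated at startup, the sketch below parses `.env`-style text using only the standard library. The `parse_env` and `require_keys` helpers are hypothetical and are not taken from `config.py`; the project may well load the file differently (for example with a library such as `python-dotenv`).

```python
from typing import Dict, Iterable


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_keys(values: Dict[str, str], keys: Iterable[str]) -> None:
    """Raise early if any required key is missing or empty."""
    missing = [key for key in keys if not values.get(key)]
    if missing:
        raise RuntimeError(f"Missing configuration keys: {', '.join(missing)}")


if __name__ == "__main__":
    sample = "# SerpAPI key\nAPI_KEY=abc\nOPENROUTER_API_KEY=xyz\n"
    config = parse_env(sample)
    require_keys(config, ["API_KEY", "OPENROUTER_API_KEY"])
    print(sorted(config))
```

Failing fast on a missing key tends to give a clearer error than a rejected request deep inside the analysis pipeline.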
Get your API keys:
- SerpAPI: https://serpapi.com/
- OpenRouter: https://openrouter.ai/
## Usage
### Basic Usage
```python
from SPARC.analyzer import CompanyAnalyzer
# Initialize the analyzer
analyzer = CompanyAnalyzer()
# Analyze a company's patent portfolio
analysis = analyzer.analyze_company("nvidia")
print(analysis)
```
### Run the Example
```bash
python main.py
```
This will:
1. Retrieve recent NVIDIA patents
2. Parse and minimize content
3. Analyze with Claude AI
4. Print comprehensive performance assessment
### Single Patent Analysis
```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()

# Analyze a specific patent
result = analyzer.analyze_single_patent(
    patent_id="US11322171B1",
    company_name="nvidia",
)
```
### Multi-Company Batch Analysis
```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()

# Analyze multiple companies concurrently (default 3 workers)
batch_result = analyzer.analyze_companies(
    ["nvidia", "amd", "intel", "qualcomm"],
    max_workers=3,
)

# Access results
print(f"Analyzed: {batch_result.total_companies}")
print(f"Successful: {batch_result.successful}")
print(f"Failed: {batch_result.failed}")

for result in batch_result.results:
    if result.success:
        print(f"{result.company_name}: {result.patent_count} patents")
        print(result.analysis)

# Or use sequential processing (safer for rate limits)
batch_result = analyzer.analyze_companies_sequential(["nvidia", "amd"])
```
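The concurrent mode can be pictured as a thread pool bounded by `max_workers`. The following is a minimal sketch of that general pattern, not the actual `analyze_companies` implementation; `run_batch`, `BatchOutcome`, and the `analyze` callable are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class BatchOutcome:
    """Hypothetical per-company result record."""
    company_name: str
    success: bool
    analysis: Optional[str] = None
    error: Optional[str] = None


def run_batch(
    companies: List[str],
    analyze: Callable[[str], str],
    max_workers: int = 3,
) -> List[BatchOutcome]:
    """Run `analyze` once per company; one failure does not abort the batch."""
    outcomes: List[BatchOutcome] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, name): name for name in companies}
        for future in as_completed(futures):
            name = futures[future]
            try:
                outcomes.append(BatchOutcome(name, True, analysis=future.result()))
            except Exception as exc:  # record the failure and keep going
                outcomes.append(BatchOutcome(name, False, error=str(exc)))
    return outcomes
```

In this sketch, results arrive in completion order rather than input order, and setting `max_workers=1` processes one company at a time, which is gentler on API rate limits.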
### REST API
Start the FastAPI server:
```bash
uvicorn SPARC.api:app --reload
```
API endpoints:
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/analyze/{company}` | GET | Analyze single company |
| `/analyze/batch` | POST | Analyze multiple companies |
| `/analyze/batch/async` | POST | Start async batch job |
| `/jobs/{job_id}` | GET | Get job status |
| `/jobs` | GET | List all jobs |
Interactive docs are available at `http://localhost:8000/docs`.
Example API usage:
```bash
# Single company
curl http://localhost:8000/analyze/nvidia
# Batch analysis
curl -X POST http://localhost:8000/analyze/batch \
-H "Content-Type: application/json" \
-d '{"companies": ["nvidia", "amd", "intel"]}'
# Async batch (for long-running jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
-H "Content-Type: application/json" \
-d '{"companies": ["nvidia", "amd", "intel", "qualcomm"]}'
```
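For callers that prefer Python over `curl`, the sketch below builds the same batch request with the standard library and polls a job until it finishes. The endpoint paths and the `companies` payload come from the examples above; the `status` field and its `completed`/`failed` values are assumptions about the job response, so check the interactive docs for the actual schema. `build_batch_request` and `wait_for_job` are hypothetical helper names.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict, List

BASE_URL = "http://localhost:8000"


def build_batch_request(
    companies: List[str], use_async: bool = False
) -> urllib.request.Request:
    """Build a POST request for /analyze/batch or /analyze/batch/async."""
    path = "/analyze/batch/async" if use_async else "/analyze/batch"
    body = json.dumps({"companies": companies}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 2.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Call `fetch_status` until the job reports a terminal state.

    Assumes the response carries a "status" field that ends up as
    "completed" or "failed"; adjust this to the real schema.
    """
    for _ in range(max_attempts):
        job = fetch_status()
        if job.get("status") in {"completed", "failed"}:
            return job
        time.sleep(interval)
    raise TimeoutError("job did not finish within the polling budget")
```

The request can be sent with `urllib.request.urlopen(build_batch_request([...]))`. Passing the status lookup in as a callable keeps the polling loop independent of how `/jobs/{job_id}` is actually fetched and makes it easy to test without a running server.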
### Visualization Dashboard
Launch the interactive Streamlit dashboard:
```bash
streamlit run dashboard.py
```
Dashboard features:
- **Company Analysis**: Analyze individual companies with real-time results
- **Batch Analysis**: Process multiple companies with progress tracking and charts
- **Analytics**: View historical analysis data and trends (requires database mode)
- **System Status**: Monitor database and analyzer health
The dashboard runs at `http://localhost:8501` by default.
## Running Tests
```bash
# Run all tests
pytest tests/ -v
# Run specific test modules
pytest tests/test_analyzer.py -v
pytest tests/test_llm.py -v
pytest tests/test_serp_api.py -v
# Run with coverage
pytest tests/ --cov=SPARC --cov-report=term-missing
```
## How It Works
1. **Patent Collection**: Queries SerpAPI for company patents
2. **PDF Download**: Retrieves patent PDF files
3. **Section Extraction**: Parses abstract, claims, summary, and description
4. **Content Minimization**: Keeps essential sections, removes bloated descriptions
5. **LLM Analysis**: Sends minimized content to Claude for analysis
6. **Performance Estimation**: Returns insights on innovation quality and outlook
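Steps 3 and 4 can be sketched as a small filter over the extracted sections. This is an illustration under assumptions, not the code in `serp_api.py`: the `minimize_sections` name, the dict-of-sections input, and the section keys are all hypothetical.

```python
from typing import Dict, Iterable

# Sections kept for the LLM; the long description is intentionally left out.
ESSENTIAL_SECTIONS = ("abstract", "claims", "summary")


def minimize_sections(
    sections: Dict[str, str],
    keep: Iterable[str] = ESSENTIAL_SECTIONS,
) -> str:
    """Join the essential, non-empty sections into one prompt-ready string."""
    parts = []
    for name in keep:
        text = sections.get(name, "").strip()
        if text:
            parts.append(f"{name.upper()}:\n{text}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    patent = {
        "abstract": "A method for scheduling GPU workloads.",
        "claims": "1. A method comprising receiving a workload request.",
        "summary": "",
        "description": "Very long detailed description. " * 100,
    }
    print(minimize_sections(patent))
```

Because the description is usually the longest part of a patent, dropping it is where most of the token savings would come from in a scheme like this.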
## Roadmap
- [X] Retrieve `publicationID` from SERP API
- [X] Parse patents from PDFs (no need for Google Patent API)
- [X] Extract and minimize patent content
- [X] LLM integration for analysis
- [X] Company performance estimation
- [X] Multi-company batch processing
- [X] FastAPI web service wrapper
- [X] Docker containerization
- [X] Results persistence (database)
- [X] Visualization dashboard
## Development
### Code Style
- Type hints throughout
- Comprehensive docstrings
- Small, testable functions
- Conventional commits
### Testing Philosophy
- Unit tests for core logic
- Integration tests for orchestration
- Mock external APIs
- Aim for high coverage
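A test that mocks an external API might look like the sketch below. `count_patents` and the `search` method on the client are hypothetical stand-ins for the real SerpAPI wrapper, which is not shown in this README; only the `unittest.mock` calls are real library API.

```python
from unittest.mock import Mock


def count_patents(client, company: str) -> int:
    """Hypothetical unit under test: count results returned by a search client."""
    response = client.search(company)
    return len(response.get("results", []))


def test_count_patents_uses_mocked_client() -> None:
    # No network call is made; the mock returns a canned response.
    client = Mock()
    client.search.return_value = {"results": [{"id": "a"}, {"id": "b"}]}

    assert count_patents(client, "nvidia") == 2
    client.search.assert_called_once_with("nvidia")
```

Injecting the client as an argument, as assumed here, lets the test run offline and without API keys.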
### Making Changes
1. Write tests first
2. Implement feature
3. Verify all tests pass
4. Commit with conventional format: `type: description`
Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`
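The commit format can also be checked mechanically, for example in a pre-commit hook. The snippet below is one possible sketch; `is_conventional` is a hypothetical helper and is not part of the repository.

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")
# Matches "<type>: <description>" with a non-empty description.
COMMIT_PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)}): \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if the commit subject line matches `type: description`."""
    return COMMIT_PATTERN.match(subject) is not None


if __name__ == "__main__":
    print(is_conventional("docs: reorganize documentation into docs/ directory"))
    print(is_conventional("update stuff"))
```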
## Documentation
Additional documentation is available in the `docs/` directory:
- **[Deployment Guide](docs/DEPLOYMENT.md)** - Complete deployment instructions for Docker, database setup, and production configuration
- **[Database Mode](docs/DATABASE_MODE.md)** - Database storage for prompts, responses, and analytics
- **[Container Registry](docs/CONTAINER_REGISTRY.md)** - CI/CD and container registry setup with Gitea Actions
## License
License information has not been specified yet.
## Project Status
Core functionality is complete, and every item on the roadmap, including the API wrapper, containerization, and multi-company batch support, has been implemented. SPARC is ready for production use once the API keys are configured.