diff --git a/README.md b/README.md index 03c52f6..a6d3171 100644 --- a/README.md +++ b/README.md @@ -1,28 +1,172 @@ # SPARC -## Name -Semiconductor Patent & Analytics Report Core +**Semiconductor Patent & Analytics Report Core** -## Description +A patent analysis system that estimates company performance by analyzing their patent portfolios using LLM-powered insights. -## Installation -### NixOS Installation -`nix develop` to build and configure nix dev environment +## Overview -## Usage -```bash -docker compose up -d +SPARC automatically collects, parses, and analyzes patents from companies to provide performance estimations. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content. + +## Features + +- **Patent Retrieval**: Automated collection via SerpAPI's Google Patents engine +- **Intelligent Parsing**: Extracts key sections (abstract, claims, summary) from patent PDFs +- **Content Minimization**: Removes verbose descriptions to reduce LLM token usage +- **AI Analysis**: Uses Claude 3.5 Sonnet to analyze innovation quality and market potential +- **Portfolio Analysis**: Evaluates multiple patents holistically for comprehensive insights +- **Robust Testing**: 26 tests covering all major functionality + +## Architecture + +``` +SPARC/ +├── serp_api.py # Patent retrieval and PDF parsing +├── llm.py # Claude AI integration for analysis +├── analyzer.py # High-level orchestration +├── types.py # Data models +└── config.py # Environment configuration ``` -## Roadmap -- [X] Retrive `publicationID` from SERP API -- [ ] Retrive data from Google's patent API based on those `publicationID`'s - - This may not be needed, looking to parse the patents based soley on the pdf retrived from SERP -- [ ] Wrap this into a python fastAPI, then bundle with docker +## Installation +### NixOS (Recommended) + +```bash +nix develop +``` + +This automatically creates a virtual environment and installs all dependencies. + +### Manual Installation + +```bash +python -m venv .venv +source .venv/bin/activate +pip install -r requirements.txt +``` + +## Configuration + +Create a `.env` file in the project root: + +```bash +# SerpAPI key for patent search +API_KEY=your_serpapi_key_here + +# Anthropic API key for Claude AI analysis +ANTHROPIC_API_KEY=your_anthropic_key_here +``` + +Get your API keys: +- SerpAPI: https://serpapi.com/ +- Anthropic: https://console.anthropic.com/ + +## Usage + +### Basic Usage + +```python +from SPARC.analyzer import CompanyAnalyzer + +# Initialize the analyzer +analyzer = CompanyAnalyzer() + +# Analyze a company's patent portfolio +analysis = analyzer.analyze_company("nvidia") +print(analysis) +``` + +### Run the Example + +```bash +python main.py +``` + +This will: +1. Retrieve recent NVIDIA patents +2. Parse and minimize content +3. Analyze with Claude AI +4. Print comprehensive performance assessment + +### Single Patent Analysis + +```python +# Analyze a specific patent +result = analyzer.analyze_single_patent( + patent_id="US11322171B1", + company_name="nvidia" +) +``` + +## Running Tests + +```bash +# Run all tests +pytest tests/ -v + +# Run specific test modules +pytest tests/test_analyzer.py -v +pytest tests/test_llm.py -v +pytest tests/test_serp_api.py -v + +# Run with coverage +pytest tests/ --cov=SPARC --cov-report=term-missing +``` + +## How It Works + +1. **Patent Collection**: Queries SerpAPI for company patents +2. **PDF Download**: Retrieves patent PDF files +3. **Section Extraction**: Parses abstract, claims, summary, and description +4. **Content Minimization**: Keeps essential sections, removes bloated descriptions +5. **LLM Analysis**: Sends minimized content to Claude for analysis +6. **Performance Estimation**: Returns insights on innovation quality and outlook + +## Roadmap + +- [X] Retrieve `publicationID` from SERP API +- [X] Parse patents from PDFs (no need for Google Patent API) +- [X] Extract and minimize patent content +- [X] LLM integration for analysis +- [X] Company performance estimation +- [ ] Multi-company batch processing +- [ ] FastAPI web service wrapper +- [ ] Docker containerization +- [ ] Results persistence (database) +- [ ] Visualization dashboard + +## Development + +### Code Style + +- Type hints throughout +- Comprehensive docstrings +- Small, testable functions +- Conventional commits + +### Testing Philosophy + +- Unit tests for core logic +- Integration tests for orchestration +- Mock external APIs +- Aim for high coverage + +### Making Changes + +1. Write tests first +2. Implement feature +3. Verify all tests pass +4. Commit with conventional format: `type: description` + +Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore` ## License + For open source projects, say how it is licensed. -## Project status -Heavy development for the limited time available to me \ No newline at end of file +## Project Status + +Core functionality complete. Ready for production use with API keys configured. + +Next steps: API wrapper, containerization, and multi-company support.