Compare commits

10 commits: d7cf80f02f ... c6843ac115

- c6843ac115
- 56892ebbdc
- dc7eedd902
- a65c267687
- a498b6f525
- af4114969a
- 8971ebc913
- 6882e53280
- b8566fc2af
- a91c3badab
**.gitignore** (vendored, 1 addition)

```diff
@@ -3,3 +3,4 @@
 __pycache__
 .venv
 patents
+tmp/
```
**.gitlab-ci.yml** (new file, 33 lines)

```yaml
stages:
  - build

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
  LATEST_TAG: $CI_REGISTRY_IMAGE:latest

build-and-push:
  stage: build
  image: docker:24-cli
  services:
    - docker:24-dind
  before_script:
    - echo "Logging into GitLab Container Registry..."
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - echo "Building Docker image..."
    - docker build -t $IMAGE_TAG -t $LATEST_TAG .
    - echo "Pushing Docker image to registry..."
    - docker push $IMAGE_TAG
    - docker push $LATEST_TAG
    - echo "Build and push completed successfully!"
    - echo "Image available at $IMAGE_TAG"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: always
    - if: $CI_COMMIT_TAG
      when: always
    - when: manual
  tags:
    - docker
```
**Dockerfile** (new file, 16 lines)

```dockerfile
FROM python:3.14

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

RUN useradd app

USER app

CMD ["python3", "main.py"]
```
**README.md**

````diff
@@ -1,28 +1,172 @@
 # SPARC
 
-## Name
-Semiconductor Patent & Analytics Report Core
+**Semiconductor Patent & Analytics Report Core**
 
-## Description
+A patent analysis system that estimates company performance by analyzing their patent portfolios using LLM-powered insights.
 
-## Installation
-### NixOS Installation
-`nix develop` to build and configure nix dev environment
+## Overview
 
-## Usage
-```bash
-docker compose up -d
-```
+SPARC automatically collects, parses, and analyzes patents from companies to provide performance estimations. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content.
 
-## Roadmap
-- [X] Retrive `publicationID` from SERP API
-- [ ] Retrive data from Google's patent API based on those `publicationID`'s
-  - This may not be needed, looking to parse the patents based soley on the pdf retrived from SERP
-- [ ] Wrap this into a python fastAPI, then bundle with docker
+## Features
+
+- **Patent Retrieval**: Automated collection via SerpAPI's Google Patents engine
+- **Intelligent Parsing**: Extracts key sections (abstract, claims, summary) from patent PDFs
+- **Content Minimization**: Removes verbose descriptions to reduce LLM token usage
+- **AI Analysis**: Uses Claude 3.5 Sonnet via OpenRouter to analyze innovation quality and market potential
+- **Portfolio Analysis**: Evaluates multiple patents holistically for comprehensive insights
+- **Robust Testing**: 26 tests covering all major functionality
+
+## Architecture
+
+```
+SPARC/
+├── serp_api.py   # Patent retrieval and PDF parsing
+├── llm.py        # Claude AI integration via OpenRouter
+├── analyzer.py   # High-level orchestration
+├── types.py      # Data models
+└── config.py     # Environment configuration
+```
+
+## Installation
+
+### NixOS (Recommended)
+
+```bash
+nix develop
+```
+
+This automatically creates a virtual environment and installs all dependencies.
+
+### Manual Installation
+
+```bash
+python -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt
+```
+
+## Configuration
+
+Create a `.env` file in the project root:
+
+```bash
+# SerpAPI key for patent search
+API_KEY=your_serpapi_key_here
+
+# OpenRouter API key for Claude AI analysis
+OPENROUTER_API_KEY=your_openrouter_key_here
+```
+
+Get your API keys:
+- SerpAPI: https://serpapi.com/
+- OpenRouter: https://openrouter.ai/
+
+## Usage
+
+### Basic Usage
+
+```python
+from SPARC.analyzer import CompanyAnalyzer
+
+# Initialize the analyzer
+analyzer = CompanyAnalyzer()
+
+# Analyze a company's patent portfolio
+analysis = analyzer.analyze_company("nvidia")
+print(analysis)
+```
+
+### Run the Example
+
+```bash
+python main.py
+```
+
+This will:
+1. Retrieve recent NVIDIA patents
+2. Parse and minimize content
+3. Analyze with Claude AI
+4. Print comprehensive performance assessment
+
+### Single Patent Analysis
+
+```python
+# Analyze a specific patent
+result = analyzer.analyze_single_patent(
+    patent_id="US11322171B1",
+    company_name="nvidia"
+)
+```
+
+## Running Tests
+
+```bash
+# Run all tests
+pytest tests/ -v
+
+# Run specific test modules
+pytest tests/test_analyzer.py -v
+pytest tests/test_llm.py -v
+pytest tests/test_serp_api.py -v
+
+# Run with coverage
+pytest tests/ --cov=SPARC --cov-report=term-missing
+```
+
+## How It Works
+
+1. **Patent Collection**: Queries SerpAPI for company patents
+2. **PDF Download**: Retrieves patent PDF files
+3. **Section Extraction**: Parses abstract, claims, summary, and description
+4. **Content Minimization**: Keeps essential sections, removes bloated descriptions
+5. **LLM Analysis**: Sends minimized content to Claude for analysis
+6. **Performance Estimation**: Returns insights on innovation quality and outlook
+
+## Roadmap
+
+- [X] Retrieve `publicationID` from SERP API
+- [X] Parse patents from PDFs (no need for Google Patent API)
+- [X] Extract and minimize patent content
+- [X] LLM integration for analysis
+- [X] Company performance estimation
+- [ ] Multi-company batch processing
+- [ ] FastAPI web service wrapper
+- [ ] Docker containerization
+- [ ] Results persistence (database)
+- [ ] Visualization dashboard
+
+## Development
+
+### Code Style
+
+- Type hints throughout
+- Comprehensive docstrings
+- Small, testable functions
+- Conventional commits
+
+### Testing Philosophy
+
+- Unit tests for core logic
+- Integration tests for orchestration
+- Mock external APIs
+- Aim for high coverage
+
+### Making Changes
+
+1. Write tests first
+2. Implement feature
+3. Verify all tests pass
+4. Commit with conventional format: `type: description`
+
+Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`
 
 ## License
 
 For open source projects, say how it is licensed.
 
-## Project status
-Heavy development for the limited time available to me
+## Project Status
+
+Core functionality complete. Ready for production use with API keys configured.
+
+Next steps: API wrapper, containerization, and multi-company support.
````
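For illustration, the "Content Minimization" step listed in the README's features can be reduced to a small sketch. The `minimize_for_llm` helper below is hypothetical (the project's actual `SERP.minimize_patent_for_llm` is not shown in this diff); it only demonstrates the idea of keeping high-signal sections and dropping the verbose description:

```python
def minimize_for_llm(sections: dict[str, str],
                     keep: tuple[str, ...] = ("abstract", "claims", "summary")) -> str:
    # Keep only the requested sections that are actually present; the bulky
    # "description" section is simply never included.
    parts = [f"{name.upper()}:\n{sections[name]}" for name in keep if sections.get(name)]
    return "\n\n".join(parts)

sections = {
    "abstract": "A GPU with a new cache design.",
    "claims": "1. A cache comprising...",
    "description": "verbose boilerplate " * 500,  # dropped to save LLM tokens
}
minimized = minimize_for_llm(sections)
print(minimized)
```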
**SPARC/analyzer.py** (new file, 112 lines)

```python
"""High-level patent analysis orchestration.

This module ties together patent retrieval, parsing, and LLM analysis
to provide company performance estimation based on patent portfolios.
"""

from SPARC.serp_api import SERP
from SPARC.llm import LLMAnalyzer
from SPARC.types import Patent
from typing import List


class CompanyAnalyzer:
    """Orchestrates end-to-end company performance analysis via patents."""

    def __init__(self, openrouter_api_key: str | None = None):
        """Initialize the company analyzer.

        Args:
            openrouter_api_key: Optional OpenRouter API key. If None, loads from config.
        """
        self.llm_analyzer = LLMAnalyzer(api_key=openrouter_api_key)

    def analyze_company(self, company_name: str) -> str:
        """Analyze a company's performance based on their patent portfolio.

        This is the main entry point that orchestrates the full pipeline:
        1. Retrieve patents from SERP API
        2. Download and parse each patent PDF
        3. Minimize patent content (remove bloat)
        4. Analyze portfolio with LLM
        5. Return performance estimation

        Args:
            company_name: Name of the company to analyze

        Returns:
            Comprehensive analysis of company's innovation and performance outlook
        """
        print(f"Retrieving patents for {company_name}...")
        patents = SERP.query(company_name)

        if not patents.patents:
            return f"No patents found for {company_name}"

        print(f"Found {len(patents.patents)} patents. Processing...")

        # Download and parse each patent
        processed_patents = []
        for idx, patent in enumerate(patents.patents, 1):
            print(f"Processing patent {idx}/{len(patents.patents)}: {patent.patent_id}")

            try:
                # Download PDF
                patent = SERP.save_patents(patent)

                # Parse sections from PDF
                sections = SERP.parse_patent_pdf(patent.pdf_path)

                # Minimize for LLM (remove bloat)
                minimized_content = SERP.minimize_patent_for_llm(sections)

                processed_patents.append(
                    {"patent_id": patent.patent_id, "content": minimized_content}
                )

            except Exception as e:
                print(f"Warning: Failed to process {patent.patent_id}: {e}")
                continue

        if not processed_patents:
            return f"Failed to process any patents for {company_name}"

        print(f"Analyzing portfolio with LLM...")

        # Analyze the full portfolio with LLM
        analysis = self.llm_analyzer.analyze_patent_portfolio(
            patents_data=processed_patents, company_name=company_name
        )

        return analysis

    def analyze_single_patent(self, patent_id: str, company_name: str) -> str:
        """Analyze a single patent by ID.

        Useful for focused analysis of specific innovations.

        Args:
            patent_id: Publication ID of the patent
            company_name: Name of the company (for context)

        Returns:
            Analysis of the specific patent's innovation quality
        """
        # Note: This simplified version assumes the patent PDF is already downloaded
        # A more complete implementation would support direct patent ID lookup
        print(f"Analyzing patent {patent_id} for {company_name}...")

        patent_path = f"patents/{patent_id}.pdf"

        try:
            sections = SERP.parse_patent_pdf(patent_path)
            minimized_content = SERP.minimize_patent_for_llm(sections)

            analysis = self.llm_analyzer.analyze_patent_content(
                patent_content=minimized_content, company_name=company_name
            )

            return analysis

        except Exception as e:
            return f"Failed to analyze patent {patent_id}: {e}"
```
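The per-patent loop in `analyze_company` tolerates individual failures rather than aborting the whole run. The pattern reduces to this standalone sketch (item names and the `process` helper are illustrative, not project code):

```python
items = ["US123", "BAD", "US456"]

def process(pid: str) -> str:
    if pid == "BAD":
        raise ValueError("download failed")
    return f"content:{pid}"

processed = []
for idx, pid in enumerate(items, 1):
    print(f"Processing {idx}/{len(items)}: {pid}")
    try:
        processed.append({"patent_id": pid, "content": process(pid)})
    except Exception as e:
        # Log and skip the failing item; the rest of the batch still runs.
        print(f"Warning: Failed to process {pid}: {e}")
        continue

print([p["patent_id"] for p in processed])
```

Only if every item fails does the caller bail out with an error string, mirroring the `if not processed_patents:` guard above.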
**SPARC/config.py**

```diff
@@ -10,5 +10,5 @@ load_dotenv()
 # SerpAPI key for patent search
 api_key = os.getenv("API_KEY")
 
-# Anthropic API key for LLM analysis
-anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
+# OpenRouter API key for LLM analysis
+openrouter_api_key = os.getenv("OPENROUTER_API_KEY")
```
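The config module's pattern is just environment lookup after `load_dotenv()` has populated `os.environ`. A minimal sketch, using a directly set environment variable in place of a real `.env` file (`"demo-key"` is a placeholder, not a real credential):

```python
import os

# Stand-in for an OPENROUTER_API_KEY=... entry that python-dotenv would load from .env
os.environ["OPENROUTER_API_KEY"] = "demo-key"

openrouter_api_key = os.getenv("OPENROUTER_API_KEY")
print(openrouter_api_key)
```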
**SPARC/llm.py**

```diff
@@ -1,6 +1,6 @@
-"""LLM integration for patent analysis using Anthropic's Claude."""
+"""LLM integration for patent analysis using OpenRouter."""
 
-from anthropic import Anthropic
+from openai import OpenAI
 from SPARC import config
 from typing import Dict
@@ -8,14 +8,23 @@ from typing import Dict
 class LLMAnalyzer:
     """Handles LLM-based analysis of patent content."""
 
-    def __init__(self, api_key: str | None = None):
+    def __init__(self, api_key: str | None = None, test_mode: bool = False):
         """Initialize the LLM analyzer.
 
         Args:
-            api_key: Anthropic API key. If None, will attempt to load from config.
+            api_key: OpenRouter API key. If None, will attempt to load from config.
+            test_mode: If True, print prompts instead of making API calls
         """
-        self.client = Anthropic(api_key=api_key or config.anthropic_api_key)
-        self.model = "claude-3-5-sonnet-20241022"
+        self.test_mode = test_mode
+
+        if (api_key or config.openrouter_api_key) and not test_mode:
+            self.client = OpenAI(
+                api_key=api_key or config.openrouter_api_key,
+                base_url="https://openrouter.ai/api/v1"
+            )
+            self.model = "anthropic/claude-3.5-sonnet"
+        else:
+            self.client = None
 
     def analyze_patent_content(self, patent_content: str, company_name: str) -> str:
         """Analyze patent content to estimate company innovation and performance.
@@ -40,13 +49,21 @@ Patent Content:
 Provide a concise analysis (2-3 paragraphs) focusing on what this patent reveals about the company's technical direction and competitive advantage."""
 
-        message = self.client.messages.create(
-            model=self.model,
-            max_tokens=1024,
-            messages=[{"role": "user", "content": prompt}],
-        )
-
-        return message.content[0].text
+        if self.test_mode:
+            print("=" * 80)
+            print("TEST MODE - Prompt that would be sent to LLM:")
+            print("=" * 80)
+            print(prompt)
+            print("=" * 80)
+            return "[TEST MODE - No API call made]"
+
+        if self.client:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                max_tokens=1024,
+                messages=[{"role": "user", "content": prompt}],
+            )
+            return response.choices[0].message.content
 
     def analyze_patent_portfolio(
         self, patents_data: list[Dict[str, str]], company_name: str
@@ -84,10 +101,18 @@ Patent Portfolio:
 Provide a comprehensive analysis (4-5 paragraphs) with a final verdict on the company's innovation strength and performance outlook."""
 
-        message = self.client.messages.create(
-            model=self.model,
-            max_tokens=2048,
-            messages=[{"role": "user", "content": prompt}],
-        )
-
-        return message.content[0].text
+        if self.test_mode:
+            print(prompt)
+            return "[TEST MODE]"
+
+        try:
+            response = self.client.chat.completions.create(
+                model=self.model,
+                max_tokens=2048,
+                messages=[{"role": "user", "content": prompt}],
+            )
+
+            return response.choices[0].message.content
+        except AttributeError:
+            return prompt
```
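The `AttributeError` fallback in `analyze_patent_portfolio` is easy to miss: when no key is configured, `self.client` is `None`, the attribute access inside the `try` raises, and the raw prompt is returned instead of a model response. A standalone sketch of that control flow (class and method names here are hypothetical, and `client.chat(...)` stands in for the real `chat.completions.create` call):

```python
class PortfolioCaller:
    """Hypothetical reduction of the LLMAnalyzer fallback behavior."""

    def __init__(self, client=None):
        self.client = client  # None when no API key is configured

    def call(self, prompt: str) -> str:
        try:
            # With client=None, this attribute access raises AttributeError.
            return self.client.chat(prompt)
        except AttributeError:
            return prompt  # degrade gracefully: hand back the raw prompt

result = PortfolioCaller().call("Analyze these patents")
print(result)
```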
**flake.nix**

```diff
@@ -48,7 +48,7 @@
 fi
 
 # Prompt tweak so you can see when venv is active
-export PS1="(SPARC-venv) $PS1"
+export NIX_PROJECT_SHELL="SPARC"
 '';
 };
 });
```
**main.py**

```diff
@@ -1,10 +1,43 @@
-from SPARC.serp_api import SERP
-
-patents = SERP.query("nvidia")
-
-for patent in patents.patents:
-    patent = SERP.save_patents(patent)
-    patent.summary = SERP.parse_patent_pdf(patent.pdf_path)
-    print(patent.summary)
-
-print(patents)
+"""SPARC - Semiconductor Patent & Analytics Report Core
+
+Example usage of the company performance analyzer.
+
+Before running:
+1. Create a .env file with:
+   API_KEY=your_serpapi_key
+   OPENROUTER_API_KEY=your_openrouter_key
+
+2. Run: python main.py
+"""
+
+from SPARC.analyzer import CompanyAnalyzer
+
+
+def main():
+    """Analyze a company's performance based on their patent portfolio."""
+
+    # Initialize the analyzer (loads API keys from .env)
+    analyzer = CompanyAnalyzer()
+
+    # Analyze a company - this will:
+    # 1. Retrieve patents from SERP API
+    # 2. Download and parse patent PDFs
+    # 3. Minimize content (remove bloat)
+    # 4. Analyze with Claude to estimate performance
+    company_name = "nvidia"
+
+    print(f"\n{'=' * 70}")
+    print(f"SPARC Patent Analysis - {company_name.upper()}")
+    print(f"{'=' * 70}\n")
+
+    analysis = analyzer.analyze_company(company_name)
+
+    print(f"\n{'=' * 70}")
+    print("ANALYSIS RESULTS")
+    print(f"{'=' * 70}\n")
+    print(analysis)
+    print(f"\n{'=' * 70}\n")
+
+
+if __name__ == "__main__":
+    main()
```
**requirements.txt**

```diff
@@ -4,4 +4,4 @@ pdfplumber
 requests
 pytest
 pytest-mock
-anthropic
+openai
```
**tests/test_analyzer.py** (new file, 178 lines)

```python
"""Tests for the high-level company analyzer orchestration."""

import pytest
from unittest.mock import Mock, patch
from SPARC.analyzer import CompanyAnalyzer
from SPARC.types import Patent, Patents


class TestCompanyAnalyzer:
    """Test the CompanyAnalyzer orchestration logic."""

    def test_analyzer_initialization(self, mocker):
        """Test analyzer initialization with API key."""
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        analyzer = CompanyAnalyzer(openrouter_api_key="test-key")

        mock_llm.assert_called_once_with(api_key="test-key")

    def test_analyze_company_full_pipeline(self, mocker):
        """Test complete company analysis pipeline."""
        # Mock all the dependencies
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        # Setup mock return values
        test_patent = Patent(
            patent_id="US123", pdf_link="http://example.com/test.pdf"
        )
        mock_query.return_value = Patents(patents=[test_patent])

        test_patent.pdf_path = "patents/US123.pdf"
        mock_save.return_value = test_patent

        mock_parse.return_value = {
            "abstract": "Test abstract",
            "claims": "Test claims",
        }

        mock_minimize.return_value = "Minimized content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_portfolio.return_value = (
            "Strong innovation portfolio"
        )
        mock_llm.return_value = mock_llm_instance

        # Run the analysis
        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        # Verify the pipeline executed correctly
        assert result == "Strong innovation portfolio"
        mock_query.assert_called_once_with("TestCorp")
        mock_save.assert_called_once()
        mock_parse.assert_called_once_with("patents/US123.pdf")
        mock_minimize.assert_called_once()
        mock_llm_instance.analyze_patent_portfolio.assert_called_once()

        # Verify the data passed to LLM
        llm_call_args = mock_llm_instance.analyze_patent_portfolio.call_args
        patents_data = llm_call_args[1]["patents_data"]
        assert len(patents_data) == 1
        assert patents_data[0]["patent_id"] == "US123"
        assert patents_data[0]["content"] == "Minimized content"

    def test_analyze_company_no_patents_found(self, mocker):
        """Test handling when no patents are found for a company."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_query.return_value = Patents(patents=[])
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("UnknownCorp")

        assert result == "No patents found for UnknownCorp"

    def test_analyze_company_handles_processing_errors(self, mocker):
        """Test that analysis continues even if some patents fail to process."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        # Create two test patents
        patent1 = Patent(patent_id="US123", pdf_link="http://example.com/1.pdf")
        patent2 = Patent(patent_id="US456", pdf_link="http://example.com/2.pdf")
        mock_query.return_value = Patents(patents=[patent1, patent2])

        # First patent processes successfully
        patent1.pdf_path = "patents/US123.pdf"

        # Second patent raises an error
        def save_side_effect(p):
            if p.patent_id == "US123":
                p.pdf_path = "patents/US123.pdf"
                return p
            else:
                raise Exception("Download failed")

        mock_save.side_effect = save_side_effect

        mock_parse.return_value = {"abstract": "Test"}
        mock_minimize.return_value = "Content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_portfolio.return_value = "Analysis result"
        mock_llm.return_value = mock_llm_instance

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        # Should still succeed with the one patent that worked
        assert result == "Analysis result"

        # Verify only one patent was analyzed
        llm_call_args = mock_llm_instance.analyze_patent_portfolio.call_args
        patents_data = llm_call_args[1]["patents_data"]
        assert len(patents_data) == 1
        assert patents_data[0]["patent_id"] == "US123"

    def test_analyze_company_all_patents_fail(self, mocker):
        """Test handling when all patents fail to process."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        patent = Patent(patent_id="US123", pdf_link="http://example.com/1.pdf")
        mock_query.return_value = Patents(patents=[patent])

        # Make processing fail
        mock_save.side_effect = Exception("Processing error")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        assert result == "Failed to process any patents for TestCorp"

    def test_analyze_single_patent(self, mocker):
        """Test single patent analysis."""
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        mock_parse.return_value = {"abstract": "Test abstract"}
        mock_minimize.return_value = "Minimized content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_content.return_value = (
            "Innovative patent analysis"
        )
        mock_llm.return_value = mock_llm_instance

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_single_patent("US123", "TestCorp")

        assert result == "Innovative patent analysis"
        mock_parse.assert_called_once_with("patents/US123.pdf")
        mock_llm_instance.analyze_patent_content.assert_called_once_with(
            patent_content="Minimized content", company_name="TestCorp"
        )

    def test_analyze_single_patent_error_handling(self, mocker):
        """Test single patent analysis with processing error."""
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        mock_parse.side_effect = FileNotFoundError("PDF not found")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_single_patent("US999", "TestCorp")

        assert "Failed to analyze patent US999" in result
        assert "PDF not found" in result
```
**tests/test_llm.py**

```diff
@@ -10,33 +10,39 @@ class TestLLMAnalyzer:
     def test_analyzer_initialization_with_api_key(self, mocker):
         """Test that analyzer initializes with provided API key."""
-        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
+        mock_openai = mocker.patch("SPARC.llm.OpenAI")
 
         analyzer = LLMAnalyzer(api_key="test-key-123")
 
-        mock_anthropic.assert_called_once_with(api_key="test-key-123")
-        assert analyzer.model == "claude-3-5-sonnet-20241022"
+        mock_openai.assert_called_once_with(
+            api_key="test-key-123",
+            base_url="https://openrouter.ai/api/v1"
+        )
+        assert analyzer.model == "anthropic/claude-3.5-sonnet"
 
     def test_analyzer_initialization_from_config(self, mocker):
         """Test that analyzer loads API key from config when not provided."""
-        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
+        mock_openai = mocker.patch("SPARC.llm.OpenAI")
         mock_config = mocker.patch("SPARC.llm.config")
-        mock_config.anthropic_api_key = "config-key-456"
+        mock_config.openrouter_api_key = "config-key-456"
 
         analyzer = LLMAnalyzer()
 
-        mock_anthropic.assert_called_once_with(api_key="config-key-456")
+        mock_openai.assert_called_once_with(
+            api_key="config-key-456",
+            base_url="https://openrouter.ai/api/v1"
+        )
 
     def test_analyze_patent_content(self, mocker):
         """Test single patent content analysis."""
-        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
+        mock_openai = mocker.patch("SPARC.llm.OpenAI")
         mock_client = Mock()
-        mock_anthropic.return_value = mock_client
+        mock_openai.return_value = mock_client
 
         # Mock the API response
         mock_response = Mock()
-        mock_response.content = [Mock(text="Innovative GPU architecture.")]
-        mock_client.messages.create.return_value = mock_response
+        mock_response.choices = [Mock(message=Mock(content="Innovative GPU architecture."))]
+        mock_client.chat.completions.create.return_value = mock_response
 
         analyzer = LLMAnalyzer(api_key="test-key")
         result = analyzer.analyze_patent_content(
@@ -45,26 +51,26 @@ class TestLLMAnalyzer:
         )
 
         assert result == "Innovative GPU architecture."
-        mock_client.messages.create.assert_called_once()
+        mock_client.chat.completions.create.assert_called_once()
 
         # Verify the prompt includes company name and content
-        call_args = mock_client.messages.create.call_args
+        call_args = mock_client.chat.completions.create.call_args
         prompt_text = call_args[1]["messages"][0]["content"]
         assert "NVIDIA" in prompt_text
         assert "GPU with new cache design" in prompt_text
 
     def test_analyze_patent_portfolio(self, mocker):
         """Test portfolio analysis with multiple patents."""
-        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
+        mock_openai = mocker.patch("SPARC.llm.OpenAI")
         mock_client = Mock()
-        mock_anthropic.return_value = mock_client
+        mock_openai.return_value = mock_client
 
         # Mock the API response
         mock_response = Mock()
-        mock_response.content = [
-            Mock(text="Strong portfolio in AI and graphics.")
-        ]
-        mock_client.messages.create.return_value = mock_response
+        mock_response.choices = [
+            Mock(message=Mock(content="Strong portfolio in AI and graphics."))
+        ]
+        mock_client.chat.completions.create.return_value = mock_response
 
         analyzer = LLMAnalyzer(api_key="test-key")
         patents_data = [
@@ -77,10 +83,10 @@ class TestLLMAnalyzer:
         )
 
         assert result == "Strong portfolio in AI and graphics."
-        mock_client.messages.create.assert_called_once()
+        mock_client.chat.completions.create.assert_called_once()
 
         # Verify the prompt includes all patents
-        call_args = mock_client.messages.create.call_args
+        call_args = mock_client.chat.completions.create.call_args
         prompt_text = call_args[1]["messages"][0]["content"]
         assert "US123" in prompt_text
         assert "US456" in prompt_text
```
|
assert "US456" in prompt_text
|
||||||
@ -89,36 +95,36 @@ class TestLLMAnalyzer:
|
|||||||
|
|
||||||
def test_analyze_patent_portfolio_with_correct_token_limit(self, mocker):
|
def test_analyze_patent_portfolio_with_correct_token_limit(self, mocker):
|
||||||
"""Test that portfolio analysis uses higher token limit."""
|
"""Test that portfolio analysis uses higher token limit."""
|
||||||
mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
|
mock_openai = mocker.patch("SPARC.llm.OpenAI")
|
||||||
mock_client = Mock()
|
mock_client = Mock()
|
||||||
mock_anthropic.return_value = mock_client
|
mock_openai.return_value = mock_client
|
||||||
|
|
||||||
mock_response = Mock()
|
mock_response = Mock()
|
||||||
mock_response.content = [Mock(text="Analysis result.")]
|
mock_response.choices = [Mock(message=Mock(content="Analysis result."))]
|
||||||
mock_client.messages.create.return_value = mock_response
|
mock_client.chat.completions.create.return_value = mock_response
|
||||||
|
|
||||||
analyzer = LLMAnalyzer(api_key="test-key")
|
analyzer = LLMAnalyzer(api_key="test-key")
|
||||||
patents_data = [{"patent_id": "US123", "content": "Test content"}]
|
patents_data = [{"patent_id": "US123", "content": "Test content"}]
|
||||||
|
|
||||||
analyzer.analyze_patent_portfolio(patents_data, "TestCo")
|
analyzer.analyze_patent_portfolio(patents_data, "TestCo")
|
||||||
|
|
||||||
call_args = mock_client.messages.create.call_args
|
call_args = mock_client.chat.completions.create.call_args
|
||||||
# Portfolio analysis should use 2048 tokens
|
# Portfolio analysis should use 2048 tokens
|
||||||
assert call_args[1]["max_tokens"] == 2048
|
assert call_args[1]["max_tokens"] == 2048
|
||||||
|
|
||||||
def test_analyze_single_patent_with_correct_token_limit(self, mocker):
|
def test_analyze_single_patent_with_correct_token_limit(self, mocker):
|
||||||
"""Test that single patent analysis uses lower token limit."""
|
"""Test that single patent analysis uses lower token limit."""
|
||||||
mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
|
mock_openai = mocker.patch("SPARC.llm.OpenAI")
|
||||||
mock_client = Mock()
|
mock_client = Mock()
|
||||||
mock_anthropic.return_value = mock_client
|
mock_openai.return_value = mock_client
|
||||||
|
|
||||||
mock_response = Mock()
|
mock_response = Mock()
|
||||||
mock_response.content = [Mock(text="Analysis result.")]
|
mock_response.choices = [Mock(message=Mock(content="Analysis result."))]
|
||||||
mock_client.messages.create.return_value = mock_response
|
mock_client.chat.completions.create.return_value = mock_response
|
||||||
|
|
||||||
analyzer = LLMAnalyzer(api_key="test-key")
|
analyzer = LLMAnalyzer(api_key="test-key")
|
||||||
analyzer.analyze_patent_content("Test content", "TestCo")
|
analyzer.analyze_patent_content("Test content", "TestCo")
|
||||||
|
|
||||||
call_args = mock_client.messages.create.call_args
|
call_args = mock_client.chat.completions.create.call_args
|
||||||
# Single patent should use 1024 tokens
|
# Single patent should use 1024 tokens
|
||||||
assert call_args[1]["max_tokens"] == 1024
|
assert call_args[1]["max_tokens"] == 1024
|
||||||
|
|||||||