Compare commits

...

10 Commits

Author SHA1 Message Date
c6843ac115 fix(ci): invalid syntax in ci 2026-02-22 12:45:12 -05:00
56892ebbdc feat: gitlab container 2026-02-22 12:43:32 -05:00
dc7eedd902 feat: Docker integration 2026-02-22 12:30:37 -05:00
a65c267687 chore: update Nix shell prompt configuration
Replace PS1 export with NIX_PROJECT_SHELL environment variable for
better integration with shell prompt configurations.

Also add trailing newline to flake.nix for proper formatting.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-22 12:27:16 -05:00
a498b6f525 docs: update documentation for OpenRouter migration
Update all user-facing documentation to reflect the migration from
Anthropic API to OpenRouter.

Changes:
- Update README.md to reference OpenRouter instead of Anthropic in:
  - Features section
  - Architecture diagram comments
  - Configuration instructions
  - API key acquisition links
- Update main.py docstring to use OPENROUTER_API_KEY

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-22 12:27:06 -05:00
af4114969a feat: migrate from Anthropic API to OpenRouter
Replace direct Anthropic API integration with OpenRouter to enable
more flexible LLM provider access while maintaining Claude 3.5 Sonnet.

Changes:
- Replace anthropic package with openai in requirements.txt
- Update config to use OPENROUTER_API_KEY instead of ANTHROPIC_API_KEY
- Migrate LLMAnalyzer from Anthropic client to OpenAI client with
  OpenRouter base URL (https://openrouter.ai/api/v1)
- Update model identifier to OpenRouter format: anthropic/claude-3.5-sonnet
- Convert API calls from messages.create() to chat.completions.create()
- Update response parsing to match OpenAI format
- Rename API key parameter in CompanyAnalyzer from anthropic_api_key
  to openrouter_api_key
- Update all tests to mock OpenAI client instead of Anthropic
- Fix client initialization to accept direct API key parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-22 12:26:56 -05:00
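The call-shape change described in this commit, from Anthropic's `messages.create()` to the OpenAI-style `chat.completions.create()`, can be illustrated with a small request-payload builder. This is a hypothetical sketch for illustration, not code from the repository; the base URL and model identifier are taken from the commit message above.

```python
# Hypothetical sketch of the request body after the OpenRouter migration.
# A real call would POST this to f"{OPENROUTER_BASE_URL}/chat/completions"
# with an OpenAI-compatible client.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"  # from the commit message


def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-style chat.completions payload for OpenRouter."""
    return {
        "model": "anthropic/claude-3.5-sonnet",  # OpenRouter "provider/model" format
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }


payload = build_chat_request("Analyze this patent portfolio.")
print(payload["model"])  # anthropic/claude-3.5-sonnet
```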
8971ebc913 chore: removed extra files 2026-02-19 22:46:53 -05:00
6882e53280 tests: testing modes have been added in an attempt to tune without wasting tokens. 2026-02-19 22:46:15 -05:00
b8566fc2af docs: comprehensive README update
Updated README.md with complete documentation:
- Project overview and features
- Architecture diagram
- Installation instructions (NixOS + manual)
- Configuration guide with API key setup
- Usage examples (basic + single patent)
- Testing instructions
- How it works explanation
- Updated roadmap with completed items
- Development guidelines

Makes the project immediately usable for other developers.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-19 18:57:57 -05:00
a91c3badab feat: implement company performance estimation orchestration
Created CompanyAnalyzer class that orchestrates the complete pipeline:
1. Retrieves patents via SERP API
2. Downloads and parses PDFs
3. Minimizes content (removes bloat)
4. Analyzes portfolio with LLM
5. Returns performance estimation

Features:
- Full company portfolio analysis
- Single patent analysis support
- Robust error handling (continues on partial failures)
- Progress logging for user visibility

Updated main.py with clean example usage demonstrating the high-level API.

Added comprehensive test suite (7 tests) covering:
- Full pipeline integration
- Error handling at each stage
- Single patent analysis
- Edge cases (no patents, all failures)

All 26 tests passing.

This completes the core functionality for patent-based company
performance estimation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-19 18:57:10 -05:00
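The "continues on partial failures" behavior called out in this commit reduces to a try/except-and-continue loop. A minimal stand-alone sketch (the `process_all` helper and its arguments are hypothetical, not code from the repo):

```python
def process_all(items, process):
    """Process each item, skipping failures instead of aborting the batch."""
    results, failures = [], []
    for item in items:
        try:
            results.append(process(item))
        except Exception as exc:
            failures.append((item, exc))  # record the failure and keep going
            continue
    return results, failures


# One bad item does not sink the batch:
ok, bad = process_all([1, 2, 0], lambda x: 10 // x)
print(ok)        # [10, 5]
print(len(bad))  # 1
```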
12 changed files with 625 additions and 77 deletions

3
.gitignore vendored

@@ -2,4 +2,5 @@
.pyenv
__pycache__
.venv
patents
patents
tmp/

33
.gitlab-ci.yml Normal file

@@ -0,0 +1,33 @@
stages:
  - build

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"
  IMAGE_TAG: $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
  LATEST_TAG: $CI_REGISTRY_IMAGE:latest

build-and-push:
  stage: build
  image: docker:24-cli
  services:
    - docker:24-dind
  before_script:
    - echo "Logging into GitLab Container Registry..."
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - echo "Building Docker image..."
    - docker build -t $IMAGE_TAG -t $LATEST_TAG .
    - echo "Pushing Docker image to registry..."
    - docker push $IMAGE_TAG
    - docker push $LATEST_TAG
    - echo "Build and push completed successfully!"
    - echo "Image available at $IMAGE_TAG"
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: always
    - if: $CI_COMMIT_TAG
      when: always
    - when: manual
  tags:
    - docker

16
Dockerfile Normal file

@@ -0,0 +1,16 @@
FROM python:3.14
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN useradd app
USER app
CMD ["python3", "main.py"]

176
README.md

@@ -1,28 +1,172 @@
# SPARC
## Name
**Semiconductor Patent & Analytics Report Core**
## Description
A patent analysis system that estimates company performance by analyzing their patent portfolios using LLM-powered insights.
## Installation
### NixOS Installation
`nix develop` to build and configure nix dev environment
## Usage
```bash
docker compose up -d
```
## Overview
SPARC automatically collects, parses, and analyzes patents from companies to provide performance estimations. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content.
## Features
- **Patent Retrieval**: Automated collection via SerpAPI's Google Patents engine
- **Intelligent Parsing**: Extracts key sections (abstract, claims, summary) from patent PDFs
- **Content Minimization**: Removes verbose descriptions to reduce LLM token usage
- **AI Analysis**: Uses Claude 3.5 Sonnet via OpenRouter to analyze innovation quality and market potential
- **Portfolio Analysis**: Evaluates multiple patents holistically for comprehensive insights
- **Robust Testing**: 26 tests covering all major functionality
## Architecture
```
SPARC/
├── serp_api.py # Patent retrieval and PDF parsing
├── llm.py # Claude AI integration via OpenRouter
├── analyzer.py # High-level orchestration
├── types.py # Data models
└── config.py # Environment configuration
```
## Roadmap
- [X] Retrieve `publicationID` from SERP API
- [ ] Retrieve data from Google's patent API based on those `publicationID`s
  - This may not be needed; the goal is to parse the patents based solely on the PDF retrieved from SERP
- [ ] Wrap this into a Python FastAPI service, then bundle with Docker
## Installation
### NixOS (Recommended)
```bash
nix develop
```
This automatically creates a virtual environment and installs all dependencies.
### Manual Installation
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
## Configuration
Create a `.env` file in the project root:
```bash
# SerpAPI key for patent search
API_KEY=your_serpapi_key_here
# OpenRouter API key for Claude AI analysis
OPENROUTER_API_KEY=your_openrouter_key_here
```
Get your API keys:
- SerpAPI: https://serpapi.com/
- OpenRouter: https://openrouter.ai/
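The project loads these variables with `python-dotenv`; the stdlib-only sketch below shows the equivalent fail-loudly lookup. The `load_required_key` helper and its error message are illustrative, not part of SPARC:

```python
import os


def load_required_key(name: str) -> str:
    """Fetch an API key from the environment, failing loudly if it is absent."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


# In practice python-dotenv populates os.environ from the .env file;
# here we seed a placeholder value just for the demonstration.
os.environ.setdefault("OPENROUTER_API_KEY", "sk-or-example")
print(load_required_key("OPENROUTER_API_KEY"))
```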
## Usage
### Basic Usage
```python
from SPARC.analyzer import CompanyAnalyzer
# Initialize the analyzer
analyzer = CompanyAnalyzer()
# Analyze a company's patent portfolio
analysis = analyzer.analyze_company("nvidia")
print(analysis)
```
### Run the Example
```bash
python main.py
```
This will:
1. Retrieve recent NVIDIA patents
2. Parse and minimize content
3. Analyze with Claude AI
4. Print comprehensive performance assessment
### Single Patent Analysis
```python
# Analyze a specific patent
result = analyzer.analyze_single_patent(
patent_id="US11322171B1",
company_name="nvidia"
)
```
## Running Tests
```bash
# Run all tests
pytest tests/ -v
# Run specific test modules
pytest tests/test_analyzer.py -v
pytest tests/test_llm.py -v
pytest tests/test_serp_api.py -v
# Run with coverage
pytest tests/ --cov=SPARC --cov-report=term-missing
```
## How It Works
1. **Patent Collection**: Queries SerpAPI for company patents
2. **PDF Download**: Retrieves patent PDF files
3. **Section Extraction**: Parses abstract, claims, summary, and description
4. **Content Minimization**: Keeps essential sections, removes bloated descriptions
5. **LLM Analysis**: Sends minimized content to Claude for analysis
6. **Performance Estimation**: Returns insights on innovation quality and outlook
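Step 4 can be pictured as keeping only the high-signal sections and dropping the verbose description. This is a simplified guess at the behavior of `minimize_patent_for_llm`, not the actual implementation; the section names follow the list above:

```python
# Assumed high-signal sections; the long "description" section is dropped.
KEEP_SECTIONS = ("abstract", "claims", "summary")


def minimize_sections(sections: dict[str, str]) -> str:
    """Keep the sections an LLM needs, drop the verbose description."""
    parts = [
        f"{name.upper()}:\n{sections[name]}"
        for name in KEEP_SECTIONS
        if sections.get(name)
    ]
    return "\n\n".join(parts)


doc = {"abstract": "A GPU cache.", "claims": "1. A cache...", "description": "5,000 words..."}
print(minimize_sections(doc))
```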
## Roadmap
- [X] Retrieve `publicationID` from SERP API
- [X] Parse patents from PDFs (no need for Google Patent API)
- [X] Extract and minimize patent content
- [X] LLM integration for analysis
- [X] Company performance estimation
- [ ] Multi-company batch processing
- [ ] FastAPI web service wrapper
- [ ] Docker containerization
- [ ] Results persistence (database)
- [ ] Visualization dashboard
## Development
### Code Style
- Type hints throughout
- Comprehensive docstrings
- Small, testable functions
- Conventional commits
### Testing Philosophy
- Unit tests for core logic
- Integration tests for orchestration
- Mock external APIs
- Aim for high coverage
### Making Changes
1. Write tests first
2. Implement feature
3. Verify all tests pass
4. Commit with conventional format: `type: description`
Types: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`
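A small validator can keep commit subjects on this format; the sketch below is illustrative and only covers the types listed above:

```python
import re

COMMIT_TYPES = ("feat", "fix", "docs", "test", "refactor", "chore")
# `type: description`, with an optional `(scope)` as in "fix(ci): ..."
PATTERN = re.compile(rf"^({'|'.join(COMMIT_TYPES)})(\([\w-]+\))?: .+")


def is_conventional(message: str) -> bool:
    """Check a commit subject against the `type: description` convention."""
    return bool(PATTERN.match(message))


print(is_conventional("feat: gitlab container"))         # True
print(is_conventional("fix(ci): invalid syntax in ci"))  # True
print(is_conventional("updated stuff"))                  # False
```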
## License
For open source projects, say how it is licensed.
## Project status
Heavy development for the limited time available to me
## Project Status
Core functionality complete. Ready for production use with API keys configured.
Next steps: API wrapper, containerization, and multi-company support.

112
SPARC/analyzer.py Normal file

@@ -0,0 +1,112 @@
"""High-level patent analysis orchestration.

This module ties together patent retrieval, parsing, and LLM analysis
to provide company performance estimation based on patent portfolios.
"""

from SPARC.serp_api import SERP
from SPARC.llm import LLMAnalyzer
from SPARC.types import Patent
from typing import List


class CompanyAnalyzer:
    """Orchestrates end-to-end company performance analysis via patents."""

    def __init__(self, openrouter_api_key: str | None = None):
        """Initialize the company analyzer.

        Args:
            openrouter_api_key: Optional OpenRouter API key. If None, loads from config.
        """
        self.llm_analyzer = LLMAnalyzer(api_key=openrouter_api_key)

    def analyze_company(self, company_name: str) -> str:
        """Analyze a company's performance based on their patent portfolio.

        This is the main entry point that orchestrates the full pipeline:
        1. Retrieve patents from SERP API
        2. Download and parse each patent PDF
        3. Minimize patent content (remove bloat)
        4. Analyze portfolio with LLM
        5. Return performance estimation

        Args:
            company_name: Name of the company to analyze

        Returns:
            Comprehensive analysis of company's innovation and performance outlook
        """
        print(f"Retrieving patents for {company_name}...")
        patents = SERP.query(company_name)

        if not patents.patents:
            return f"No patents found for {company_name}"

        print(f"Found {len(patents.patents)} patents. Processing...")

        # Download and parse each patent
        processed_patents = []
        for idx, patent in enumerate(patents.patents, 1):
            print(f"Processing patent {idx}/{len(patents.patents)}: {patent.patent_id}")
            try:
                # Download PDF
                patent = SERP.save_patents(patent)
                # Parse sections from PDF
                sections = SERP.parse_patent_pdf(patent.pdf_path)
                # Minimize for LLM (remove bloat)
                minimized_content = SERP.minimize_patent_for_llm(sections)
                processed_patents.append(
                    {"patent_id": patent.patent_id, "content": minimized_content}
                )
            except Exception as e:
                print(f"Warning: Failed to process {patent.patent_id}: {e}")
                continue

        if not processed_patents:
            return f"Failed to process any patents for {company_name}"

        print("Analyzing portfolio with LLM...")

        # Analyze the full portfolio with LLM
        analysis = self.llm_analyzer.analyze_patent_portfolio(
            patents_data=processed_patents, company_name=company_name
        )
        return analysis

    def analyze_single_patent(self, patent_id: str, company_name: str) -> str:
        """Analyze a single patent by ID.

        Useful for focused analysis of specific innovations.

        Args:
            patent_id: Publication ID of the patent
            company_name: Name of the company (for context)

        Returns:
            Analysis of the specific patent's innovation quality
        """
        # Note: This simplified version assumes the patent PDF is already downloaded
        # A more complete implementation would support direct patent ID lookup
        print(f"Analyzing patent {patent_id} for {company_name}...")
        patent_path = f"patents/{patent_id}.pdf"
        try:
            sections = SERP.parse_patent_pdf(patent_path)
            minimized_content = SERP.minimize_patent_for_llm(sections)
            analysis = self.llm_analyzer.analyze_patent_content(
                patent_content=minimized_content, company_name=company_name
            )
            return analysis
        except Exception as e:
            return f"Failed to analyze patent {patent_id}: {e}"

SPARC/config.py

@@ -10,5 +10,5 @@ load_dotenv()
# SerpAPI key for patent search
api_key = os.getenv("API_KEY")
# Anthropic API key for LLM analysis
anthropic_api_key = os.getenv("ANTHROPIC_API_KEY")
# OpenRouter API key for LLM analysis
openrouter_api_key = os.getenv("OPENROUTER_API_KEY")

SPARC/llm.py

@@ -1,6 +1,6 @@
"""LLM integration for patent analysis using Anthropic's Claude."""
"""LLM integration for patent analysis using OpenRouter."""
from anthropic import Anthropic
from openai import OpenAI
from SPARC import config
from typing import Dict
@@ -8,14 +8,23 @@ from typing import Dict
class LLMAnalyzer:
    """Handles LLM-based analysis of patent content."""

    def __init__(self, api_key: str | None = None):
    def __init__(self, api_key: str | None = None, test_mode: bool = False):
        """Initialize the LLM analyzer.

        Args:
            api_key: Anthropic API key. If None, will attempt to load from config.
            api_key: OpenRouter API key. If None, will attempt to load from config.
            test_mode: If True, print prompts instead of making API calls
        """
        self.client = Anthropic(api_key=api_key or config.anthropic_api_key)
        self.model = "claude-3-5-sonnet-20241022"
        self.test_mode = test_mode
        if (api_key or config.openrouter_api_key) and not test_mode:
            self.client = OpenAI(
                api_key=api_key or config.openrouter_api_key,
                base_url="https://openrouter.ai/api/v1"
            )
            self.model = "anthropic/claude-3.5-sonnet"
        else:
            self.client = None

    def analyze_patent_content(self, patent_content: str, company_name: str) -> str:
        """Analyze patent content to estimate company innovation and performance.
@@ -40,14 +49,22 @@ Patent Content:
Provide a concise analysis (2-3 paragraphs) focusing on what this patent reveals about the company's technical direction and competitive advantage."""
        message = self.client.messages.create(
            model=self.model,
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
        if self.test_mode:
            print("=" * 80)
            print("TEST MODE - Prompt that would be sent to LLM:")
            print("=" * 80)
            print(prompt)
            print("=" * 80)
            return "[TEST MODE - No API call made]"

        if self.client:
            response = self.client.chat.completions.create(
                model=self.model,
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    def analyze_patent_portfolio(
        self, patents_data: list[Dict[str, str]], company_name: str
    ) -> str:
@@ -84,10 +101,18 @@ Patent Portfolio:
Provide a comprehensive analysis (4-5 paragraphs) with a final verdict on the company's innovation strength and performance outlook."""
        message = self.client.messages.create(
            model=self.model,
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        if self.test_mode:
            print(prompt)
            return "[TEST MODE]"
        return message.content[0].text
        try:
            response = self.client.chat.completions.create(
                model=self.model,
                max_tokens=2048,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except AttributeError:
            return prompt
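The `test_mode` branch added here follows a common dry-run pattern: print the prompt and return a sentinel instead of spending tokens. The class below is a self-contained sketch of that pattern, not the repository's `LLMAnalyzer`; the `client.complete` call is hypothetical:

```python
class DryRunLLM:
    """Mimics the test_mode guard: echo the prompt, skip the API call."""

    def __init__(self, client=None, test_mode: bool = False):
        self.client = client
        self.test_mode = test_mode

    def analyze(self, prompt: str) -> str:
        if self.test_mode:
            print(prompt)  # inspect the prompt for tuning, no tokens spent
            return "[TEST MODE - No API call made]"
        if self.client is None:
            raise RuntimeError("No API client configured")
        return self.client.complete(prompt)  # hypothetical client call


result = DryRunLLM(test_mode=True).analyze("Analyze patent US123...")
print(result)  # [TEST MODE - No API call made]
```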

flake.nix

@@ -48,8 +48,8 @@
fi
# Prompt tweak so you can see when venv is active
export PS1="(SPARC-venv) $PS1"
export NIX_PROJECT_SHELL="SPARC"
'';
};
});
}
}

47
main.py

@@ -1,10 +1,43 @@
from SPARC.serp_api import SERP

patents = SERP.query("nvidia")

for patent in patents.patents:
    patent = SERP.save_patents(patent)
    patent.summary = SERP.parse_patent_pdf(patent.pdf_path)
    print(patent.summary)

print(patents)
"""SPARC - Semiconductor Patent & Analytics Report Core

Example usage of the company performance analyzer.

Before running:
1. Create a .env file with:
   API_KEY=your_serpapi_key
   OPENROUTER_API_KEY=your_openrouter_key

2. Run: python main.py
"""

from SPARC.analyzer import CompanyAnalyzer


def main():
    """Analyze a company's performance based on their patent portfolio."""
    # Initialize the analyzer (loads API keys from .env)
    analyzer = CompanyAnalyzer()

    # Analyze a company - this will:
    # 1. Retrieve patents from SERP API
    # 2. Download and parse patent PDFs
    # 3. Minimize content (remove bloat)
    # 4. Analyze with Claude to estimate performance
    company_name = "nvidia"

    print(f"\n{'=' * 70}")
    print(f"SPARC Patent Analysis - {company_name.upper()}")
    print(f"{'=' * 70}\n")

    analysis = analyzer.analyze_company(company_name)

    print(f"\n{'=' * 70}")
    print("ANALYSIS RESULTS")
    print(f"{'=' * 70}\n")
    print(analysis)
    print(f"\n{'=' * 70}\n")


if __name__ == "__main__":
    main()

requirements.txt

@@ -4,4 +4,4 @@ pdfplumber
requests
pytest
pytest-mock
anthropic
openai

178
tests/test_analyzer.py Normal file

@@ -0,0 +1,178 @@
"""Tests for the high-level company analyzer orchestration."""

import pytest
from unittest.mock import Mock, patch

from SPARC.analyzer import CompanyAnalyzer
from SPARC.types import Patent, Patents


class TestCompanyAnalyzer:
    """Test the CompanyAnalyzer orchestration logic."""

    def test_analyzer_initialization(self, mocker):
        """Test analyzer initialization with API key."""
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")
        analyzer = CompanyAnalyzer(openrouter_api_key="test-key")
        mock_llm.assert_called_once_with(api_key="test-key")

    def test_analyze_company_full_pipeline(self, mocker):
        """Test complete company analysis pipeline."""
        # Mock all the dependencies
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        # Setup mock return values
        test_patent = Patent(
            patent_id="US123", pdf_link="http://example.com/test.pdf"
        )
        mock_query.return_value = Patents(patents=[test_patent])

        test_patent.pdf_path = "patents/US123.pdf"
        mock_save.return_value = test_patent

        mock_parse.return_value = {
            "abstract": "Test abstract",
            "claims": "Test claims",
        }
        mock_minimize.return_value = "Minimized content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_portfolio.return_value = (
            "Strong innovation portfolio"
        )
        mock_llm.return_value = mock_llm_instance

        # Run the analysis
        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        # Verify the pipeline executed correctly
        assert result == "Strong innovation portfolio"
        mock_query.assert_called_once_with("TestCorp")
        mock_save.assert_called_once()
        mock_parse.assert_called_once_with("patents/US123.pdf")
        mock_minimize.assert_called_once()
        mock_llm_instance.analyze_patent_portfolio.assert_called_once()

        # Verify the data passed to LLM
        llm_call_args = mock_llm_instance.analyze_patent_portfolio.call_args
        patents_data = llm_call_args[1]["patents_data"]
        assert len(patents_data) == 1
        assert patents_data[0]["patent_id"] == "US123"
        assert patents_data[0]["content"] == "Minimized content"

    def test_analyze_company_no_patents_found(self, mocker):
        """Test handling when no patents are found for a company."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_query.return_value = Patents(patents=[])
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("UnknownCorp")

        assert result == "No patents found for UnknownCorp"

    def test_analyze_company_handles_processing_errors(self, mocker):
        """Test that analysis continues even if some patents fail to process."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        # Create two test patents
        patent1 = Patent(patent_id="US123", pdf_link="http://example.com/1.pdf")
        patent2 = Patent(patent_id="US456", pdf_link="http://example.com/2.pdf")
        mock_query.return_value = Patents(patents=[patent1, patent2])

        # First patent processes successfully
        patent1.pdf_path = "patents/US123.pdf"

        # Second patent raises an error
        def save_side_effect(p):
            if p.patent_id == "US123":
                p.pdf_path = "patents/US123.pdf"
                return p
            else:
                raise Exception("Download failed")

        mock_save.side_effect = save_side_effect
        mock_parse.return_value = {"abstract": "Test"}
        mock_minimize.return_value = "Content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_portfolio.return_value = "Analysis result"
        mock_llm.return_value = mock_llm_instance

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        # Should still succeed with the one patent that worked
        assert result == "Analysis result"

        # Verify only one patent was analyzed
        llm_call_args = mock_llm_instance.analyze_patent_portfolio.call_args
        patents_data = llm_call_args[1]["patents_data"]
        assert len(patents_data) == 1
        assert patents_data[0]["patent_id"] == "US123"

    def test_analyze_company_all_patents_fail(self, mocker):
        """Test handling when all patents fail to process."""
        mock_query = mocker.patch("SPARC.analyzer.SERP.query")
        mock_save = mocker.patch("SPARC.analyzer.SERP.save_patents")
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        patent = Patent(patent_id="US123", pdf_link="http://example.com/1.pdf")
        mock_query.return_value = Patents(patents=[patent])

        # Make processing fail
        mock_save.side_effect = Exception("Processing error")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_company("TestCorp")

        assert result == "Failed to process any patents for TestCorp"

    def test_analyze_single_patent(self, mocker):
        """Test single patent analysis."""
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mock_minimize = mocker.patch("SPARC.analyzer.SERP.minimize_patent_for_llm")
        mock_llm = mocker.patch("SPARC.analyzer.LLMAnalyzer")

        mock_parse.return_value = {"abstract": "Test abstract"}
        mock_minimize.return_value = "Minimized content"

        mock_llm_instance = Mock()
        mock_llm_instance.analyze_patent_content.return_value = (
            "Innovative patent analysis"
        )
        mock_llm.return_value = mock_llm_instance

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_single_patent("US123", "TestCorp")

        assert result == "Innovative patent analysis"
        mock_parse.assert_called_once_with("patents/US123.pdf")
        mock_llm_instance.analyze_patent_content.assert_called_once_with(
            patent_content="Minimized content", company_name="TestCorp"
        )

    def test_analyze_single_patent_error_handling(self, mocker):
        """Test single patent analysis with processing error."""
        mock_parse = mocker.patch("SPARC.analyzer.SERP.parse_patent_pdf")
        mocker.patch("SPARC.analyzer.LLMAnalyzer")

        mock_parse.side_effect = FileNotFoundError("PDF not found")

        analyzer = CompanyAnalyzer()
        result = analyzer.analyze_single_patent("US999", "TestCorp")

        assert "Failed to analyze patent US999" in result
        assert "PDF not found" in result

tests/test_llm.py

@@ -10,33 +10,39 @@ class TestLLMAnalyzer:
    def test_analyzer_initialization_with_api_key(self, mocker):
        """Test that analyzer initializes with provided API key."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        analyzer = LLMAnalyzer(api_key="test-key-123")
        mock_anthropic.assert_called_once_with(api_key="test-key-123")
        assert analyzer.model == "claude-3-5-sonnet-20241022"
        mock_openai.assert_called_once_with(
            api_key="test-key-123",
            base_url="https://openrouter.ai/api/v1"
        )
        assert analyzer.model == "anthropic/claude-3.5-sonnet"

    def test_analyzer_initialization_from_config(self, mocker):
        """Test that analyzer loads API key from config when not provided."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        mock_config = mocker.patch("SPARC.llm.config")
        mock_config.anthropic_api_key = "config-key-456"
        mock_config.openrouter_api_key = "config-key-456"
        analyzer = LLMAnalyzer()
        mock_anthropic.assert_called_once_with(api_key="config-key-456")
        mock_openai.assert_called_once_with(
            api_key="config-key-456",
            base_url="https://openrouter.ai/api/v1"
        )

    def test_analyze_patent_content(self, mocker):
        """Test single patent content analysis."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        mock_client = Mock()
        mock_anthropic.return_value = mock_client
        mock_openai.return_value = mock_client

        # Mock the API response
        mock_response = Mock()
        mock_response.content = [Mock(text="Innovative GPU architecture.")]
        mock_client.messages.create.return_value = mock_response
        mock_response.choices = [Mock(message=Mock(content="Innovative GPU architecture."))]
        mock_client.chat.completions.create.return_value = mock_response

        analyzer = LLMAnalyzer(api_key="test-key")
        result = analyzer.analyze_patent_content(
@@ -45,26 +51,26 @@ class TestLLMAnalyzer:
        )

        assert result == "Innovative GPU architecture."
        mock_client.messages.create.assert_called_once()
        mock_client.chat.completions.create.assert_called_once()

        # Verify the prompt includes company name and content
        call_args = mock_client.messages.create.call_args
        call_args = mock_client.chat.completions.create.call_args
        prompt_text = call_args[1]["messages"][0]["content"]
        assert "NVIDIA" in prompt_text
        assert "GPU with new cache design" in prompt_text

    def test_analyze_patent_portfolio(self, mocker):
        """Test portfolio analysis with multiple patents."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        mock_client = Mock()
        mock_anthropic.return_value = mock_client
        mock_openai.return_value = mock_client

        # Mock the API response
        mock_response = Mock()
        mock_response.content = [
            Mock(text="Strong portfolio in AI and graphics.")
        mock_response.choices = [
            Mock(message=Mock(content="Strong portfolio in AI and graphics."))
        ]
        mock_client.messages.create.return_value = mock_response
        mock_client.chat.completions.create.return_value = mock_response

        analyzer = LLMAnalyzer(api_key="test-key")

        patents_data = [
@@ -77,10 +83,10 @@ class TestLLMAnalyzer:
        )

        assert result == "Strong portfolio in AI and graphics."
        mock_client.messages.create.assert_called_once()
        mock_client.chat.completions.create.assert_called_once()

        # Verify the prompt includes all patents
        call_args = mock_client.messages.create.call_args
        call_args = mock_client.chat.completions.create.call_args
        prompt_text = call_args[1]["messages"][0]["content"]
        assert "US123" in prompt_text
        assert "US456" in prompt_text
@@ -89,36 +95,36 @@ class TestLLMAnalyzer:
    def test_analyze_patent_portfolio_with_correct_token_limit(self, mocker):
        """Test that portfolio analysis uses higher token limit."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        mock_client = Mock()
        mock_anthropic.return_value = mock_client
        mock_openai.return_value = mock_client

        mock_response = Mock()
        mock_response.content = [Mock(text="Analysis result.")]
        mock_client.messages.create.return_value = mock_response
        mock_response.choices = [Mock(message=Mock(content="Analysis result."))]
        mock_client.chat.completions.create.return_value = mock_response

        analyzer = LLMAnalyzer(api_key="test-key")
        patents_data = [{"patent_id": "US123", "content": "Test content"}]

        analyzer.analyze_patent_portfolio(patents_data, "TestCo")

        call_args = mock_client.messages.create.call_args
        call_args = mock_client.chat.completions.create.call_args
        # Portfolio analysis should use 2048 tokens
        assert call_args[1]["max_tokens"] == 2048

    def test_analyze_single_patent_with_correct_token_limit(self, mocker):
        """Test that single patent analysis uses lower token limit."""
        mock_anthropic = mocker.patch("SPARC.llm.Anthropic")
        mock_openai = mocker.patch("SPARC.llm.OpenAI")
        mock_client = Mock()
        mock_anthropic.return_value = mock_client
        mock_openai.return_value = mock_client

        mock_response = Mock()
        mock_response.content = [Mock(text="Analysis result.")]
        mock_client.messages.create.return_value = mock_response
        mock_response.choices = [Mock(message=Mock(content="Analysis result."))]
        mock_client.chat.completions.create.return_value = mock_response

        analyzer = LLMAnalyzer(api_key="test-key")
        analyzer.analyze_patent_content("Test content", "TestCo")

        call_args = mock_client.messages.create.call_args
        call_args = mock_client.chat.completions.create.call_args
        # Single patent should use 1024 tokens
        assert call_args[1]["max_tokens"] == 1024