SPARC

Semiconductor Patent & Analytics Report Core

A patent analysis system that estimates a company's performance by analyzing its patent portfolio with an LLM.

Overview

SPARC automatically collects, parses, and analyzes patents from companies to provide performance estimations. It uses Claude AI to evaluate innovation quality, strategic direction, and competitive positioning based on patent content.

Features

  • Patent Retrieval: Automated collection via SerpAPI's Google Patents engine
  • Intelligent Parsing: Extracts key sections (abstract, claims, summary) from patent PDFs
  • Content Minimization: Removes verbose descriptions to reduce LLM token usage
  • AI Analysis: Uses Claude 3.5 Sonnet via OpenRouter to analyze innovation quality and market potential
  • Portfolio Analysis: Evaluates multiple patents holistically for comprehensive insights
  • Batch Processing: Analyze multiple companies concurrently with progress tracking
  • REST API: FastAPI web service with async job support
  • Dashboard: React TypeScript web dashboard with authentication
  • Robust Testing: 40 tests covering all major functionality

Architecture

SPARC/
├── serp_api.py       # Patent retrieval and PDF parsing
├── llm.py            # Claude AI integration via OpenRouter
├── analyzer.py       # High-level orchestration
├── api.py            # FastAPI web service with auth endpoints
├── auth.py           # JWT authentication module
├── database.py       # PostgreSQL storage with caching
├── types.py          # Data models
└── config.py         # Environment configuration

Installation

# Clone and configure
git clone <repository-url>
cd SPARC
cp .env.example .env
# Edit .env with your API keys

# Start all services (API, Dashboard, PostgreSQL)
docker-compose up -d

# Access the services
# - API: http://localhost:8000
# - Dashboard: http://localhost:8080
# - API Docs: http://localhost:8000/docs

Patent PDF Storage

The API stores downloaded patent PDFs in a patents/ directory. Under Docker Compose, this directory is bind-mounted (./patents:/app/patents) so that PDFs persist across container restarts.

If you deploy to a different environment, ensure the patents/ directory is a persistent volume. Without it, PDFs will be re-downloaded on every analysis.

# docker-compose.yml excerpt
volumes:
  - ./patents:/app/patents
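The reason persistence matters can be illustrated with a download-if-missing check. This is a minimal sketch, not SPARC's actual code: the function name `cached_pdf_path`, the injected `fetch` callable, and naming files by patent ID are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable


def cached_pdf_path(
    patent_id: str,
    fetch: Callable[[str], bytes],
    cache_dir: Path = Path("patents"),
) -> Path:
    """Return the local PDF path, downloading only when the file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per patent ID.
    pdf_path = cache_dir / f"{patent_id}.pdf"
    if not pdf_path.exists():
        # Cache miss: fetch once and persist for later runs.
        pdf_path.write_bytes(fetch(patent_id))
    return pdf_path
```

If the directory is not persistent, every fresh container starts with an empty cache and the miss branch runs again for each patent.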

NixOS

nix develop

This automatically creates a virtual environment and installs all dependencies.

Manual Installation

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration

Create a .env file in the project root:

# SerpAPI key for patent search
API_KEY=your_serpapi_key_here

# OpenRouter API key for Claude AI analysis
OPENROUTER_API_KEY=your_openrouter_key_here

Get your API keys from SerpAPI (patent search) and OpenRouter (Claude access).
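A configuration module could validate these two variables at startup so that a missing key fails early rather than in the middle of an analysis. The sketch below is an assumption about how that might look, not the contents of config.py; `Settings` and `load_settings` are hypothetical names, and it assumes the variables are already exported into the process environment (for example by Docker Compose or a dotenv loader).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    serpapi_key: str
    openrouter_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read both keys and fail early with a clear message if one is missing."""
    required = ("API_KEY", "OPENROUTER_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return Settings(
        serpapi_key=env["API_KEY"],
        openrouter_key=env["OPENROUTER_API_KEY"],
    )
```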

Usage

Basic Usage

from SPARC.analyzer import CompanyAnalyzer

# Initialize the analyzer
analyzer = CompanyAnalyzer()

# Analyze a company's patent portfolio
analysis = analyzer.analyze_company("nvidia")
print(analysis)

Run the Example

python main.py

This will:

  1. Retrieve recent NVIDIA patents
  2. Parse and minimize content
  3. Analyze with Claude AI
  4. Print a comprehensive performance assessment

Single Patent Analysis

# Analyze a specific patent (reuses the analyzer created in Basic Usage)
result = analyzer.analyze_single_patent(
    patent_id="US11322171B1",
    company_name="nvidia"
)

Multi-Company Batch Analysis

from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()

# Analyze multiple companies concurrently (default 3 workers)
batch_result = analyzer.analyze_companies(
    ["nvidia", "amd", "intel", "qualcomm"],
    max_workers=3
)

# Access results
print(f"Analyzed: {batch_result.total_companies}")
print(f"Successful: {batch_result.successful}")
print(f"Failed: {batch_result.failed}")

for result in batch_result.results:
    if result.success:
        print(f"{result.company_name}: {result.patent_count} patents")
        print(result.analysis)

# Or use sequential processing (safer for rate limits)
batch_result = analyzer.analyze_companies_sequential(["nvidia", "amd"])
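To illustrate what a bounded worker pool does with `max_workers`, here is a minimal sketch built on the standard library's `ThreadPoolExecutor`. It is not SPARC's implementation: `run_batch`, `CompanyResult`, and the injected `analyze` callable are hypothetical, and the real result objects may carry other fields.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CompanyResult:
    company_name: str
    success: bool
    analysis: Optional[str] = None
    error: Optional[str] = None


def run_batch(
    companies: list[str],
    analyze: Callable[[str], str],
    max_workers: int = 3,
) -> list[CompanyResult]:
    """Analyze companies in a thread pool; one failure does not abort the batch."""
    results: list[CompanyResult] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(analyze, name): name for name in companies}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results.append(CompanyResult(name, True, analysis=future.result()))
            except Exception as exc:  # record the failure and keep going
                results.append(CompanyResult(name, False, error=str(exc)))
    return results
```

Results arrive in completion order, not input order. A small `max_workers` keeps the number of simultaneous API calls low, which is the same concern that motivates the sequential variant.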

REST API

Start the FastAPI server:

uvicorn SPARC.api:app --reload

API endpoints:

Endpoint               Method   Description
/health                GET      Health check
/analyze/{company}     GET      Analyze single company
/analyze/batch         POST     Analyze multiple companies
/analyze/batch/async   POST     Start async batch job
/jobs/{job_id}         GET      Get job status
/jobs                  GET      List all jobs

Interactive docs are available at http://localhost:8000/docs.

Example API usage:

# Single company
curl http://localhost:8000/analyze/nvidia

# Batch analysis
curl -X POST http://localhost:8000/analyze/batch \
  -H "Content-Type: application/json" \
  -d '{"companies": ["nvidia", "amd", "intel"]}'

# Async batch (for long-running jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
  -H "Content-Type: application/json" \
  -d '{"companies": ["nvidia", "amd", "intel", "qualcomm"]}'
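After starting an async batch, a client typically polls the job status endpoint until the job finishes. The helper below is a sketch of that loop under stated assumptions: the response is assumed to be a JSON object with a `status` field, and the terminal values `"completed"` and `"failed"` are guesses, so check the interactive docs for the real schema. `wait_for_job` is a hypothetical name.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], dict],
    poll_seconds: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reports a terminal status or the poll budget runs out."""
    for _ in range(max_polls):
        job = get_status()
        # "completed" / "failed" are assumed status values, not confirmed API output.
        if job.get("status") in ("completed", "failed"):
            return job
        sleep(poll_seconds)
    raise TimeoutError("job did not finish within the polling budget")
```

Here `get_status` would wrap a GET request to /jobs/{job_id} and return the decoded JSON body. Passing it in as a callable keeps the loop independent of any particular HTTP client.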

Web Dashboard

The React dashboard is included in Docker Compose:

docker-compose up -d

Dashboard features:

  • Authentication: User registration, login, and JWT-based sessions
  • Company Analysis: Analyze individual companies with real-time results
  • Batch Analysis: Process multiple companies with progress tracking
  • Analytics: View historical analysis data and trends
  • Admin Panel: User management for administrators

The dashboard runs at http://localhost:8080 when using Docker Compose.
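For readers unfamiliar with the JWT-based sessions listed among the dashboard features, the sketch below shows the general idea of an HS256-signed token using only the standard library. It is illustrative only and is not the code in auth.py; `sign_token` and `verify_token` are hypothetical names, and a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that the JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check the signature and the optional `exp` claim, then return the claims."""
    header, payload, signature = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload))
    if claims.get("exp", float("inf")) < (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

The server signs the claims at login, and each later request is accepted only if the signature still matches and the token has not expired.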

Running Tests

# Run all tests
pytest tests/ -v

# Run specific test modules
pytest tests/test_analyzer.py -v
pytest tests/test_llm.py -v
pytest tests/test_serp_api.py -v

# Run with coverage
pytest tests/ --cov=SPARC --cov-report=term-missing

How It Works

  1. Patent Collection: Queries SerpAPI for company patents
  2. PDF Download: Retrieves patent PDF files
  3. Section Extraction: Parses abstract, claims, summary, and description
  4. Content Minimization: Keeps essential sections, removes bloated descriptions
  5. LLM Analysis: Sends minimized content to Claude for analysis
  6. Performance Estimation: Returns insights on innovation quality and outlook
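Steps 3 and 4 can be sketched as heading-based splitting followed by filtering. This is a simplified illustration, not the parser in serp_api.py: it assumes the PDF text has already been extracted to a string and that section headings sit alone on a line in uppercase, which real patents do not guarantee. `split_sections` and `minimize` are hypothetical names.

```python
import re

# Assumed heading names; real patents use many variants.
HEADINGS = ("ABSTRACT", "SUMMARY", "DESCRIPTION", "CLAIMS")
# Sections kept for the LLM; the long description is dropped.
KEEP = ("abstract", "claims", "summary")


def split_sections(text: str) -> dict[str, str]:
    """Split plain patent text on lines that consist only of a known heading."""
    pattern = re.compile(rf"^\s*({'|'.join(HEADINGS)})\s*$", re.MULTILINE)
    matches = list(pattern.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group(1).lower()] = text[match.end():end].strip()
    return sections


def minimize(sections: dict[str, str]) -> str:
    """Keep only the short, high-signal sections to cut LLM token usage."""
    return "\n\n".join(
        f"{name.upper()}\n{sections[name]}" for name in KEEP if name in sections
    )
```

The minimized string is what would be sent to the model in step 5. Dropping the description is where most of the token savings would come from, since it is usually the longest section.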

Roadmap

  • Retrieve publicationID from SERP API
  • Parse patents from PDFs (no need for Google Patent API)
  • Extract and minimize patent content
  • LLM integration for analysis
  • Company performance estimation
  • Multi-company batch processing
  • FastAPI web service wrapper
  • Docker containerization
  • Results persistence (database)
  • Visualization dashboard

Development

Code Style

  • Type hints throughout
  • Comprehensive docstrings
  • Small, testable functions
  • Conventional commits

Testing Philosophy

  • Unit tests for core logic
  • Integration tests for orchestration
  • Mock external APIs
  • Aim for high coverage
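As an example of mocking an external API, the test below replaces an LLM client with `unittest.mock.Mock` so that no network call is made. The function under test, `summarize_company`, and the client's `complete` method are hypothetical stand-ins, not SPARC's real interfaces.

```python
from unittest.mock import Mock


def summarize_company(name: str, llm_client) -> str:
    """Hypothetical function under test: delegates the analysis to an LLM client."""
    prompt = f"Assess the patent portfolio of {name}."
    return llm_client.complete(prompt).strip()


def test_summarize_company_uses_llm_without_network():
    fake_llm = Mock()
    fake_llm.complete.return_value = "  Strong GPU portfolio.  "

    assert summarize_company("nvidia", fake_llm) == "Strong GPU portfolio."
    # The mock records the call, so the prompt itself can be checked too.
    fake_llm.complete.assert_called_once_with(
        "Assess the patent portfolio of nvidia."
    )
```

pytest collects this like any other test, and it runs offline with no API keys.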

Making Changes

  1. Write tests first
  2. Implement feature
  3. Verify all tests pass
  4. Commit with conventional format: type: description

Types: feat, fix, docs, test, refactor, chore
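The commit format could be checked mechanically, for example in a commit hook. The sketch below is one possible check and is not part of the repository; `is_conventional` is a hypothetical helper, and it accepts only the plain `type: description` form with the types listed above (no scopes such as `feat(api):`).

```python
import re

# Types listed in this README; other projects often allow more.
COMMIT_RE = re.compile(r"^(feat|fix|docs|test|refactor|chore): \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if a commit subject line matches `type: description`."""
    return COMMIT_RE.match(subject) is not None
```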

Documentation

Additional documentation is available in the docs/ directory:

  • Deployment Guide - Complete deployment instructions for Docker, database setup, and production configuration
  • Database Mode - Database storage for prompts, responses, and analytics
  • Container Registry - CI/CD and container registry setup with Gitea Actions

License

For open source projects, say how it is licensed.

Project Status

Core functionality is complete. SPARC is ready for production use once API keys are configured.

All major features implemented: REST API, React dashboard with authentication, Docker containerization, database storage with caching, and multi-company batch processing.
