SPARC Complete Deployment Guide
This guide provides step-by-step instructions for deploying the SPARC (Semiconductor Patent & Analytics Report Core) application with all features enabled, including SERP API patent retrieval, LLM analysis, database storage, and the web UI.
Table of Contents
- Prerequisites
- Step 1: Clone and Configure
- Step 2: Start Services with Docker Compose
- Step 3: Database Schema
- Step 4: Run the Services
- Step 5: Verify Deployment
- Step 6: Using the Application
- Step 7: View Stored Data
- Architecture Overview
- Environment Variables Reference
- Docker Compose Services
- Patent PDF Storage
- Troubleshooting
- Quick Reference
Prerequisites
- Docker & Docker Compose installed
- API Keys (you'll need to obtain these):
- SerpAPI Key: Sign up at https://serpapi.com/ (free tier: 100 searches/month)
- OpenRouter API Key: Sign up at https://openrouter.ai/ (pay-as-you-go)
Step 1: Clone and Configure
git clone <repository-url>
cd SPARC
# Create environment file
cp .env.example .env
Edit .env with your API keys:
# Required API Keys
API_KEY=your_serpapi_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
# Database Configuration (matches docker-compose.yml)
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
USE_DATABASE=true
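Before starting the stack, it can help to confirm that .env no longer contains the template placeholders. The following is a minimal sketch, not part of SPARC itself: the helper names parse_env and missing_keys are hypothetical, and the list of required keys is assumed from the example above.

```python
# Hypothetical pre-flight check for the .env file shown above.
# REQUIRED_KEYS is an assumption based on the Step 1 example.
REQUIRED_KEYS = ("API_KEY", "OPENROUTER_API_KEY", "DATABASE_URL")
PLACEHOLDERS = ("your_serpapi_key_here", "your_openrouter_key_here")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key] in PLACEHOLDERS
    ]


sample = """
# Required API Keys
API_KEY=your_serpapi_key_here
OPENROUTER_API_KEY=example-key
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
USE_DATABASE=true
"""
print(missing_keys(parse_env(sample)))
```

To check a real file, pass the contents of .env (for example via pathlib.Path(".env").read_text()) instead of the sample string.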
Step 2: Start Services with Docker Compose
# Start all services (PostgreSQL, API, and Dashboard)
docker-compose up -d
# Check status
docker-compose ps
# You should see:
# - sparc-postgres (healthy)
# - sparc-api (running on port 8000)
# - sparc-dashboard (running on port 8080)
The database is automatically initialized by the init-db service.
Step 3: Database Schema
The init-db service automatically creates the llm_messages table with the following schema:
| Column | Type | Purpose |
|---|---|---|
| id | SERIAL | Primary key |
| timestamp | TIMESTAMP | Message creation time |
| company_name | VARCHAR(255) | Company being analyzed |
| analysis_type | VARCHAR(50) | 'single_patent' or 'portfolio' |
| model | VARCHAR(100) | LLM model identifier |
| prompt | TEXT | Full prompt sent to LLM |
| response | TEXT | LLM response |
| metadata | JSONB | Patent IDs, content lengths |
| token_usage | JSONB | prompt/completion/total tokens |
| created_at | TIMESTAMP | Record timestamp |
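As an illustration of how the token_usage column might be consumed, the sketch below sums tokens per company from rows shaped like the table above. The JSON key names (prompt_tokens, completion_tokens, total_tokens) are an assumption, since this guide only says the column holds prompt/completion/total tokens, and the sample rows are made up.

```python
from collections import defaultdict


def total_tokens_by_company(rows):
    """Sum token_usage["total_tokens"] per company_name.

    The "total_tokens" key is an assumed name; adjust it to whatever
    SPARC actually writes into the JSONB column.
    """
    totals = defaultdict(int)
    for row in rows:
        usage = row.get("token_usage") or {}  # tolerate NULL usage
        totals[row["company_name"]] += int(usage.get("total_tokens", 0))
    return dict(totals)


# Made-up rows, shaped like records from llm_messages.
rows = [
    {
        "company_name": "Intel",
        "analysis_type": "portfolio",
        "token_usage": {
            "prompt_tokens": 1200,
            "completion_tokens": 300,
            "total_tokens": 1500,
        },
    },
    {
        "company_name": "Intel",
        "analysis_type": "single_patent",
        "token_usage": {"total_tokens": 500},
    },
    {"company_name": "AMD", "analysis_type": "portfolio", "token_usage": None},
]
print(total_tokens_by_company(rows))
```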
Step 4: Run the Services
Option A: Run with Docker Compose (Recommended)
All services are started automatically with docker-compose up -d from Step 2.
# View logs
docker-compose logs -f
# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard
Option B: Run Locally (Development)
If you prefer running services locally without Docker:
# Start PostgreSQL with Docker
docker-compose up -d postgres
# Wait for database to be healthy, then initialize
python scripts/init_database.py
# Start FastAPI backend
uvicorn SPARC.api:app --host 0.0.0.0 --port 8000 --reload
# For the React frontend (separate terminal)
cd frontend
npm install
npm run dev
Step 5: Verify Deployment
# Check API health
curl http://localhost:8000/health
# Expected response:
# {"status":"healthy","version":"0.1.0","timestamp":"..."}
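The same check can be scripted, for example in a deployment smoke test. This is a sketch using only the Python standard library; it assumes the response shape shown above, and the function names are illustrative.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when the JSON body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check(url: str = "http://localhost:8000/health", timeout: float = 5.0) -> bool:
    """Fetch the health endpoint; any network error counts as unhealthy."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except OSError:  # URLError and HTTPError are subclasses of OSError
        return False
```

Calling check() returns True only when the API is reachable and reports a healthy status.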
Access the services:
| Service | URL |
|---|---|
| REST API | http://localhost:8000 |
| API Documentation (Swagger) | http://localhost:8000/docs |
| Dashboard (Web UI) | http://localhost:8080 |
Step 6: Using the Application
Via Dashboard (Web UI)
- Open http://localhost:8080
- Register a new account or log in (default admin: admin/admin)
- Navigate to "Analysis" from the sidebar
- Enter a company name (e.g., "Intel")
- Click "Analyze"
This will:
- Query SerpAPI for recent patents
- Download and parse patent PDFs
- Send patent content to Claude for analysis
- Store prompt/response in PostgreSQL (with caching)
- Display results in the dashboard
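The caching step in the flow above can be pictured as a lookup before the LLM call. The sketch below is illustrative only: it uses an in-memory dictionary in place of PostgreSQL, and the class, function, and cache key (a hash of model and prompt) are assumptions, since this guide does not state how SPARC keys its cache.

```python
import hashlib


class MessageCache:
    """In-memory stand-in for the llm_messages table (illustration only)."""

    def __init__(self):
        self._rows = {}

    @staticmethod
    def _key(model: str, prompt: str) -> str:
        # Assumed cache key: hash of model identifier plus full prompt.
        return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()

    def get(self, model: str, prompt: str):
        return self._rows.get(self._key(model, prompt))

    def put(self, model: str, prompt: str, response: str) -> None:
        self._rows[self._key(model, prompt)] = response


def analyze_with_cache(cache, model, prompt, call_llm, use_cache=True):
    """Return a stored response when available, else call the LLM and store it."""
    if use_cache:
        cached = cache.get(model, prompt)
        if cached is not None:
            return cached
    response = call_llm(model, prompt)
    cache.put(model, prompt, response)
    return response
```

Here call_llm is any callable taking (model, prompt); passing use_cache=False forces a fresh call, which is roughly what disabling USE_CACHE is described as doing.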
Via REST API
# Analyze single company
curl http://localhost:8000/analyze/Intel
# Batch analyze multiple companies (synchronous)
curl -X POST http://localhost:8000/analyze/batch \
-H "Content-Type: application/json" \
-d '{"companies": ["Intel", "AMD", "NVIDIA"], "max_workers": 3}'
# Async batch (for large jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
-H "Content-Type: application/json" \
-d '{"companies": ["Intel", "AMD"]}'
# Check job status
curl http://localhost:8000/jobs/{job_id}
# List all jobs
curl http://localhost:8000/jobs
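For scripted use, the batch requests above can be built with the Python standard library. This sketch only constructs the request; the helper name build_batch_request is hypothetical, and the endpoints and body fields are taken from the curl examples. If your deployment enforces JWT authentication on these routes, you would also need to add an Authorization header.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_batch_request(companies, max_workers=None, run_async=False):
    """Build a POST request for /analyze/batch or /analyze/batch/async."""
    path = "/analyze/batch/async" if run_async else "/analyze/batch"
    body = {"companies": list(companies)}
    if max_workers is not None:
        body["max_workers"] = max_workers
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(["Intel", "AMD", "NVIDIA"], max_workers=3)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req)
```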
Via Python
from SPARC.analyzer import CompanyAnalyzer
analyzer = CompanyAnalyzer()
result = analyzer.analyze("Intel")
print(result.analysis)
Step 7: View Stored Data
# View analytics (aggregated usage)
python scripts/view_analytics.py
# View stored messages
python scripts/view_messages.py
# Query database directly
docker exec -it sparc-postgres psql -U postgres -d sparc -c \
"SELECT company_name, analysis_type, token_usage FROM llm_messages ORDER BY timestamp DESC LIMIT 10;"
Architecture Overview
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Dashboard   │───▶│   FastAPI    │───▶│   Analyzer   │
│    (8080)    │    │    (8000)    │    │              │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
             ┌──────────────┐           ┌──────────────┐           ┌──────────────┐
             │   SerpAPI    │           │  OpenRouter  │           │  PostgreSQL  │
             │  (Patents)   │           │   (Claude)   │           │  (Storage)   │
             └──────────────┘           └──────────────┘           └──────────────┘
Component Responsibilities
| Component | Purpose |
|---|---|
| Dashboard | React TypeScript web UI with authentication |
| FastAPI | REST API with JWT authentication |
| Analyzer | Orchestrates patent retrieval and LLM analysis |
| SerpAPI | Retrieves patent data from Google Patents |
| OpenRouter | Routes requests to Claude for AI analysis |
| PostgreSQL | Stores prompts, responses, users, and cached results |
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| API_KEY | Yes | - | SerpAPI key for patent search |
| OPENROUTER_API_KEY | Yes | - | OpenRouter API key for Claude access |
| DATABASE_URL | Yes | - | PostgreSQL connection string |
| USE_CACHE | No | true | Check database for cached responses before API calls |
| JWT_SECRET | Yes | - | Secret key for JWT authentication (change in production!) |
Database URL Format
postgresql://[user]:[password]@[host]:[port]/[database]
Example:
postgresql://postgres:postgres@localhost:5432/sparc
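To see how the URL breaks down into its parts, the following sketch uses urllib.parse from the Python standard library. The helper name parse_database_url is illustrative and is not a SPARC function.

```python
from urllib.parse import urlsplit


def parse_database_url(url: str) -> dict:
    """Split a postgresql:// URL into user, password, host, port, database."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected a postgresql:// URL, got {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,  # int, or None if omitted
        "database": parts.path.lstrip("/"),
    }


print(parse_database_url("postgresql://postgres:postgres@localhost:5432/sparc"))
```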
Docker Compose Services
The docker-compose.yml includes all services needed for production:
| Service | Container | Port | Description |
|---|---|---|---|
| postgres | sparc-postgres | 5432 | PostgreSQL database |
| init-db | sparc-init-db | - | One-time database initialization (seeds admin user) |
| api | sparc-api | 8000 | FastAPI REST API with JWT auth (patent PDFs stored in patent_data volume) |
| dashboard | sparc-dashboard | 8080 | React TypeScript web UI |
Common Docker Compose Commands
# Start all services
docker-compose up -d
# Start with rebuild (after code changes)
docker-compose up -d --build
# View logs
docker-compose logs -f
# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard
# Stop all services
docker-compose down
# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v
# Restart a specific service
docker-compose restart api
Patent PDF Storage
The SPARC API downloads patent PDFs during analysis and stores them at /app/patents inside the container. These files are used for subsequent single-patent analysis requests and as a local cache to avoid re-downloading. If this directory is not persisted, all downloaded PDFs are lost when the container is recreated.
Docker Compose (default)
The default docker-compose.yml declares a named volume called patent_data that is mounted at /app/patents:
# In the api service:
volumes:
  - patent_data:/app/patents

# At the top-level volumes section:
volumes:
  patent_data:
This means PDFs survive docker compose down and docker compose up cycles. To remove patent data intentionally, run:
docker compose down -v # WARNING: also removes postgres_data
# or selectively (the sparc_ prefix is the Compose project name, which defaults to the directory name):
docker volume rm sparc_patent_data
If you prefer a bind mount (e.g., for easy host-side access during development), replace the volume with:
volumes:
  - ./patents:/app/patents
Kubernetes
For Kubernetes deployments, create a PersistentVolumeClaim and mount it into the API pod:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sparc-patent-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc-api
spec:
  template:
    spec:
      containers:
        - name: api
          volumeMounts:
            - name: patent-data
              mountPath: /app/patents
      volumes:
        - name: patent-data
          persistentVolumeClaim:
            claimName: sparc-patent-data
Adjust the storage size based on expected patent volume. Each patent PDF is typically 1-5 MB.
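As a rough sizing aid, the arithmetic can be written out. This sketch treats the "MB" figures above as MiB for simplicity and ignores filesystem overhead, so the results are upper bounds rather than guarantees.

```python
def pdf_capacity(volume_gib: int, pdf_mib: int) -> int:
    """Approximate number of PDFs of a given size that fit in a volume."""
    return (volume_gib * 1024) // pdf_mib


# A 5Gi claim at the large and small ends of the 1-5 MB range:
print(pdf_capacity(5, 5))  # worst case: about a thousand PDFs
print(pdf_capacity(5, 1))  # best case: about five thousand PDFs
```

If you expect tens of thousands of patents, scale the storage request accordingly.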
S3 Object Storage (alternative)
For production deployments that need shared or highly durable storage, set STORAGE_BACKEND=s3 in your .env file. This stores patent PDFs in an S3-compatible bucket (AWS S3 or MinIO) instead of the local filesystem, eliminating the need for a persistent volume. See the S3/MinIO section in .env.example for configuration details.
Troubleshooting
Database Connection Issues
# Check if postgres is running
docker-compose ps
# Check postgres logs
docker-compose logs postgres
# Test database connection
docker exec -it sparc-postgres psql -U postgres -d sparc -c "SELECT 1;"
API Key Issues
# Verify environment variables are set
echo $API_KEY
echo $OPENROUTER_API_KEY
# Test SerpAPI directly
curl "https://serpapi.com/search?engine=google_patents&q=Intel&api_key=$API_KEY"
Port Conflicts
If ports 8000, 8080, or 5432 are in use:
# Find what's using the port
lsof -i :8000
# Or change the host port in docker-compose.yml (8080 is already used by the dashboard)
ports:
  - "8001:8000" # Use host port 8001 instead of 8000
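If lsof is not available, a small Python check can report which of the default ports already have a listener. This is a sketch that only tests the local loopback interface; the function name port_in_use is illustrative.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Default SPARC ports: API, dashboard, PostgreSQL.
for port in (8000, 8080, 5432):
    print(port, "in use" if port_in_use(port) else "free")
```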
Container Issues
# Rebuild containers after code changes
docker-compose build --no-cache
# Remove all containers and start fresh
docker-compose down
docker-compose up -d --build
Viewing Application Logs
# All services
docker-compose logs -f
# Specific service
docker-compose logs -f api
docker-compose logs -f dashboard
Quick Reference
# Docker setup (recommended)
cp .env.example .env
# Edit .env with API keys
docker-compose up -d
# Local development setup
cp .env.example .env
# Edit .env with API keys
docker-compose up -d postgres
python scripts/init_database.py
uvicorn SPARC.api:app --reload &
cd frontend && npm install && npm run dev &
# Check status
curl http://localhost:8000/health
open http://localhost:8080
# View data
python scripts/view_analytics.py
python scripts/view_messages.py