
agent-company 97048917f2 docs: document patent PDF volume mount for containerized deployments

Switch docker-compose.yml from bind mount to a named volume (patent_data)
so downloaded PDFs survive container recreation. Add a "Patent PDF Storage"
section to DEPLOYMENT.md covering Docker Compose, Kubernetes PVC, and S3
alternatives.

Closes leeworks-agents/SPARC#1360

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-30 16:08:02 +00:00


SPARC Complete Deployment Guide

This guide provides step-by-step instructions for deploying the SPARC (Semiconductor Patent & Analytics Report Core) application with all features enabled, including SerpAPI patent retrieval, LLM analysis via OpenRouter, PostgreSQL storage, and the web UI.

Prerequisites
Step 1: Clone and Configure
Step 2: Start Services with Docker Compose
Step 3: Database Schema
Step 4: Run the Services
Step 5: Verify Deployment
Step 6: Using the Application
Step 7: View Stored Data
Architecture Overview
Environment Variables Reference
Docker Compose Services
Patent PDF Storage
Troubleshooting
Quick Reference

Prerequisites

Docker & Docker Compose installed
API Keys (you'll need to obtain these):
- SerpAPI Key: Sign up at https://serpapi.com/ (free tier: 100 searches/month)
- OpenRouter API Key: Sign up at https://openrouter.ai/ (pay-as-you-go)

Step 1: Clone and Configure

git clone <repository-url>
cd SPARC

# Create environment file
cp .env.example .env

Edit .env with your API keys:

# Required API Keys
API_KEY=your_serpapi_key_here
OPENROUTER_API_KEY=your_openrouter_key_here

# Authentication (required; use a long random value in production)
JWT_SECRET=your_jwt_secret_here

# Database Configuration (matches docker-compose.yml)
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
USE_DATABASE=true
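Before starting the stack, it can help to sanity-check .env. The sketch below is a hypothetical helper (not part of SPARC) that parses simple KEY=value lines and reports which variables marked Required in the Environment Variables Reference are missing or still set to a your_..._here placeholder. It assumes .env contains plain KEY=value lines without quoting or multi-line values.

```python
"""Hypothetical pre-flight check for SPARC's .env file (not part of SPARC)."""
from pathlib import Path

# Variables marked "Required" in the Environment Variables Reference table.
REQUIRED = ("API_KEY", "OPENROUTER_API_KEY", "DATABASE_URL", "JWT_SECRET")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still a your_..._here placeholder."""
    problems = []
    for key in REQUIRED:
        value = values.get(key, "")
        if not value or (value.startswith("your_") and value.endswith("_here")):
            problems.append(key)
    return problems


def check_env_file(path: str = ".env") -> list[str]:
    """Read an env file from disk and return the keys that still need attention."""
    return missing_required(parse_env(Path(path).read_text()))
```

For example, `print(check_env_file(".env") or "All required variables are set")` lists any keys you still need to fill in.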

Step 2: Start Services with Docker Compose

# Start all services (PostgreSQL, API, and Dashboard)
docker-compose up -d

# Check status
docker-compose ps

# You should see:
# - sparc-postgres (healthy)
# - sparc-api (running on port 8000)
# - sparc-dashboard (running on port 8080)

The database is automatically initialized by the init-db service.

Step 3: Database Schema

The init-db service automatically creates the llm_messages table with the following schema:

Column	Type	Purpose
`id`	SERIAL	Primary key
`timestamp`	TIMESTAMP	Message creation time
`company_name`	VARCHAR(255)	Company being analyzed
`analysis_type`	VARCHAR(50)	'single_patent' or 'portfolio'
`model`	VARCHAR(100)	LLM model identifier
`prompt`	TEXT	Full prompt sent to LLM
`response`	TEXT	LLM response
`metadata`	JSONB	Patent IDs, content lengths
`token_usage`	JSONB	prompt/completion/total tokens
`created_at`	TIMESTAMP	Record timestamp
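For comparison with the live database, the column list above can be rendered as DDL. This is an approximation reconstructed from the table, not the init-db service's actual SQL: the real schema may add NOT NULL constraints, defaults, or indexes that the table does not show. The helper name build_create_table is hypothetical.

```python
"""Approximate llm_messages DDL reconstructed from the schema table (assumption)."""

# (column, type) pairs copied from the schema table; the real init-db SQL
# may add NOT NULL constraints, DEFAULT values, or indexes not listed here.
LLM_MESSAGES_COLUMNS = [
    ("id", "SERIAL PRIMARY KEY"),
    ("timestamp", "TIMESTAMP"),
    ("company_name", "VARCHAR(255)"),
    ("analysis_type", "VARCHAR(50)"),  # 'single_patent' or 'portfolio'
    ("model", "VARCHAR(100)"),
    ("prompt", "TEXT"),
    ("response", "TEXT"),
    ("metadata", "JSONB"),
    ("token_usage", "JSONB"),
    ("created_at", "TIMESTAMP"),
]


def build_create_table(table: str, columns: list[tuple[str, str]]) -> str:
    """Render a CREATE TABLE IF NOT EXISTS statement from (name, type) pairs."""
    body = ",\n".join(f"    {name} {sql_type}" for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n);"


print(build_create_table("llm_messages", LLM_MESSAGES_COLUMNS))
```

To see what init-db actually created, compare this against `\d llm_messages` in psql.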

Step 4: Run the Services

Option A: Run with Docker Compose (Recommended)

All services are started automatically with docker-compose up -d from Step 2.

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard

Option B: Run Locally (Development)

If you prefer running services locally without Docker:

# Start PostgreSQL with Docker
docker-compose up -d postgres

# Wait for database to be healthy, then initialize
python scripts/init_database.py

# Start FastAPI backend
uvicorn SPARC.api:app --host 0.0.0.0 --port 8000 --reload

# For the React frontend (separate terminal)
cd frontend
npm install
npm run dev
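The "wait for database to be healthy" step in Option B can be scripted. The sketch below is a hypothetical helper (not shipped with SPARC) that reads the host and port from DATABASE_URL and retries a TCP connection until it succeeds or times out. An open port only shows that PostgreSQL is accepting connections, which is weaker than the container's health check, so treat it as a convenience rather than a guarantee.

```python
"""Hypothetical helper: wait until PostgreSQL accepts TCP connections."""
import os
import socket
import time
from urllib.parse import urlsplit


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out; retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_database(timeout: float = 60.0) -> bool:
    """Wait for the host/port in DATABASE_URL (fallback mirrors the example .env)."""
    url = urlsplit(os.environ.get("DATABASE_URL", "postgresql://localhost:5432/sparc"))
    return wait_for_port(url.hostname or "localhost", url.port or 5432, timeout)
```

Calling `wait_for_database()` before `python scripts/init_database.py` avoids running the initializer against a database that is still starting up.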

Step 5: Verify Deployment

# Check API health
curl http://localhost:8000/health

# Expected response:
# {"status":"healthy","version":"0.1.0","timestamp":"..."}
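To run the same check from a script (for example, a CI step or deploy hook), a small client can call /health and inspect the JSON. This sketch relies only on the "status" field from the expected response shown above; the function names are hypothetical.

```python
"""Hypothetical smoke test for the SPARC /health endpoint."""
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """True when the /health body reports status == "healthy"."""
    return payload.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> bool:
    """GET {base_url}/health and return whether the API reports itself healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSErrors; bad JSON raises ValueError.
        return False
    return isinstance(payload, dict) and is_healthy(payload)
```

A deploy script can then fail fast with `if not check_health(): raise SystemExit(1)`.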

Access the services:

Service	URL
REST API	http://localhost:8000
API Documentation (Swagger)	http://localhost:8000/docs
Dashboard (Web UI)	http://localhost:8080

Step 6: Using the Application

Via Dashboard (Web UI)

1. Open http://localhost:8080
2. Register a new account or log in (default admin: admin / admin)
3. Navigate to "Analysis" from the sidebar
4. Enter a company name (e.g., "Intel")
5. Click "Analyze"

This will:

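1. Query SerpAPI for recent patents
2. Download and parse patent PDFs
3. Send patent content to Claude (via OpenRouter) for analysis
4. Store the prompt and response in PostgreSQL, which is also checked for cached responses before new API calls (see USE_CACHE)
5. Display results in the dashboard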

Via REST API

# Analyze single company
curl http://localhost:8000/analyze/Intel

# Batch analyze multiple companies (synchronous)
curl -X POST http://localhost:8000/analyze/batch \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD", "NVIDIA"], "max_workers": 3}'

# Async batch (for large jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD"]}'

# Check job status (set JOB_ID to the job ID returned by the async request)
curl "http://localhost:8000/jobs/$JOB_ID"

# List all jobs
curl http://localhost:8000/jobs
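For large jobs, a client usually submits the async batch and then polls /jobs/{job_id}. The sketch below assumes the async endpoint returns JSON with a job_id field and that each job has a status field whose terminal values are "completed" or "failed"; those field names and values are assumptions, so confirm them in the Swagger docs at /docs. If these routes enforce JWT authentication, you will also need to add an Authorization header to the request.

```python
"""Hypothetical client for SPARC's async batch endpoints (response fields assumed)."""
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:8000"
# Assumed terminal job states; confirm the real values in the /docs schema.
TERMINAL_STATUSES = {"completed", "failed"}


def _request(method: str, path: str, body: Optional[dict] = None) -> dict:
    """Send an optional JSON body to the SPARC API and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def submit_batch(companies: list[str]) -> str:
    """POST /analyze/batch/async and return the new job's ID (assumed key: job_id)."""
    return _request("POST", "/analyze/batch/async", {"companies": companies})["job_id"]


def poll_job(
    job_id: str,
    fetch: Callable[[str], dict] = lambda jid: _request("GET", f"/jobs/{jid}"),
    interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Fetch GET /jobs/{job_id} until its status is terminal; raise on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

Typical use is `job = poll_job(submit_batch(["Intel", "AMD"]))`. The fetch parameter exists so the polling loop can be tested or pointed at an authenticated client without changing it.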

Via Python

from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()
result = analyzer.analyze("Intel")
print(result.analysis)

Step 7: View Stored Data

# View analytics (aggregated usage)
python scripts/view_analytics.py

# View stored messages
python scripts/view_messages.py

# Query database directly
docker exec -it sparc-postgres psql -U postgres -d sparc -c \
  "SELECT company_name, analysis_type, token_usage FROM llm_messages ORDER BY timestamp DESC LIMIT 10;"

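The token_usage column holds prompt, completion, and total token counts as JSONB. The sketch below sums them per company from (company_name, token_usage) rows, for example rows fetched with a query like the one above, with token_usage decoded to a dict. It assumes the JSON keys are prompt_tokens, completion_tokens, and total_tokens (the OpenAI-style usage names); inspect a stored row to confirm, since scripts/view_analytics.py may already provide this view.

```python
"""Sketch: aggregate llm_messages.token_usage per company (key names assumed)."""
from collections import defaultdict
from typing import Iterable, Mapping, Optional

# Assumed JSONB keys; check a stored row to confirm the real names.
USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def aggregate_usage(
    rows: Iterable[tuple[str, Optional[Mapping[str, int]]]],
) -> dict[str, dict[str, int]]:
    """Sum token counts per company from (company_name, token_usage) rows."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: dict.fromkeys(USAGE_KEYS, 0))
    for company, usage in rows:
        usage = usage or {}  # token_usage may be NULL for some rows
        for key in USAGE_KEYS:
            totals[company][key] += int(usage.get(key, 0) or 0)
    return dict(totals)
```

Keeping the aggregation separate from the database call makes it easy to reuse with psql exports or test fixtures.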
Architecture Overview

┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Dashboard   │───▶│   FastAPI    │───▶│   Analyzer   │
│  (8080)      │    │   (8000)     │    │              │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
           ┌──────────────┐           ┌──────────────┐           ┌──────────────┐
           │   SerpAPI    │           │  OpenRouter  │           │  PostgreSQL  │
           │ (Patents)    │           │  (Claude)    │           │  (Storage)   │
           └──────────────┘           └──────────────┘           └──────────────┘

Component Responsibilities

Component	Purpose
Dashboard	React TypeScript web UI with authentication
FastAPI	REST API with JWT authentication
Analyzer	Orchestrates patent retrieval and LLM analysis
SerpAPI	Retrieves patent data from Google Patents
OpenRouter	Routes requests to Claude for AI analysis
PostgreSQL	Stores prompts, responses, users, and cached results

Environment Variables Reference

Variable	Required	Default	Description
`API_KEY`	Yes	-	SerpAPI key for patent search
`OPENROUTER_API_KEY`	Yes	-	OpenRouter API key for Claude access
`DATABASE_URL`	Yes	-	PostgreSQL connection string
`USE_CACHE`	No	`true`	Check database for cached responses before API calls
`JWT_SECRET`	Yes	-	Secret key for JWT authentication (change in production!)

Database URL Format

postgresql://[user]:[password]@[host]:[port]/[database]

Example:

postgresql://postgres:postgres@localhost:5432/sparc
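Because DATABASE_URL is a standard URI, its parts can be checked with Python's urllib.parse. The sketch below splits it into components; passwords containing characters such as @, :, or / must be percent-encoded in the URL, and unquote reverses that. The function name is illustrative.

```python
"""Split a DATABASE_URL into its parts using only the standard library."""
from urllib.parse import unquote, urlsplit


def parse_database_url(url: str) -> dict:
    """Return user, password, host, port, and database from a postgresql:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    return {
        # urlsplit leaves percent-escapes in place, so decode them here.
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,  # PostgreSQL's default port
        "database": parts.path.lstrip("/"),
    }
```

For the example above, this yields user and password postgres, host localhost, port 5432, and database sparc.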

Docker Compose Services

The docker-compose.yml includes all services needed for production:

Service	Container	Port	Description
`postgres`	sparc-postgres	5432	PostgreSQL database
`init-db`	sparc-init-db	-	One-time database initialization (seeds admin user)
`api`	sparc-api	8000	FastAPI REST API with JWT auth (patent PDFs stored in `patent_data` volume)
`dashboard`	sparc-dashboard	8080	React TypeScript web UI

Common Docker Compose Commands

# Start all services
docker-compose up -d

# Start with rebuild (after code changes)
docker-compose up -d --build

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard

# Stop all services
docker-compose down

# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v

# Restart a specific service
docker-compose restart api

Patent PDF Storage

The SPARC API downloads patent PDFs during analysis and stores them at /app/patents inside the container. These files are used for subsequent single-patent analysis requests and as a local cache to avoid re-downloading. If this directory is not persisted, all downloaded PDFs are lost when the container is recreated.
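The local-cache behavior amounts to a check-before-download lookup. The sketch below illustrates the idea only: the <patent_id>.pdf naming and the download callable are hypothetical, and SPARC's actual implementation may differ.

```python
"""Illustrative cache-first lookup for patent PDFs (file naming is hypothetical)."""
from pathlib import Path
from typing import Callable

PATENT_DIR = Path("/app/patents")  # mount point used by the api container


def get_patent_pdf(
    patent_id: str,
    download: Callable[[str], bytes],
    cache_dir: Path = PATENT_DIR,
) -> Path:
    """Return the cached PDF path, downloading and saving it only on a cache miss."""
    path = cache_dir / f"{patent_id}.pdf"
    if path.exists():
        return path  # cache hit: no network request
    cache_dir.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(download(patent_id))
    # Rename is atomic on POSIX within one filesystem, so readers never see a partial file.
    tmp.replace(path)
    return path
```

This is why persisting /app/patents matters: without a volume, every recreated container starts with an empty cache and re-downloads each PDF.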

Docker Compose (default)

The default docker-compose.yml declares a named volume called patent_data that is mounted at /app/patents:

# In the api service:
volumes:
  - patent_data:/app/patents

# At the top-level volumes section:
volumes:
  patent_data:

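This means PDFs survive docker compose down and docker compose up cycles, as long as you do not pass -v. To remove patent data intentionally, run:

docker compose down -v   # WARNING: also removes postgres_data
# or selectively, after docker compose down (a volume in use by a container cannot be removed).
# The sparc_ prefix is the Compose project name, which defaults to the project
# directory name; confirm the exact volume name with: docker volume ls
docker volume rm sparc_patent_data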

If you prefer a bind mount (e.g., for easy host-side access during development), replace the volume with:

volumes:
  - ./patents:/app/patents

Kubernetes

For Kubernetes deployments, create a PersistentVolumeClaim and mount it into the API pod:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sparc-patent-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
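apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc-api
spec:
  replicas: 1               # ReadWriteOnce volumes can be mounted read-write by one node only
  strategy:
    type: Recreate          # stop the old pod first so the new one can attach the volume
  selector:
    matchLabels:
      app: sparc-api
  template:
    metadata:
      labels:
        app: sparc-api
    spec:
      containers:
        - name: api
          image: <your-sparc-api-image>   # placeholder: use your built API image
          volumeMounts:
            - name: patent-data
              mountPath: /app/patents
      volumes:
        - name: patent-data
          persistentVolumeClaim:
            claimName: sparc-patent-data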

Adjust the storage size based on expected patent volume. Each patent PDF is typically 1-5 MB.
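As a rough sizing aid based on the 1-5 MB per-PDF figure, the helper below converts an expected patent count into a PVC request in Gi and shows how many PDFs a given volume holds. It assumes the per-PDF size is in decimal megabytes, and the 1.2 headroom factor is an arbitrary assumption to tune for your retention policy.

```python
"""Rough PVC sizing for patent PDFs, using the 1-5 MB per-PDF figure."""
import math

GIB = 1024 ** 3  # Kubernetes "Gi" is a binary unit
MB = 1000 ** 2   # assumption: per-PDF sizes are quoted in decimal megabytes


def pvc_size_gi(num_pdfs: int, mb_per_pdf: float = 5.0, headroom: float = 1.2) -> int:
    """Smallest whole-Gi request that fits num_pdfs plus headroom."""
    return math.ceil(num_pdfs * mb_per_pdf * MB * headroom / GIB)


def pdfs_that_fit(size_gi: int, mb_per_pdf: float = 5.0) -> int:
    """How many PDFs of the given size fit in a size_gi volume (no headroom)."""
    return math.floor(size_gi * GIB / (mb_per_pdf * MB))


print(pdfs_that_fit(5, 5.0), pdfs_that_fit(5, 1.0))  # prints: 1073 5368
```

So the 5Gi example holds roughly 1,000 PDFs at the 5 MB worst case, and `pvc_size_gi(1000)` suggests 6Gi once headroom is included.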

S3 Object Storage (alternative)

For production deployments that need shared or highly durable storage, set STORAGE_BACKEND=s3 in your .env file. This stores patent PDFs in an S3-compatible bucket (AWS S3 or MinIO) instead of the local filesystem, eliminating the need for a persistent volume. See the S3/MinIO section in .env.example for configuration details.

Troubleshooting

Database Connection Issues

# Check if postgres is running
docker-compose ps

# Check postgres logs
docker-compose logs postgres

# Test database connection
docker exec -it sparc-postgres psql -U postgres -d sparc -c "SELECT 1;"

API Key Issues

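# In your shell (empty unless you exported them or sourced .env)
echo $API_KEY
echo $OPENROUTER_API_KEY

# Inside the running API container
docker-compose exec api printenv API_KEY OPENROUTER_API_KEY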

# Test SerpAPI directly
curl "https://serpapi.com/search?engine=google_patents&q=Intel&api_key=$API_KEY"

Port Conflicts

If ports 8000, 8080, or 5432 are in use:

# Find what's using the port
lsof -i :8000

# Or change the host-side port in docker-compose.yml (host:container)
ports:
  - "8001:8000"  # Expose the API on host port 8001 instead of 8000
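If lsof is not available, a best-effort alternative is to try binding the port yourself, which mirrors what Docker needs when it publishes a port: if the bind fails, another process already holds it. The helper is hypothetical, and the port list comes from the services table in this guide (8000 for the API, 8080 for the dashboard, 5432 for PostgreSQL).

```python
"""Best-effort check for whether a TCP port is already taken (hypothetical helper)."""
import socket

SPARC_PORTS = {"api": 8000, "dashboard": 8080, "postgres": 5432}


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind host:port; a failure means another process already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE in practice; all SPARC ports are above 1024, so no EACCES.
            return True
        return False


for name, port in SPARC_PORTS.items():
    print(f"{name:<10} {port:<5} {'IN USE' if port_in_use(port) else 'free'}")
```

Run it before docker-compose up -d; any port reported as IN USE needs either the conflicting process stopped or a different host-side port in docker-compose.yml.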

Container Issues

# Rebuild containers after code changes
docker-compose build --no-cache

# Remove all containers and start fresh
docker-compose down
docker-compose up -d --build

Viewing Application Logs

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f api
docker-compose logs -f dashboard

Quick Reference

# Docker setup (recommended)
cp .env.example .env
# Edit .env with API keys
docker-compose up -d

# Local development setup
cp .env.example .env
# Edit .env with API keys
docker-compose up -d postgres
python scripts/init_database.py
uvicorn SPARC.api:app --reload &
cd frontend && npm install && npm run dev &

# Check status
curl http://localhost:8000/health
open http://localhost:8080   # macOS; on Linux use xdg-open

# View data
python scripts/view_analytics.py
python scripts/view_messages.py
