
# SPARC Complete Deployment Guide

This guide provides step-by-step instructions for deploying the SPARC (Semiconductor Patent & Analytics Report Core) application with all features enabled: SerpAPI patent retrieval, LLM analysis, database storage, and the web UI.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Step 1: Clone and Configure](#step-1-clone-and-configure)
- [Step 2: Start Services with Docker Compose](#step-2-start-services-with-docker-compose)
- [Step 3: Database Schema](#step-3-database-schema)
- [Step 4: Run the Services](#step-4-run-the-services)
- [Step 5: Verify Deployment](#step-5-verify-deployment)
- [Step 6: Using the Application](#step-6-using-the-application)
- [Step 7: View Stored Data](#step-7-view-stored-data)
- [Architecture Overview](#architecture-overview)
- [Environment Variables Reference](#environment-variables-reference)
- [Docker Compose Services](#docker-compose-services)
- [Patent PDF Storage](#patent-pdf-storage)
- [Troubleshooting](#troubleshooting)
- [Quick Reference](#quick-reference)

---

## Prerequisites

1. **Docker & Docker Compose** installed
2. **API Keys** (you'll need to obtain these):
   - **SerpAPI Key**: Sign up at https://serpapi.com/ (free tier: 100 searches/month)
   - **OpenRouter API Key**: Sign up at https://openrouter.ai/ (pay-as-you-go)

---

## Step 1: Clone and Configure

```bash
git clone <repository-url>
cd SPARC

# Create environment file
cp .env.example .env
```

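Edit `.env` with your API keys and the JWT secret:

```env
# Required API Keys
API_KEY=your_serpapi_key_here
OPENROUTER_API_KEY=your_openrouter_key_here

# Required secret for JWT authentication (use a long random value)
JWT_SECRET=change_me_to_a_long_random_string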

# Database Configuration (matches docker-compose.yml)
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
USE_DATABASE=true
```

---

## Step 2: Start Services with Docker Compose

```bash
# Start all services (PostgreSQL, API, and Dashboard)
docker-compose up -d

# Check status
docker-compose ps

# You should see:
# - sparc-postgres (healthy)
# - sparc-api (running on port 8000)
# - sparc-dashboard (running on port 8080)
```

The database is automatically initialized by the `init-db` service.

---

## Step 3: Database Schema

The `init-db` service automatically creates the `llm_messages` table with the following schema:

| Column | Type | Purpose |
|--------|------|---------|
| `id` | SERIAL | Primary key |
| `timestamp` | TIMESTAMP | Message creation time |
| `company_name` | VARCHAR(255) | Company being analyzed |
| `analysis_type` | VARCHAR(50) | 'single_patent' or 'portfolio' |
| `model` | VARCHAR(100) | LLM model identifier |
| `prompt` | TEXT | Full prompt sent to LLM |
| `response` | TEXT | LLM response |
| `metadata` | JSONB | Patent IDs, content lengths |
| `token_usage` | JSONB | prompt/completion/total tokens |
| `created_at` | TIMESTAMP | Record timestamp |
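
To recreate or inspect the table by hand, the schema above can be sketched as a DDL statement. This is a reconstruction from the table, not the actual output of `init-db`: constraints, defaults, and indexes are unknown and omitted, and `build_create_table` is a hypothetical helper introduced only for illustration.

```python
# Column names and types copied from the schema table above.
# ASSUMPTION: NOT NULL constraints, defaults, and indexes are unknown and omitted.
LLM_MESSAGES_COLUMNS = [
    ("id", "SERIAL PRIMARY KEY"),
    ("timestamp", "TIMESTAMP"),
    ("company_name", "VARCHAR(255)"),
    ("analysis_type", "VARCHAR(50)"),
    ("model", "VARCHAR(100)"),
    ("prompt", "TEXT"),
    ("response", "TEXT"),
    ("metadata", "JSONB"),
    ("token_usage", "JSONB"),
    ("created_at", "TIMESTAMP"),
]


def build_create_table(table: str, columns: list[tuple[str, str]]) -> str:
    """Render a CREATE TABLE IF NOT EXISTS statement (hypothetical helper)."""
    # Column names are double-quoted so that names like "timestamp" stay unambiguous.
    body = ",\n".join(f'    "{name}" {sql_type}' for name, sql_type in columns)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n);"


if __name__ == "__main__":
    print(build_create_table("llm_messages", LLM_MESSAGES_COLUMNS))
```

Compare the printed statement with `\d llm_messages` in `psql` to see where the real table differs.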

---

## Step 4: Run the Services

### Option A: Run with Docker Compose (Recommended)

All services are started automatically with `docker-compose up -d` from Step 2.

```bash
# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard
```

### Option B: Run Locally (Development)

If you prefer running services locally without Docker:

```bash
# Start PostgreSQL with Docker
docker-compose up -d postgres

# Wait for database to be healthy, then initialize
python scripts/init_database.py

# Start FastAPI backend
uvicorn SPARC.api:app --host 0.0.0.0 --port 8000 --reload

# For the React frontend (separate terminal)
cd frontend
npm install
npm run dev
```

---

## Step 5: Verify Deployment

```bash
# Check API health
curl http://localhost:8000/health

# Expected response:
# {"status":"healthy","version":"0.1.0","timestamp":"..."}
```

Access the services:

| Service | URL |
|---------|-----|
| REST API | http://localhost:8000 |
| API Documentation (Swagger) | http://localhost:8000/docs |
| Dashboard (Web UI) | http://localhost:8080 |
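
In scripts or CI it can be convenient to wait for the API to come up instead of running `curl` by hand. The following is a minimal sketch that polls the `/health` endpoint shown above and checks for `"status": "healthy"`. The function names, retry count, and delay are assumptions chosen for illustration.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload) -> bool:
    """True when the /health payload is a JSON object with status == "healthy"."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_api(url: str = "http://localhost:8000/health",
                 attempts: int = 10, delay: float = 2.0) -> bool:
    """Poll the health endpoint until it reports healthy or attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (urllib.error.URLError, OSError, ValueError):
            pass  # API not reachable yet, or the body was not valid JSON
        time.sleep(delay)
    return False
```

Calling `wait_for_api()` after `docker-compose up -d` returns `True` once the API answers, or `False` after roughly `attempts * delay` seconds.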

---

## Step 6: Using the Application

### Via Dashboard (Web UI)

1. Open http://localhost:8080
2. Register a new account or log in (default admin: `admin` / `admin`)
3. Navigate to **"Analysis"** from the sidebar
4. Enter a company name (e.g., "Intel")
5. Click **"Analyze"**

This will:
- Query SerpAPI for recent patents
- Download and parse patent PDFs
- Send patent content to Claude for analysis
- Store prompt/response in PostgreSQL (with caching)
- Display results in the dashboard
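
The caching step in the list above can be pictured as a check-then-call pattern. The sketch below is not SPARC's implementation: `analyze_with_cache` and its `lookup`, `call_llm`, and `store` parameters are hypothetical stand-ins for the database lookup, the OpenRouter request, and the database write.

```python
from typing import Callable, Optional


def analyze_with_cache(
    company: str,
    lookup: Callable[[str], Optional[str]],
    call_llm: Callable[[str], str],
    store: Callable[[str, str], None],
    use_cache: bool = True,
) -> str:
    """Return a cached response if one exists, else call the LLM and store it."""
    if use_cache:
        cached = lookup(company)
        if cached is not None:
            return cached  # cache hit: no LLM call is made
    response = call_llm(company)
    store(company, response)  # persist so the next request can reuse it
    return response


if __name__ == "__main__":
    cache: dict[str, str] = {}  # in-memory stand-in for the database
    llm_calls: list[str] = []

    def fake_llm(company: str) -> str:
        llm_calls.append(company)
        return f"analysis of {company}"

    analyze_with_cache("Intel", cache.get, fake_llm, cache.__setitem__)
    analyze_with_cache("Intel", cache.get, fake_llm, cache.__setitem__)
    print(f"LLM calls made: {len(llm_calls)}")
```

Passing `use_cache=False` forces a fresh LLM call, which mirrors what disabling `USE_CACHE` is described as doing in the environment variable reference.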

### Via REST API

```bash
# Analyze single company
curl http://localhost:8000/analyze/Intel

# Batch analyze multiple companies (synchronous)
curl -X POST http://localhost:8000/analyze/batch \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD", "NVIDIA"], "max_workers": 3}'

# Async batch (for large jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD"]}'

# Check job status
curl http://localhost:8000/jobs/{job_id}

# List all jobs
curl http://localhost:8000/jobs
```
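
The async batch flow (submit, then poll `/jobs/{job_id}`) can also be driven from Python. This is a rough sketch using only the standard library. The endpoints come from the `curl` examples above, but the response fields are assumptions: the guide does not document them, so `"job_id"`, `"status"`, and the terminal values `"completed"` and `"failed"` are hypothetical. If your deployment requires a JWT, an `Authorization` header would also need to be added.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"


def build_batch_request(companies: list[str]) -> urllib.request.Request:
    """Build the POST request for /analyze/batch/async shown above."""
    body = json.dumps({"companies": companies}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/analyze/batch/async",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_json(request) -> dict:
    """Send a request (URL string or Request object) and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)


def run_batch(companies: list[str], poll_seconds: float = 5.0,
              max_polls: int = 60) -> dict:
    """Submit an async batch and poll the job until it reaches a terminal state."""
    # ASSUMPTION: the submit response contains "job_id", and the job document
    # contains "status" that ends up as "completed" or "failed".
    job_id = fetch_json(build_batch_request(companies))["job_id"]
    for _ in range(max_polls):
        job = fetch_json(f"{BASE_URL}/jobs/{job_id}")
        if job.get("status") in {"completed", "failed"}:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in time")
```

Check the Swagger page at `/docs` for the actual response schema and adjust the field names before relying on `run_batch`.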

### Via Python

```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()
result = analyzer.analyze("Intel")
print(result.analysis)
```

---

## Step 7: View Stored Data

```bash
# View analytics (aggregated usage)
python scripts/view_analytics.py

# View stored messages
python scripts/view_messages.py

# Query database directly
docker exec -it sparc-postgres psql -U postgres -d sparc -c \
  "SELECT company_name, analysis_type, token_usage FROM llm_messages ORDER BY timestamp DESC LIMIT 10;"
```
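
Once rows have been fetched (for example with the query above), token usage can be aggregated per company in a few lines. This sketch assumes each `token_usage` value has already been decoded into a Python dict and that the total is stored under a `"total_tokens"` key. That key name is an assumption, and `total_tokens_by_company` is a hypothetical helper rather than part of the bundled scripts.

```python
from collections import defaultdict


def total_tokens_by_company(rows) -> dict:
    """Sum total tokens per company.

    rows: iterable of (company_name, token_usage) pairs, where token_usage
    is a dict decoded from the JSONB column, or None when it was not recorded.
    """
    totals = defaultdict(int)
    for company, usage in rows:
        # ASSUMPTION: the JSONB key is "total_tokens"; adjust to the real key.
        totals[company] += int((usage or {}).get("total_tokens", 0))
    return dict(totals)


if __name__ == "__main__":
    # Illustrative rows only, not real data.
    sample = [
        ("Intel", {"total_tokens": 1200}),
        ("AMD", {"total_tokens": 300}),
        ("Intel", {"total_tokens": 800}),
    ]
    print(total_tokens_by_company(sample))
```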

---

## Architecture Overview

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Dashboard   │───▶│   FastAPI    │───▶│   Analyzer   │
│  (8080)      │    │   (8000)     │    │              │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
           ┌──────────────┐           ┌──────────────┐           ┌──────────────┐
           │   SerpAPI    │           │  OpenRouter  │           │  PostgreSQL  │
           │ (Patents)    │           │  (Claude)    │           │  (Storage)   │
           └──────────────┘           └──────────────┘           └──────────────┘
```

### Component Responsibilities

| Component | Purpose |
|-----------|---------|
| **Dashboard** | React TypeScript web UI with authentication |
| **FastAPI** | REST API with JWT authentication |
| **Analyzer** | Orchestrates patent retrieval and LLM analysis |
| **SerpAPI** | Retrieves patent data from Google Patents |
| **OpenRouter** | Routes requests to Claude for AI analysis |
| **PostgreSQL** | Stores prompts, responses, users, and cached results |

---

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `API_KEY` | Yes | - | SerpAPI key for patent search |
| `OPENROUTER_API_KEY` | Yes | - | OpenRouter API key for Claude access |
| `DATABASE_URL` | Yes | - | PostgreSQL connection string |
| `USE_CACHE` | No | `true` | Check database for cached responses before API calls |
| `JWT_SECRET` | Yes | - | Secret key for JWT authentication (change in production!) |
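
A small pre-flight check can catch missing variables before the services start. The sketch below encodes the "Required" column of the table above. The helper names and the set of accepted truthy spellings for `USE_CACHE` are assumptions, and the application itself may parse the value differently.

```python
# Variables marked "Yes" in the Required column above.
REQUIRED_VARS = ("API_KEY", "OPENROUTER_API_KEY", "DATABASE_URL", "JWT_SECRET")


def missing_vars(env) -> list[str]:
    """Return the required variables that are unset or blank in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def use_cache_enabled(env) -> bool:
    """Interpret USE_CACHE, which defaults to true per the table above."""
    # ASSUMPTION: these truthy spellings are illustrative, not SPARC's parser.
    return env.get("USE_CACHE", "true").strip().lower() in {"1", "true", "yes"}
```

For example, `missing_vars(os.environ)` (after `import os`) returns an empty list when the environment is complete, and a startup script could exit with an error otherwise.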

### Database URL Format

```
postgresql://[user]:[password]@[host]:[port]/[database]
```

Example:
```
postgresql://postgres:postgres@localhost:5432/sparc
```
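
To check that a `DATABASE_URL` value follows this format, it can be split into its parts with the standard library. This is a minimal sketch: `parse_database_url` is a hypothetical helper, and falling back to port 5432 when none is given is an assumption based on PostgreSQL's default.

```python
from urllib.parse import urlsplit


def parse_database_url(url: str) -> dict:
    """Split postgresql://user:password@host:port/database into its parts."""
    parts = urlsplit(url)
    if parts.scheme != "postgresql":
        raise ValueError(f"expected a postgresql:// URL, got {parts.scheme!r}")
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 5432,  # ASSUMPTION: default to PostgreSQL's port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(parse_database_url("postgresql://postgres:postgres@localhost:5432/sparc"))
```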

---

## Docker Compose Services

The `docker-compose.yml` includes all services needed for production:

| Service | Container | Port | Description |
|---------|-----------|------|-------------|
| `postgres` | sparc-postgres | 5432 | PostgreSQL database |
| `init-db` | sparc-init-db | - | One-time database initialization (seeds admin user) |
| `api` | sparc-api | 8000 | FastAPI REST API with JWT auth (patent PDFs stored in `patent_data` volume) |
| `dashboard` | sparc-dashboard | 8080 | React TypeScript web UI |

### Common Docker Compose Commands

```bash
# Start all services
docker-compose up -d

# Start with rebuild (after code changes)
docker-compose up -d --build

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard

# Stop all services
docker-compose down

# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v

# Restart a specific service
docker-compose restart api
```

---

## Patent PDF Storage

The SPARC API downloads patent PDFs during analysis and stores them at `/app/patents` inside the container. These files are used for subsequent single-patent analysis requests and as a local cache to avoid re-downloading. If this directory is not persisted, all downloaded PDFs are lost when the container is recreated.

### Docker Compose (default)

The default `docker-compose.yml` declares a named volume called `patent_data` that is mounted at `/app/patents`:

```yaml
# In the api service:
volumes:
  - patent_data:/app/patents

# At the top-level volumes section:
volumes:
  patent_data:
```

This means PDFs survive `docker-compose down` and `docker-compose up` cycles. To remove patent data intentionally, run:

```bash
docker-compose down -v   # WARNING: also removes postgres_data
# or selectively:
docker volume rm sparc_patent_data
```

If you prefer a bind mount (e.g., for easy host-side access during development), replace the volume with:

```yaml
volumes:
  - ./patents:/app/patents
```

### Kubernetes

For Kubernetes deployments, create a PersistentVolumeClaim and mount it into the API pod:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sparc-patent-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc-api
spec:
  template:
    spec:
      containers:
        - name: api
          volumeMounts:
            - name: patent-data
              mountPath: /app/patents
      volumes:
        - name: patent-data
          persistentVolumeClaim:
            claimName: sparc-patent-data
```

Adjust the storage size based on expected patent volume. Each patent PDF is typically 1-5 MB.
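
As a rough worked example of that sizing, the function below estimates how many PDFs fit in a volume. It treats 1 MB as 1 MiB and ignores filesystem overhead, both simplifying assumptions, and `pdf_capacity` is a hypothetical helper.

```python
def pdf_capacity(volume_gib: float, pdf_mb: float) -> int:
    """Estimate how many PDFs of `pdf_mb` MiB fit in a `volume_gib` GiB volume."""
    # 1 GiB = 1024 MiB; filesystem overhead is ignored in this estimate.
    return int(volume_gib * 1024 // pdf_mb)


if __name__ == "__main__":
    # The 5Gi claim above, at the large and small ends of the 1-5 MB range.
    print(pdf_capacity(5, 5), pdf_capacity(5, 1))
```

By this estimate, the 5Gi claim holds roughly 1,024 PDFs at 5 MB each and roughly 5,120 at 1 MB each.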

### S3 Object Storage (alternative)

For production deployments that need shared or highly durable storage, set `STORAGE_BACKEND=s3` in your `.env` file. This stores patent PDFs in an S3-compatible bucket (AWS S3 or MinIO) instead of the local filesystem, eliminating the need for a persistent volume. See the S3/MinIO section in `.env.example` for configuration details.

---

## Troubleshooting

### Database Connection Issues

```bash
# Check if postgres is running
docker-compose ps

# Check postgres logs
docker-compose logs postgres

# Test database connection
docker exec -it sparc-postgres psql -U postgres -d sparc -c "SELECT 1;"
```

### API Key Issues

```bash
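# Verify the variables are set in your current shell
# (values defined only in .env are not exported to the shell)
echo $API_KEY
echo $OPENROUTER_API_KEY

# Verify what the running API container sees (prints the key values)
docker-compose exec api env | grep API_KEY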

# Test SerpAPI directly
curl "https://serpapi.com/search?engine=google_patents&q=Intel&api_key=$API_KEY"
```

### Port Conflicts

If ports 8000, 8080, or 5432 are in use:

```bash
# Find what's using the port
lsof -i :8000

# Or change the host port in docker-compose.yml, for example:
#   ports:
#     - "8001:8000"  # Expose the API on host port 8001 instead of 8000
```
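
On systems without `lsof`, a quick port probe can be done from Python. This sketch only checks whether something accepts TCP connections on the port and does not identify the process. `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports used by the services in this guide: API, dashboard, PostgreSQL.
    for port in (8000, 8080, 5432):
        print(port, "in use" if port_in_use(port) else "free")
```

Run it before `docker-compose up -d` to spot conflicts; afterwards, all three ports are expected to report as in use.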

### Container Issues

```bash
# Rebuild containers after code changes
docker-compose build --no-cache

# Remove all containers and start fresh
docker-compose down
docker-compose up -d --build
```

### Viewing Application Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f api
docker-compose logs -f dashboard
```

---

## Quick Reference

```bash
# Docker setup (recommended)
cp .env.example .env
# Edit .env with API keys
docker-compose up -d

# Local development setup
cp .env.example .env
# Edit .env with API keys
docker-compose up -d postgres
python scripts/init_database.py
uvicorn SPARC.api:app --reload &
cd frontend && npm install && npm run dev &

# Check status
curl http://localhost:8000/health
open http://localhost:8080   # macOS; use xdg-open on Linux

# View data
python scripts/view_analytics.py
python scripts/view_messages.py
```