# SPARC Complete Deployment Guide

This guide provides step-by-step instructions for deploying the SPARC (Semiconductor Patent & Analytics Report Core) application with all features enabled, including SerpAPI patent retrieval, LLM analysis, database storage, and the web UI.

## Table of Contents

- [Prerequisites](#prerequisites)
- [Step 1: Clone and Configure](#step-1-clone-and-configure)
- [Step 2: Start Services with Docker Compose](#step-2-start-services-with-docker-compose)
- [Step 3: Database Schema](#step-3-database-schema)
- [Step 4: Run the Services](#step-4-run-the-services)
- [Step 5: Verify Deployment](#step-5-verify-deployment)
- [Step 6: Using the Application](#step-6-using-the-application)
- [Step 7: View Stored Data](#step-7-view-stored-data)
- [Architecture Overview](#architecture-overview)
- [Environment Variables Reference](#environment-variables-reference)
- [Docker Compose Services](#docker-compose-services)
- [Patent PDF Storage](#patent-pdf-storage)
- [Troubleshooting](#troubleshooting)
- [Quick Reference](#quick-reference)

---

## Prerequisites

1. **Docker & Docker Compose** installed
2. **API Keys** (you'll need to obtain these):
   - **SerpAPI Key**: Sign up at https://serpapi.com/ (free tier: 100 searches/month)
   - **OpenRouter API Key**: Sign up at https://openrouter.ai/ (pay-as-you-go)

---

## Step 1: Clone and Configure

```bash
git clone <repository-url>
cd SPARC

# Create environment file
cp .env.example .env
```

Edit `.env` with your API keys and a JWT secret:

```env
# Required API Keys
API_KEY=your_serpapi_key_here
OPENROUTER_API_KEY=your_openrouter_key_here

# Database Configuration (matches docker-compose.yml)
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
USE_DATABASE=true

# Authentication (required; use a long random value in production)
JWT_SECRET=change_me
```

---

## Step 2: Start Services with Docker Compose

```bash
# Start all services (PostgreSQL, API, and Dashboard)
docker-compose up -d

# Check status
docker-compose ps

# You should see:
# - sparc-postgres (healthy)
# - sparc-api (running on port 8000)
# - sparc-dashboard (running on port 8080)
```

The database is automatically initialized by the `init-db` service.

---

## Step 3: Database Schema

The `init-db` service automatically creates the `llm_messages` table with the following schema:

| Column | Type | Purpose |
|--------|------|---------|
| `id` | SERIAL | Primary key |
| `timestamp` | TIMESTAMP | Message creation time |
| `company_name` | VARCHAR(255) | Company being analyzed |
| `analysis_type` | VARCHAR(50) | `'single_patent'` or `'portfolio'` |
| `model` | VARCHAR(100) | LLM model identifier |
| `prompt` | TEXT | Full prompt sent to the LLM |
| `response` | TEXT | LLM response |
| `metadata` | JSONB | Patent IDs, content lengths |
| `token_usage` | JSONB | Prompt/completion/total tokens |
| `created_at` | TIMESTAMP | Record timestamp |

---

## Step 4: Run the Services

### Option A: Run with Docker Compose (Recommended)

All services are already running after `docker-compose up -d` in Step 2.
```bash
# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard
```

### Option B: Run Locally (Development)

If you prefer running services locally without Docker:

```bash
# Start PostgreSQL with Docker
docker-compose up -d postgres

# Wait for the database to be healthy, then initialize it
python scripts/init_database.py

# Start FastAPI backend
uvicorn SPARC.api:app --host 0.0.0.0 --port 8000 --reload

# For the React frontend (separate terminal)
cd frontend
npm install
npm run dev
```

---

## Step 5: Verify Deployment

```bash
# Check API health
curl http://localhost:8000/health

# Expected response:
# {"status":"healthy","version":"0.1.0","timestamp":"..."}
```

Access the services:

| Service | URL |
|---------|-----|
| REST API | http://localhost:8000 |
| API Documentation (Swagger) | http://localhost:8000/docs |
| Dashboard (Web UI) | http://localhost:8080 |

---

## Step 6: Using the Application

### Via Dashboard (Web UI)

1. Open http://localhost:8080
2. Register a new account or log in (default admin: `admin` / `admin`)
3. Navigate to **"Analysis"** from the sidebar
4. Enter a company name (e.g., "Intel")
5. Click **"Analyze"**

This will:

- Query SerpAPI for recent patents
- Download and parse patent PDFs
- Send patent content to Claude (via OpenRouter) for analysis
- Store the prompt and response in PostgreSQL (with caching)
- Display results in the dashboard

### Via REST API

```bash
# Analyze a single company
curl http://localhost:8000/analyze/Intel

# Batch analyze multiple companies (synchronous)
curl -X POST http://localhost:8000/analyze/batch \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD", "NVIDIA"], "max_workers": 3}'

# Async batch (for large jobs)
curl -X POST http://localhost:8000/analyze/batch/async \
  -H "Content-Type: application/json" \
  -d '{"companies": ["Intel", "AMD"]}'

# Check job status (set JOB_ID to the job ID returned by the async batch request)
curl "http://localhost:8000/jobs/$JOB_ID"

# List all jobs
curl http://localhost:8000/jobs
```

### Via Python

```python
from SPARC.analyzer import CompanyAnalyzer

analyzer = CompanyAnalyzer()
result = analyzer.analyze("Intel")
print(result.analysis)
```

---

## Step 7: View Stored Data

```bash
# View analytics (aggregated usage)
python scripts/view_analytics.py

# View stored messages
python scripts/view_messages.py

# Query database directly
docker exec -it sparc-postgres psql -U postgres -d sparc -c \
  "SELECT company_name, analysis_type, token_usage FROM llm_messages ORDER BY timestamp DESC LIMIT 10;"
```

---

## Architecture Overview

```
┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│  Dashboard   │───▶│   FastAPI    │───▶│   Analyzer   │
│    (8080)    │    │    (8000)    │    │              │
└──────────────┘    └──────────────┘    └──────┬───────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          │                          │
                    ▼                          ▼                          ▼
             ┌──────────────┐           ┌──────────────┐           ┌──────────────┐
             │   SerpAPI    │           │  OpenRouter  │           │  PostgreSQL  │
             │  (Patents)   │           │   (Claude)   │           │  (Storage)   │
             └──────────────┘           └──────────────┘           └──────────────┘
```

### Component Responsibilities

| Component | Purpose |
|-----------|---------|
| **Dashboard** | React TypeScript web UI with authentication |
| **FastAPI** | REST API with JWT authentication |
| **Analyzer** | Orchestrates patent retrieval and LLM analysis |
| **SerpAPI** | Retrieves patent data from Google Patents |
| **OpenRouter** | Routes requests to Claude for AI analysis |
| **PostgreSQL** | Stores prompts, responses, users, and cached results |

---

## Environment Variables Reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `API_KEY` | Yes | - | SerpAPI key for patent search |
| `OPENROUTER_API_KEY` | Yes | - | OpenRouter API key for Claude access |
| `DATABASE_URL` | Yes | - | PostgreSQL connection string |
| `USE_CACHE` | No | `true` | Check the database for cached responses before making API calls |
| `JWT_SECRET` | Yes | - | Secret key for JWT authentication (change in production!) |

### Database URL Format

```
postgresql://[user]:[password]@[host]:[port]/[database]
```

Example:

```
postgresql://postgres:postgres@localhost:5432/sparc
```

---

## Docker Compose Services

The `docker-compose.yml` includes all services needed for production:

| Service | Container | Port | Description |
|---------|-----------|------|-------------|
| `postgres` | sparc-postgres | 5432 | PostgreSQL database |
| `init-db` | sparc-init-db | - | One-time database initialization (seeds admin user) |
| `api` | sparc-api | 8000 | FastAPI REST API with JWT auth (patent PDFs stored in `patent_data` volume) |
| `dashboard` | sparc-dashboard | 8080 | React TypeScript web UI |

### Common Docker Compose Commands

```bash
# Start all services
docker-compose up -d

# Start with rebuild (after code changes)
docker-compose up -d --build

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api
docker-compose logs -f dashboard

# Stop all services
docker-compose down

# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v

# Restart a specific service
docker-compose restart api
```

---

## Patent PDF Storage

The SPARC API downloads patent PDFs during analysis and stores them at `/app/patents` inside the container. These files are used for subsequent single-patent analysis requests and as a local cache to avoid re-downloading. If this directory is not persisted, all downloaded PDFs are lost when the container is recreated.

### Docker Compose (default)

The default `docker-compose.yml` declares a named volume called `patent_data` that is mounted at `/app/patents`:

```yaml
services:
  api:
    # In the api service:
    volumes:
      - patent_data:/app/patents

# At the top-level volumes section:
volumes:
  patent_data:
```

This means PDFs survive `docker compose down` and `docker compose up` cycles. To remove patent data intentionally, run:

```bash
docker compose down -v   # WARNING: also removes postgres_data
# or selectively:
docker volume rm sparc_patent_data
```

If you prefer a bind mount (e.g., for easy host-side access during development), replace the volume with:

```yaml
volumes:
  - ./patents:/app/patents
```

### Kubernetes

For Kubernetes deployments, create a PersistentVolumeClaim and mount it into the API pod:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sparc-patent-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sparc-api
spec:
  selector:
    matchLabels:
      app: sparc-api
  template:
    metadata:
      labels:
        app: sparc-api
    spec:
      containers:
        - name: api
          image: sparc-api:latest  # placeholder: use your built API image
          volumeMounts:
            - name: patent-data
              mountPath: /app/patents
      volumes:
        - name: patent-data
          persistentVolumeClaim:
            claimName: sparc-patent-data
```

Adjust the storage size based on expected patent volume. Each patent PDF is typically 1-5 MB, so the 5Gi claim above holds roughly 1,000 to 5,000 PDFs.

### S3 Object Storage (alternative)

For production deployments that need shared or highly durable storage, set `STORAGE_BACKEND=s3` in your `.env` file. This stores patent PDFs in an S3-compatible bucket (AWS S3 or MinIO) instead of the local filesystem, eliminating the need for a persistent volume. See the S3/MinIO section in `.env.example` for configuration details.
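The cache-or-download behavior described at the start of this section can be sketched as follows. This is a minimal illustration, not SPARC's actual code: the function name `get_patent_pdf`, the `<patent_id>.pdf` file naming, and the injected `download` callable are all hypothetical.

```python
from pathlib import Path
from typing import Callable

# Hypothetical sketch: SPARC's real download/caching code is not shown in this guide.
PATENT_DIR = Path("/app/patents")  # mount point used by the api container


def get_patent_pdf(
    patent_id: str,
    download: Callable[[str], bytes],
    patent_dir: Path = PATENT_DIR,
) -> Path:
    """Return the local path of a patent PDF, downloading it only on a cache miss."""
    patent_dir.mkdir(parents=True, exist_ok=True)
    pdf_path = patent_dir / f"{patent_id}.pdf"
    if pdf_path.exists():
        # Cache hit: the file is still here because /app/patents is persisted.
        return pdf_path
    data = download(patent_id)
    # Write to a temporary name first, then rename, so an interrupted
    # download never leaves a truncated PDF that looks like a cache hit.
    tmp_path = pdf_path.with_name(pdf_path.name + ".part")
    tmp_path.write_bytes(data)
    tmp_path.replace(pdf_path)
    return pdf_path
```

The write-then-rename step matters most when the directory is a persistent volume: a partial file left behind by a crashed container would otherwise be served from the cache on every later request.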
---

## Troubleshooting

### Database Connection Issues

```bash
# Check if postgres is running
docker-compose ps

# Check postgres logs
docker-compose logs postgres

# Test database connection
docker exec -it sparc-postgres psql -U postgres -d sparc -c "SELECT 1;"
```

### API Key Issues

```bash
# Verify environment variables are set in your shell (local development)
echo $API_KEY
echo $OPENROUTER_API_KEY

# Verify they reached the API container (Docker deployment)
docker-compose exec api printenv API_KEY OPENROUTER_API_KEY

# Test SerpAPI directly
curl "https://serpapi.com/search?engine=google_patents&q=Intel&api_key=$API_KEY"
```

### Port Conflicts

If ports 8000, 8080, or 5432 are in use:

```bash
# Find what's using the port
lsof -i :8000
```

Or change the host-side port mapping in `docker-compose.yml`:

```yaml
ports:
  - "8001:8000"  # Expose the API on host port 8001 instead of 8000
```

### Container Issues

```bash
# Rebuild containers after code changes
docker-compose build --no-cache

# Remove all containers and start fresh
docker-compose down
docker-compose up -d --build
```

### Viewing Application Logs

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f api
docker-compose logs -f dashboard
```

---

## Quick Reference

```bash
# Docker setup (recommended)
cp .env.example .env
# Edit .env with API keys
docker-compose up -d

# Local development setup
cp .env.example .env
# Edit .env with API keys
docker-compose up -d postgres
python scripts/init_database.py
uvicorn SPARC.api:app --reload &
cd frontend && npm install && npm run dev &

# Check status
curl http://localhost:8000/health
open http://localhost:8080

# View data
python scripts/view_analytics.py
python scripts/view_messages.py
```
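For scripted smoke tests (for example, in CI right after `docker-compose up -d`), the `curl http://localhost:8000/health` check can be expressed in Python using only the standard library. This is a minimal sketch that assumes the `/health` response shape shown in Step 5; `is_healthy` and `check_api` are hypothetical helper names, not part of SPARC.

```python
import http.client
import json
import urllib.request

API_BASE = "http://localhost:8000"  # default API port used throughout this guide


def is_healthy(body: str) -> bool:
    """Return True if a /health response body reports status "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_api(base_url: str = API_BASE, timeout: float = 5.0) -> bool:
    """Call GET {base_url}/health and report whether the API says it is healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return is_healthy(resp.read().decode("utf-8"))
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, timeout, HTTP error status,
        # or a malformed HTTP response: treat all of these as "not healthy".
        return False
```

Calling `check_api()` returns `True` only when the API responds with `"status": "healthy"`, so a CI job can fail fast with something like `sys.exit(0 if check_api() else 1)`.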