# Database Mode for Testing and Analytics

This document explains how to use SPARC's database mode for storing LLM messages for testing and analytics purposes.

## Overview

SPARC supports two modes of operation:

1. **API Mode** (default): Messages are sent to OpenRouter's API and you receive real LLM responses
2. **Database Mode**: Messages are stored in a PostgreSQL database without making API calls, useful for:
   - Testing the application without consuming API credits
   - Collecting analytics on message patterns and usage
   - Development and debugging

## Setup

### 1. Start the Database

Use docker-compose to start the PostgreSQL database:

```bash
docker-compose up -d postgres
```

This will start a PostgreSQL instance accessible at `localhost:5432`.

### 2. Initialize the Database Schema

Run the initialization script to create the necessary tables:

```bash
python scripts/init_database.py
```

This creates the `llm_messages` table and indexes for efficient querying.

### 3. Configure Environment Variables

Create a `.env` file (or copy from `.env.example`):

```bash
cp .env.example .env
```

Edit `.env` and set the variables for the mode you want (keep only one `USE_DATABASE` line):

```env
# For database mode (testing/analytics)
USE_DATABASE=true
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc

# For API mode (production)
USE_DATABASE=false
OPENROUTER_API_KEY=your_openrouter_key_here
```

## Usage

### Running in Database Mode

Set `USE_DATABASE=true` in your `.env` file, then run the application normally:

```bash
python main.py
```

Instead of sending messages to OpenRouter, the application will:

- Store all prompts in the database
- Return a placeholder response
- Log metadata (company name, analysis type, timestamps)

### Running in API Mode

Set `USE_DATABASE=false` in your `.env` file, then run the application normally:

```bash
python main.py
```

The application will send messages to OpenRouter and return real LLM responses.
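
The mode selection described above can be sketched as a small helper that reads `USE_DATABASE` and checks that the variable each mode depends on is present. This is a minimal sketch, not SPARC's actual configuration code: the function name `resolve_mode`, the set of accepted truthy strings, and the error messages are assumptions made for illustration.

```python
import os

# Assumption: these strings count as "true"; SPARC's own parsing may differ.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_mode(env=None):
    """Return "database" or "api" based on USE_DATABASE (hypothetical helper)."""
    env = os.environ if env is None else env
    use_database = env.get("USE_DATABASE", "false").strip().lower() in TRUTHY

    if use_database:
        # Database mode needs a connection string and makes no API calls.
        if not env.get("DATABASE_URL"):
            raise ValueError("USE_DATABASE=true requires DATABASE_URL")
        return "database"

    # API mode (the default) needs an OpenRouter key.
    if not env.get("OPENROUTER_API_KEY"):
        raise ValueError("API mode requires OPENROUTER_API_KEY")
    return "api"


# Example with an explicit mapping instead of the real process environment.
example_env = {
    "USE_DATABASE": "true",
    "DATABASE_URL": "postgresql://postgres:postgres@localhost:5432/sparc",
}
print(resolve_mode(example_env))
```

Failing early on a missing `DATABASE_URL` or `OPENROUTER_API_KEY` surfaces a misconfigured `.env` file before the first message is processed.
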
### Hybrid Mode (Optional)

You can also enable database logging while still using the API by initializing the database client in your code. The `LLMAnalyzer` will automatically log all API calls to the database if a database client is available.

## Viewing Analytics

### View Message Statistics

```bash
python scripts/view_analytics.py
```

Options:

- `--days N`: Analyze messages from the last N days (default: 30)

Example output:

```
SPARC Analytics - Last 30 days
======================================================================
Total Messages: 45
Messages by Company:
  nvidia: 25
  intel: 12
  amd: 8
Messages by Analysis Type:
  portfolio: 30
  single_patent: 15
======================================================================
```

### View Stored Messages

```bash
python scripts/view_messages.py
```

Options:

- `--company COMPANY`: Filter by company name
- `--type TYPE`: Filter by analysis type (single_patent or portfolio)
- `--limit N`: Maximum number of messages to display (default: 10)

Examples:

```bash
# View last 10 messages
python scripts/view_messages.py

# View all messages for nvidia
python scripts/view_messages.py --company nvidia --limit 100

# View portfolio analyses only
python scripts/view_messages.py --type portfolio
```

## Database Schema

### llm_messages Table

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| timestamp | TIMESTAMP | When the message was created |
| company_name | VARCHAR(255) | Company being analyzed |
| analysis_type | VARCHAR(50) | Type of analysis (single_patent, portfolio) |
| model | VARCHAR(100) | LLM model identifier |
| prompt | TEXT | The full prompt sent to the LLM |
| response | TEXT | The response from the LLM |
| metadata | JSONB | Additional metadata (patent IDs, content length, etc.) |
| token_usage | JSONB | Token usage statistics (when available) |
| created_at | TIMESTAMP | Record creation timestamp |

### Indexes

- `idx_messages_timestamp`: Speeds up time-based queries
- `idx_messages_company`: Speeds up company-specific queries

## Docker Compose

The included `docker-compose.yml` provides:

1. **PostgreSQL Database**:
   - Image: `postgres:16-alpine`
   - Port: `5432`
   - Credentials: postgres/postgres
   - Database: sparc
   - Persistent storage via volume
2. **Application Container** (optional):
   - Builds from Dockerfile
   - Connects to PostgreSQL
   - Mounts current directory

### Start Services

```bash
# Start just the database
docker-compose up -d postgres

# Start everything
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v
```

## Toggling Between Modes

You can switch between modes by changing the `USE_DATABASE` environment variable:

### Quick Toggle (temporary, for testing)

```bash
# Run in database mode
USE_DATABASE=true python main.py

# Run in API mode
USE_DATABASE=false python main.py
```

### Persistent Toggle

Edit your `.env` file and keep one of the following lines:

```env
# For testing/analytics
USE_DATABASE=true

# For production use
USE_DATABASE=false
```

## Use Cases

### Testing Without API Costs

During development, enable database mode to test the full application flow without consuming API credits:

```bash
USE_DATABASE=true python main.py
```

### Collecting Usage Analytics

Enable database mode in a test environment to collect analytics on:

- Which companies are analyzed most frequently
- Types of analyses performed
- Prompt patterns and lengths
- Usage over time

### Development and Debugging

Database mode is useful for:

- Testing patent parsing logic without API calls
- Debugging the full pipeline end-to-end
- Collecting sample prompts for optimization
- Understanding token usage patterns (when in API mode with logging)

## Troubleshooting

### Connection Refused
If you get "connection refused" errors:

1. Ensure PostgreSQL is running: `docker-compose ps`
2. Check the DATABASE_URL in your `.env` file
3. Wait for the database to be healthy: `docker-compose logs postgres`

### Schema Not Found

If you get "relation does not exist" errors:

1. Run the initialization script: `python scripts/init_database.py`
2. Verify tables were created: `docker-compose exec postgres psql -U postgres -d sparc -c "\dt"`

### Permission Denied

If you get permission errors:

1. Check your DATABASE_URL credentials match docker-compose.yml
2. Ensure the database container is running: `docker-compose up -d postgres`

## Advanced Usage

### Direct Database Access

You can access the database directly using psql:

```bash
docker-compose exec postgres psql -U postgres -d sparc
```

Example queries:

```sql
-- View all messages
SELECT id, company_name, analysis_type, timestamp
FROM llm_messages
ORDER BY timestamp DESC
LIMIT 10;

-- Count messages by company
SELECT company_name, COUNT(*)
FROM llm_messages
GROUP BY company_name;

-- View recent prompts
SELECT prompt
FROM llm_messages
ORDER BY timestamp DESC
LIMIT 5;
```

### Programmatic Access

You can use the `DatabaseClient` directly in your code:

```python
from SPARC.database import DatabaseClient
from SPARC import config

db = DatabaseClient(config.database_url)

# Get messages
messages = db.get_messages(company_name="nvidia", limit=10)

# Get analytics
analytics = db.get_analytics(days=7)

# Store a custom message
db.store_message(
    prompt="test prompt",
    response="test response",
    company_name="test",
    analysis_type="custom"
)
```
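
As a follow-up to the programmatic access snippet, rows returned by `get_messages` could be summarized in plain Python. The sketch below assumes each row is a dict keyed by the column names of the `llm_messages` table. The actual return type of `get_messages` is not stated in this document, so treat that shape, the `summarize_messages` helper, and the sample rows as hypothetical.

```python
from collections import Counter


def summarize_messages(messages):
    """Count messages per company and per analysis type (hypothetical helper)."""
    by_company = Counter(m["company_name"] for m in messages)
    by_type = Counter(m["analysis_type"] for m in messages)
    return {
        "total": len(messages),
        "by_company": dict(by_company),
        "by_type": dict(by_type),
    }


# Made-up sample rows standing in for the output of db.get_messages(...).
sample = [
    {"company_name": "nvidia", "analysis_type": "portfolio"},
    {"company_name": "nvidia", "analysis_type": "single_patent"},
    {"company_name": "intel", "analysis_type": "portfolio"},
]
summary = summarize_messages(sample)
print(summary["total"], summary["by_company"], summary["by_type"])
```

If the client returns tuples or row objects instead of dicts, only the two generator expressions need to change.
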