docs: reorganize documentation into docs/ directory

- Move CONTAINER_REGISTRY.md and DATABASE_MODE.md to docs/
- Add comprehensive DEPLOYMENT.md with full deployment instructions
- Update README.md with documentation section linking to docs/
- Keep README.md at root for GitHub visibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-12 23:51:32 -04:00
parent d4ba13846a
commit 490850d7a6
4 changed files with 446 additions and 0 deletions
@@ -0,0 +1,318 @@
+# Database Mode for Testing and Analytics
+
+This document explains how to use SPARC's database mode, which stores LLM messages in a database for testing and analytics.
+
+## Overview
+
+SPARC supports two modes of operation:
+
+1. **API Mode** (default): Messages are sent to OpenRouter's API and you receive real LLM responses
+2. **Database Mode**: Messages are stored in a PostgreSQL database without making API calls, useful for:
+   - Testing the application without consuming API credits
+   - Collecting analytics on message patterns and usage
+   - Development and debugging
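
The switch between the two modes can be pictured as a single environment lookup. This is a minimal sketch, not SPARC's actual configuration code: the helper name `resolve_mode` and the set of strings treated as "true" are assumptions made for illustration.

```python
import os

# Assumption: these strings count as "true"; SPARC's real parsing may differ.
TRUTHY = {"1", "true", "yes", "on"}


def resolve_mode(env=None):
    """Return "database" or "api" based on USE_DATABASE (hypothetical helper)."""
    env = os.environ if env is None else env
    raw = env.get("USE_DATABASE", "false")  # API mode is the documented default
    return "database" if raw.strip().lower() in TRUTHY else "api"


print(resolve_mode({"USE_DATABASE": "true"}))  # database
print(resolve_mode({}))                        # api
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching the real environment.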
+
+## Setup
+
+### 1. Start the Database
+
+Use docker-compose to start the PostgreSQL database:
+
+```bash
+docker-compose up -d postgres
+```
+
+This will start a PostgreSQL instance accessible at `localhost:5432`.
+
+### 2. Initialize the Database Schema
+
+Run the initialization script to create the necessary tables:
+
+```bash
+python scripts/init_database.py
+```
+
+This creates the `llm_messages` table and indexes for efficient querying.
+
+### 3. Configure Environment Variables
+
+Create a `.env` file (or copy from `.env.example`):
+
+```bash
+cp .env.example .env
+```
+
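+Edit `.env` and set the values for one mode only. Do not leave both `USE_DATABASE` lines active in the same file; keep the unused block commented out:
+
+```env
+# For database mode (testing/analytics)
+USE_DATABASE=true
+DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
+
+# For API mode (production), comment out the lines above and uncomment these
+# USE_DATABASE=false
+# OPENROUTER_API_KEY=your_openrouter_key_here
+```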
+
+## Usage
+
+### Running in Database Mode
+
+Set `USE_DATABASE=true` in your `.env` file, then run the application normally:
+
+```bash
+python main.py
+```
+
+Instead of sending messages to OpenRouter, the application will:
+- Store all prompts in the database
+- Return a placeholder response
+- Log metadata (company name, analysis type, timestamps)
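
The three behaviours listed above can be sketched as follows. This is an illustration under stated assumptions, not SPARC's implementation: it uses an in-memory SQLite database as a stand-in for PostgreSQL so that it runs without any services, and the names `open_store`, `analyze_in_database_mode`, and the placeholder text are hypothetical.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Assumption: the real placeholder text returned by SPARC is not documented here.
PLACEHOLDER = "[database mode] prompt stored; no API call was made"


def open_store():
    """Create an in-memory SQLite table standing in for llm_messages."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE llm_messages ("
        " id INTEGER PRIMARY KEY,"
        " timestamp TEXT, company_name TEXT, analysis_type TEXT,"
        " prompt TEXT, response TEXT, metadata TEXT)"
    )
    return conn


def analyze_in_database_mode(conn, prompt, company_name, analysis_type, metadata=None):
    """Store the prompt with its metadata and return a placeholder response."""
    conn.execute(
        "INSERT INTO llm_messages"
        " (timestamp, company_name, analysis_type, prompt, response, metadata)"
        " VALUES (?, ?, ?, ?, ?, ?)",
        (
            datetime.now(timezone.utc).isoformat(),
            company_name,
            analysis_type,
            prompt,
            PLACEHOLDER,
            json.dumps(metadata or {}),
        ),
    )
    conn.commit()
    return PLACEHOLDER  # no network request is made anywhere in this function
```

Calling `analyze_in_database_mode(open_store(), "some prompt", "nvidia", "portfolio")` adds one row and returns the placeholder string.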
+
+### Running in API Mode
+
+Set `USE_DATABASE=false` in your `.env` file, then run the application normally:
+
+```bash
+python main.py
+```
+
+The application will send messages to OpenRouter and return real LLM responses.
+
+### Hybrid Mode (Optional)
+
+You can also enable database logging while still using the API by initializing the database client in your code. The `LLMAnalyzer` will automatically log all API calls to the database if a database client is available.
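
One way to picture hybrid mode is an analyzer that always calls the API and logs the exchange only when a database client was supplied. The classes below are hypothetical stand-ins, not the real `LLMAnalyzer` or `DatabaseClient`; only the `store_message` keyword arguments are taken from the programmatic example later in this document.

```python
class InMemoryLog:
    """Hypothetical stand-in for a database client with a store_message method."""

    def __init__(self):
        self.rows = []

    def store_message(self, **fields):
        self.rows.append(fields)


class SketchAnalyzer:
    """Illustrative analyzer: always calls the API, logs only if db is given."""

    def __init__(self, call_api, db=None):
        self.call_api = call_api  # any callable that maps a prompt to a response
        self.db = db              # optional logging target

    def analyze(self, prompt, company_name, analysis_type):
        response = self.call_api(prompt)
        if self.db is not None:
            self.db.store_message(
                prompt=prompt,
                response=response,
                company_name=company_name,
                analysis_type=analysis_type,
            )
        return response
```

Keeping the database client optional means the same code path serves both plain API mode and hybrid mode.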
+
+## Viewing Analytics
+
+### View Message Statistics
+
+```bash
+python scripts/view_analytics.py
+```
+
+Options:
+- `--days N`: Analyze messages from the last N days (default: 30)
+
+Example output:
+```
+SPARC Analytics - Last 30 days
+======================================================================
+
+Total Messages: 45
+
+Messages by Company:
+  nvidia: 25
+  intel: 12
+  amd: 8
+
+Messages by Analysis Type:
+  portfolio: 30
+  single_patent: 15
+
+======================================================================
+```
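
The report above amounts to counting rows inside a time window. The following sketch shows that aggregation over plain dictionaries; the function name `summarize` and the row shape are assumptions, and the real script presumably queries PostgreSQL instead.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize(rows, days=30, now=None):
    """Count messages from the last `days` days by company and analysis type."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [r for r in rows if r["timestamp"] >= cutoff]
    return {
        "total": len(recent),
        "by_company": Counter(r["company_name"] for r in recent),
        "by_type": Counter(r["analysis_type"] for r in recent),
    }
```

Each row is expected to carry a timezone-aware `timestamp` plus `company_name` and `analysis_type` keys, mirroring the columns of `llm_messages`.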
+
+### View Stored Messages
+
+```bash
+python scripts/view_messages.py
+```
+
+Options:
+- `--company COMPANY`: Filter by company name
+- `--type TYPE`: Filter by analysis type (single_patent or portfolio)
+- `--limit N`: Maximum number of messages to display (default: 10)
+
+Examples:
+```bash
+# View last 10 messages
+python scripts/view_messages.py
+
+# View all messages for nvidia
+python scripts/view_messages.py --company nvidia --limit 100
+
+# View portfolio analyses only
+python scripts/view_messages.py --type portfolio
+```
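
For readers extending the script, the documented options map naturally onto `argparse`. This is a sketch of how such a parser could look, assuming nothing about the real `scripts/view_messages.py` beyond the flags and the default listed above; `build_parser` and the `analysis_type` destination name are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the documented flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="View stored LLM messages")
    parser.add_argument("--company", help="Filter by company name")
    parser.add_argument("--type", dest="analysis_type",
                        help="Filter by analysis type (single_patent or portfolio)")
    parser.add_argument("--limit", type=int, default=10,
                        help="Maximum number of messages to display")
    return parser


args = build_parser().parse_args(["--company", "nvidia", "--limit", "100"])
print(args.company, args.analysis_type, args.limit)  # nvidia None 100
```

Storing `--type` under `analysis_type` avoids shadowing the `type` builtin and matches the column name in the schema.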
+
+## Database Schema
+
+### llm_messages Table
+
+| Column | Type | Description |
+|--------|------|-------------|
+| id | SERIAL | Primary key |
+| timestamp | TIMESTAMP | When the message was created |
+| company_name | VARCHAR(255) | Company being analyzed |
+| analysis_type | VARCHAR(50) | Type of analysis (single_patent, portfolio) |
+| model | VARCHAR(100) | LLM model identifier |
+| prompt | TEXT | The full prompt sent to the LLM |
+| response | TEXT | The response from the LLM |
+| metadata | JSONB | Additional metadata (patent IDs, content length, etc.) |
+| token_usage | JSONB | Token usage statistics (when available) |
+| created_at | TIMESTAMP | Record creation timestamp |
+
+### Indexes
+
+- `idx_messages_timestamp`: Speeds up time-based queries
+- `idx_messages_company`: Speeds up company-specific queries
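
The table and indexes above could be created with statements like the ones below. This DDL is reconstructed from the column table and the index names; it is not copied from `scripts/init_database.py`, and the indexed columns, the absence of `NOT NULL` constraints, and the absence of defaults are all assumptions.

```python
# Assumed DDL, inferred from the schema table in this document.
SCHEMA_STATEMENTS = [
    """
    CREATE TABLE IF NOT EXISTS llm_messages (
        id SERIAL PRIMARY KEY,
        timestamp TIMESTAMP,
        company_name VARCHAR(255),
        analysis_type VARCHAR(50),
        model VARCHAR(100),
        prompt TEXT,
        response TEXT,
        metadata JSONB,
        token_usage JSONB,
        created_at TIMESTAMP
    )
    """,
    # Assumption: each index covers the single column its name suggests.
    "CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON llm_messages (timestamp)",
    "CREATE INDEX IF NOT EXISTS idx_messages_company ON llm_messages (company_name)",
]


def apply_schema(conn):
    """Run each statement on a DB-API connection, then commit."""
    cur = conn.cursor()
    for statement in SCHEMA_STATEMENTS:
        cur.execute(statement)
    conn.commit()
```

`apply_schema` accepts any DB-API 2.0 connection, for example one opened with a PostgreSQL driver against `DATABASE_URL`.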
+
+## Docker Compose
+
+The included `docker-compose.yml` provides:
+
+1. **PostgreSQL Database**:
+   - Image: `postgres:16-alpine`
+   - Port: `5432`
+   - Credentials: postgres/postgres
+   - Database: sparc
+   - Persistent storage via volume
+
+2. **Application Container** (optional):
+   - Builds from Dockerfile
+   - Connects to PostgreSQL
+   - Mounts current directory
+
+### Start Services
+
+```bash
+# Start just the database
+docker-compose up -d postgres
+
+# Start everything
+docker-compose up -d
+
+# View logs
+docker-compose logs -f
+
+# Stop services
+docker-compose down
+
+# Stop and remove volumes (WARNING: deletes data)
+docker-compose down -v
+```
+
+## Toggling Between Modes
+
+Switch between modes by changing the `USE_DATABASE` environment variable:
+
+### Quick Toggle (temporary, for testing)
+
+```bash
+# Run in database mode
+USE_DATABASE=true python main.py
+
+# Run in API mode
+USE_DATABASE=false python main.py
+```
+
+### Persistent Toggle
+
+Edit your `.env` file and keep exactly one of these lines active:
+
+```env
+# For testing/analytics
+USE_DATABASE=true
+
+# For production use
+# USE_DATABASE=false
+```
+
+## Use Cases
+
+### Testing Without API Costs
+
+During development, enable database mode to test the full application flow without consuming API credits:
+
+```bash
+USE_DATABASE=true python main.py
+```
+
+### Collecting Usage Analytics
+
+Enable database mode in a test environment to collect analytics on:
+- Which companies are analyzed most frequently
+- Types of analyses performed
+- Prompt patterns and lengths
+- Usage over time
+
+### Development and Debugging
+
+Database mode is useful for:
+- Testing patent parsing logic without API calls
+- Debugging the full pipeline end-to-end
+- Collecting sample prompts for optimization
+- Understanding token usage patterns (only when API calls are logged to the database, since database mode itself makes no API calls)
+
+## Troubleshooting
+
+### Connection Refused
+
+If you get "connection refused" errors:
+
+1. Ensure PostgreSQL is running: `docker-compose ps`
+2. Check the DATABASE_URL in your `.env` file
+3. Check the logs to confirm the database has finished starting: `docker-compose logs postgres`
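
The checks above can also be approximated from Python by extracting the host and port from `DATABASE_URL` and attempting a plain TCP connection. This is a diagnostic sketch only; the helper names are hypothetical, and a successful TCP connection does not prove that the credentials or the database name are correct.

```python
import socket
from urllib.parse import urlparse


def database_endpoint(database_url):
    """Extract (host, port) from a PostgreSQL URL, defaulting to localhost:5432."""
    parts = urlparse(database_url)
    return parts.hostname or "localhost", parts.port or 5432


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = database_endpoint("postgresql://postgres:postgres@localhost:5432/sparc")
print(host, port)  # localhost 5432
```

If `can_connect(host, port)` returns `False` while the container is listed as running, the port mapping in `docker-compose.yml` is a reasonable next thing to inspect.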
+
+### Schema Not Found
+
+If you get "relation does not exist" errors:
+
+1. Run the initialization script: `python scripts/init_database.py`
+2. Verify tables were created: `docker-compose exec postgres psql -U postgres -d sparc -c "\dt"`
+
+### Permission Denied
+
+If you get permission errors:
+
+1. Check your DATABASE_URL credentials match docker-compose.yml
+2. Ensure the database container is running: `docker-compose up -d postgres`
+
+## Advanced Usage
+
+### Direct Database Access
+
+You can access the database directly using psql:
+
+```bash
+docker-compose exec postgres psql -U postgres -d sparc
+```
+
+Example queries:
+
+```sql
+-- View the 10 most recent messages
+SELECT id, company_name, analysis_type, timestamp FROM llm_messages ORDER BY timestamp DESC LIMIT 10;
+
+-- Count messages by company
+SELECT company_name, COUNT(*) FROM llm_messages GROUP BY company_name;
+
+-- View recent prompts
+SELECT prompt FROM llm_messages ORDER BY timestamp DESC LIMIT 5;
+```
+
+### Programmatic Access
+
+You can use the `DatabaseClient` directly in your code:
+
+```python
+from SPARC.database import DatabaseClient
+from SPARC import config
+
+db = DatabaseClient(config.database_url)
+
+# Get messages
+messages = db.get_messages(company_name="nvidia", limit=10)
+
+# Get analytics
+analytics = db.get_analytics(days=7)
+
+# Store a custom message
+db.store_message(
+    prompt="test prompt",
+    response="test response",
+    company_name="test",
+    analysis_type="custom"
+)
+```