feat: add database mode for LLM message storage and analytics

Implements a database mode that stores LLM prompts and responses in PostgreSQL instead of making API calls. This enables:
- Testing without consuming API credits
- Collecting analytics on usage patterns
- Development and debugging workflows

Changes:
- Added DatabaseClient class for PostgreSQL operations
- Modified LLMAnalyzer to support database/API mode toggle
- Added USE_DATABASE config flag to switch between modes
- Included Docker Compose setup for PostgreSQL
- Added utility scripts for database init and analytics viewing
- Comprehensive documentation in DATABASE_MODE.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,318 @@
# Database Mode for Testing and Analytics

This document explains how to use SPARC's database mode, which stores LLM messages for testing and analytics.

## Overview

SPARC supports two modes of operation:

1. **API Mode** (default): Messages are sent to OpenRouter's API and you receive real LLM responses
2. **Database Mode**: Messages are stored in a PostgreSQL database without making API calls, useful for:
   - Testing the application without consuming API credits
   - Collecting analytics on message patterns and usage
   - Development and debugging

## Setup

### 1. Start the Database

Use docker-compose to start the PostgreSQL database:

```bash
docker-compose up -d postgres
```

This will start a PostgreSQL instance accessible at `localhost:5432`.

### 2. Initialize the Database Schema

Run the initialization script to create the necessary tables:

```bash
python scripts/init_database.py
```

This creates the `llm_messages` table and indexes for efficient querying.

### 3. Configure Environment Variables

Create a `.env` file (or copy from `.env.example`):

```bash
cp .env.example .env
```

Edit `.env` and define `USE_DATABASE` only once, with the value for the mode you want:

```env
# For database mode (testing/analytics)
USE_DATABASE=true
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc

# For API mode (production), use this value instead
# USE_DATABASE=false
OPENROUTER_API_KEY=your_openrouter_key_here
```
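
The flag can be read from the environment along these lines. This is a minimal sketch, not SPARC's actual config code: the helper names `env_flag` and `resolve_mode` and the set of accepted spellings are assumptions made for illustration.

```python
import os

# Spellings treated as "enabled" (an assumption; SPARC may accept others).
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Return a boolean for an environment variable such as USE_DATABASE."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def resolve_mode(environ=None):
    """Map the USE_DATABASE flag to a mode name; API mode is the default."""
    return "database" if env_flag("USE_DATABASE", environ=environ) else "api"


if __name__ == "__main__":
    print(resolve_mode())
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.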

## Usage

### Running in Database Mode

Set `USE_DATABASE=true` in your `.env` file, then run the application normally:

```bash
python main.py
```

Instead of sending messages to OpenRouter, the application will:
- Store all prompts in the database
- Return a placeholder response
- Log metadata (company name, analysis type, timestamps)
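
The three behaviors above can be sketched as follows. This is an illustration under stated assumptions, not the actual `LLMAnalyzer` code: `InMemoryStore` stands in for the PostgreSQL-backed client, and the placeholder text and the function name `analyze_in_database_mode` are hypothetical. Only the `store_message` keyword arguments mirror the usage shown later in this document.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical placeholder text; the real wording is not given in this document.
PLACEHOLDER_RESPONSE = "[database mode] prompt stored; no API call was made"


@dataclass
class InMemoryStore:
    """Stand-in for the PostgreSQL-backed client; keeps rows in a list."""
    rows: list = field(default_factory=list)

    def store_message(self, prompt, response, company_name, analysis_type):
        self.rows.append({
            "timestamp": datetime.now(timezone.utc),
            "company_name": company_name,
            "analysis_type": analysis_type,
            "prompt": prompt,
            "response": response,
        })


def analyze_in_database_mode(store, prompt, company_name, analysis_type):
    """Store the prompt with its metadata and return a placeholder response."""
    store.store_message(
        prompt=prompt,
        response=PLACEHOLDER_RESPONSE,
        company_name=company_name,
        analysis_type=analysis_type,
    )
    return PLACEHOLDER_RESPONSE


if __name__ == "__main__":
    store = InMemoryStore()
    print(analyze_in_database_mode(store, "Summarize patent X", "nvidia", "single_patent"))
    print(len(store.rows), "row(s) stored")
```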

### Running in API Mode

Set `USE_DATABASE=false` in your `.env` file, then run the application normally:

```bash
python main.py
```

The application will send messages to OpenRouter and return real LLM responses.

### Hybrid Mode (Optional)

You can also enable database logging while still using the API by initializing the database client in your code. The `LLMAnalyzer` will automatically log all API calls to the database if a database client is available.
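
One way to picture hybrid mode is an analyzer that always calls the API and logs only when a client was supplied. The sketch below is hypothetical: `LoggingAnalyzer`, `RecordingClient`, and the `call_api` callable are illustrative names, not the real `LLMAnalyzer` interface.

```python
class LoggingAnalyzer:
    """Hypothetical analyzer: always calls the API, logs if a client is set."""

    def __init__(self, call_api, db_client=None):
        self.call_api = call_api      # callable taking a prompt, returning text
        self.db_client = db_client    # optional object with store_message()

    def analyze(self, prompt, company_name, analysis_type):
        response = self.call_api(prompt)
        if self.db_client is not None:
            # Log the real response alongside the prompt and its metadata.
            self.db_client.store_message(
                prompt=prompt,
                response=response,
                company_name=company_name,
                analysis_type=analysis_type,
            )
        return response


class RecordingClient:
    """Test double that records store_message calls instead of using a DB."""

    def __init__(self):
        self.calls = []

    def store_message(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    analyzer = LoggingAnalyzer(lambda p: "echo: " + p, db_client=client)
    print(analyzer.analyze("hello", "intel", "portfolio"))
    print(client.calls)
```

Keeping the client optional means the same code path serves plain API mode and hybrid mode.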

## Viewing Analytics

### View Message Statistics

```bash
python scripts/view_analytics.py
```

Options:
- `--days N`: Analyze messages from the last N days (default: 30)

Example output:
```
SPARC Analytics - Last 30 days
======================================================================

Total Messages: 45

Messages by Company:
nvidia: 25
intel: 12
amd: 8

Messages by Analysis Type:
portfolio: 30
single_patent: 15

======================================================================
```
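
The report above boils down to a time-window filter followed by two group-by counts. The sketch below shows that aggregation over in-memory rows; the real script presumably runs it against PostgreSQL, and the function name `summarize` and its return keys are assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def summarize(rows, days=30, now=None):
    """Count messages from the last `days` days by company and analysis type.

    `rows` is an iterable of dicts with "timestamp", "company_name" and
    "analysis_type" keys (column names taken from the schema in this document).
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [r for r in rows if r["timestamp"] >= cutoff]
    return {
        "total_messages": len(recent),
        "by_company": Counter(r["company_name"] for r in recent),
        "by_analysis_type": Counter(r["analysis_type"] for r in recent),
    }


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    rows = [
        {"timestamp": now - timedelta(days=1), "company_name": "nvidia", "analysis_type": "portfolio"},
        {"timestamp": now - timedelta(days=45), "company_name": "amd", "analysis_type": "single_patent"},
    ]
    print(summarize(rows, days=30))
```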

### View Stored Messages

```bash
python scripts/view_messages.py
```

Options:
- `--company COMPANY`: Filter by company name
- `--type TYPE`: Filter by analysis type (`single_patent` or `portfolio`)
- `--limit N`: Maximum number of messages to display (default: 10)

Examples:
```bash
# View the last 10 messages
python scripts/view_messages.py

# View up to 100 messages for nvidia
python scripts/view_messages.py --company nvidia --limit 100

# View portfolio analyses only
python scripts/view_messages.py --type portfolio
```
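
The options above could be wired up with `argparse` roughly as shown. This is a sketch, not the contents of `scripts/view_messages.py`: the function names are hypothetical, and the `%s` placeholders assume a psycopg2-style driver.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options."""
    parser = argparse.ArgumentParser(description="View stored LLM messages")
    parser.add_argument("--company", help="filter by company name")
    parser.add_argument(
        "--type",
        dest="analysis_type",
        choices=["single_patent", "portfolio"],
        help="filter by analysis type",
    )
    parser.add_argument("--limit", type=int, default=10,
                        help="maximum number of messages to display")
    return parser


def build_query(args):
    """Translate parsed options into a parameterized SQL string and values."""
    clauses, params = [], []
    if args.company:
        clauses.append("company_name = %s")
        params.append(args.company)
    if args.analysis_type:
        clauses.append("analysis_type = %s")
        params.append(args.analysis_type)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = (
        "SELECT id, company_name, analysis_type, timestamp "
        "FROM llm_messages" + where + " ORDER BY timestamp DESC LIMIT %s"
    )
    params.append(args.limit)
    return sql, params


if __name__ == "__main__":
    sql, params = build_query(build_parser().parse_args())
    print(sql, params)
```

Passing user input as query parameters rather than formatting it into the SQL string avoids injection through `--company`.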

## Database Schema

### llm_messages Table

| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| timestamp | TIMESTAMP | When the message was created |
| company_name | VARCHAR(255) | Company being analyzed |
| analysis_type | VARCHAR(50) | Type of analysis (single_patent, portfolio) |
| model | VARCHAR(100) | LLM model identifier |
| prompt | TEXT | The full prompt sent to the LLM |
| response | TEXT | The response from the LLM |
| metadata | JSONB | Additional metadata (patent IDs, content length, etc.) |
| token_usage | JSONB | Token usage statistics (when available) |
| created_at | TIMESTAMP | Record creation timestamp |

### Indexes

- `idx_messages_timestamp`: Speeds up time-based queries
- `idx_messages_company`: Speeds up company-specific queries
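
For orientation, the table and indexes above might be created with DDL like the following. The column names, types, and index names come from this document; the `DEFAULT CURRENT_TIMESTAMP` clauses, the indexed columns, and the `init_schema` helper are assumptions, and the actual statements in `scripts/init_database.py` may differ.

```python
# DDL sketch for the documented schema (defaults are assumed, not confirmed).
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS llm_messages (
    id SERIAL PRIMARY KEY,
    timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    company_name VARCHAR(255),
    analysis_type VARCHAR(50),
    model VARCHAR(100),
    prompt TEXT,
    response TEXT,
    metadata JSONB,
    token_usage JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

SCHEMA_STATEMENTS = [
    CREATE_TABLE_SQL,
    "CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON llm_messages (timestamp)",
    "CREATE INDEX IF NOT EXISTS idx_messages_company ON llm_messages (company_name)",
]


def init_schema(conn):
    """Run each statement on a DB-API connection, then commit."""
    cur = conn.cursor()
    try:
        for statement in SCHEMA_STATEMENTS:
            cur.execute(statement)
        conn.commit()
    finally:
        cur.close()
```

`init_schema` only relies on the standard DB-API `cursor()`, `execute()`, and `commit()` methods, so any PostgreSQL driver that follows that interface should work with it.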

## Docker Compose

The included `docker-compose.yml` provides:

1. **PostgreSQL Database**:
   - Image: `postgres:16-alpine`
   - Port: `5432`
   - Credentials: postgres/postgres
   - Database: sparc
   - Persistent storage via volume

2. **Application Container** (optional):
   - Builds from Dockerfile
   - Connects to PostgreSQL
   - Mounts current directory

### Start Services

```bash
# Start just the database
docker-compose up -d postgres

# Start everything
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down

# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v
```

## Toggling Between Modes

You can switch between modes by changing the `USE_DATABASE` environment variable.

### Quick Toggle (temporary, for testing)

```bash
# Run in database mode
USE_DATABASE=true python main.py

# Run in API mode
USE_DATABASE=false python main.py
```

### Persistent Toggle

Edit your `.env` file and keep exactly one of these lines active:

```env
# For testing/analytics
USE_DATABASE=true

# For production use
# USE_DATABASE=false
```

## Use Cases

### Testing Without API Costs

During development, enable database mode to test the full application flow without consuming API credits:

```bash
USE_DATABASE=true python main.py
```

### Collecting Usage Analytics

Enable database mode in a test environment to collect analytics on:
- Which companies are analyzed most frequently
- Types of analyses performed
- Prompt patterns and lengths
- Usage over time

### Development and Debugging

Database mode is useful for:
- Testing patent parsing logic without API calls
- Debugging the full pipeline end-to-end
- Collecting sample prompts for optimization

To understand token usage patterns, run in API mode with database logging enabled (see Hybrid Mode).

## Troubleshooting

### Connection Refused

If you get "connection refused" errors:

1. Ensure PostgreSQL is running: `docker-compose ps`
2. Check the `DATABASE_URL` in your `.env` file
3. Wait for the database to be healthy: `docker-compose logs postgres`
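
Step 3 can be automated with a small polling helper. The sketch below only checks that the TCP port accepts connections, which is a weaker signal than PostgreSQL being ready to serve queries; the function name `wait_for_port` and its defaults are illustrative.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until host:port accepts a TCP connection or `timeout` elapses.

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    ok = wait_for_port("localhost", 5432, timeout=10.0)
    print("port 5432 reachable" if ok else "port 5432 not reachable")
```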

### Schema Not Found

If you get "relation does not exist" errors:

1. Run the initialization script: `python scripts/init_database.py`
2. Verify tables were created: `docker-compose exec postgres psql -U postgres -d sparc -c "\dt"`

### Permission Denied

If you get permission errors:

1. Check that your `DATABASE_URL` credentials match `docker-compose.yml`
2. Ensure the database container is running: `docker-compose up -d postgres`
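
To compare credentials by eye, it can help to split `DATABASE_URL` into its parts. The helper below is an illustrative sketch using only the standard library; the name `describe_database_url` is made up for this example, and it reports whether a password is present rather than printing it.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a PostgreSQL connection URL into the parts worth checking."""
    parts = urlsplit(url)
    return {
        "user": parts.username,
        "password_set": parts.password is not None,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    print(describe_database_url(
        "postgresql://postgres:postgres@localhost:5432/sparc"
    ))
```

The user, port, and database name can then be checked against the values listed for the PostgreSQL service in `docker-compose.yml`.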

## Advanced Usage

### Direct Database Access

You can access the database directly using psql:

```bash
docker-compose exec postgres psql -U postgres -d sparc
```

Example queries:

```sql
-- View the 10 most recent messages
SELECT id, company_name, analysis_type, timestamp FROM llm_messages ORDER BY timestamp DESC LIMIT 10;

-- Count messages by company
SELECT company_name, COUNT(*) FROM llm_messages GROUP BY company_name;

-- View the 5 most recent prompts
SELECT prompt FROM llm_messages ORDER BY timestamp DESC LIMIT 5;
```

### Programmatic Access

You can use the `DatabaseClient` directly in your code:

```python
from SPARC.database import DatabaseClient
from SPARC import config

db = DatabaseClient(config.database_url)

# Get messages
messages = db.get_messages(company_name="nvidia", limit=10)

# Get analytics
analytics = db.get_analytics(days=7)

# Store a custom message
db.store_message(
    prompt="test prompt",
    response="test response",
    company_name="test",
    analysis_type="custom"
)
```