docs: reorganize documentation into docs/ directory

- Move CONTAINER_REGISTRY.md and DATABASE_MODE.md to docs/
- Add comprehensive DEPLOYMENT.md with full deployment instructions
- Update README.md with documentation section linking to docs/
- Keep README.md at root for GitHub visibility

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-12 23:51:32 -04:00
parent d4ba13846a
commit 490850d7a6
4 changed files with 446 additions and 0 deletions
@@ -0,0 +1,318 @@
# Database Mode for Testing and Analytics
This document explains how to use SPARC's database mode to store LLM messages for testing and analytics.
## Overview
SPARC supports two modes of operation:
1. **API Mode** (default): Messages are sent to OpenRouter's API and you receive real LLM responses
2. **Database Mode**: Messages are stored in a PostgreSQL database without making API calls, useful for:
   - Testing the application without consuming API credits
   - Collecting analytics on message patterns and usage
   - Development and debugging
## Setup
### 1. Start the Database
Use docker-compose to start the PostgreSQL database:
```bash
docker-compose up -d postgres
```
This will start a PostgreSQL instance accessible at `localhost:5432`.
### 2. Initialize the Database Schema
Run the initialization script to create the necessary tables:
```bash
python scripts/init_database.py
```
This creates the `llm_messages` table and indexes for efficient querying.
### 3. Configure Environment Variables
Create a `.env` file (or copy from `.env.example`):
```bash
cp .env.example .env
```
Edit `.env` and set:
```env
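# Option A: database mode (testing/analytics)
USE_DATABASE=true
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/sparc
# Option B: API mode (production). Keep only one USE_DATABASE line active;
# uncomment these and remove the USE_DATABASE=true line above to switch.
# USE_DATABASE=false
# OPENROUTER_API_KEY=your_openrouter_key_here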
```
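As a rough illustration of how these variables might be read at startup, the sketch below parses them from the environment and fails early when the selected mode is missing its required value. The `Settings` dataclass and `load_settings` function are hypothetical names for this example, not SPARC's actual config module, and the accepted truthy spellings are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the mode-related settings."""
    use_database: bool
    database_url: Optional[str]
    openrouter_api_key: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read USE_DATABASE, DATABASE_URL and OPENROUTER_API_KEY from env."""
    # Assumption: "true", "1" and "yes" (any case) enable database mode.
    raw = env.get("USE_DATABASE", "false")
    use_db = raw.strip().lower() in {"true", "1", "yes"}
    settings = Settings(
        use_database=use_db,
        database_url=env.get("DATABASE_URL"),
        openrouter_api_key=env.get("OPENROUTER_API_KEY"),
    )
    # Fail early when the selected mode is missing its required value.
    if settings.use_database and not settings.database_url:
        raise ValueError("USE_DATABASE=true requires DATABASE_URL")
    if not settings.use_database and not settings.openrouter_api_key:
        raise ValueError("USE_DATABASE=false requires OPENROUTER_API_KEY")
    return settings
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.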
## Usage
### Running in Database Mode
Set `USE_DATABASE=true` in your `.env` file, then run the application normally:
```bash
python main.py
```
Instead of sending messages to OpenRouter, the application will:
- Store all prompts in the database
- Return a placeholder response
- Log metadata (company name, analysis type, timestamps)
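The behaviour listed above can be sketched as a simple branch on the mode flag. This is a minimal illustration under stated assumptions, not SPARC's actual implementation: `InMemoryStore`, `analyze`, and the placeholder text are hypothetical, and an in-memory list stands in for the PostgreSQL table.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Assumption: the real placeholder text is not documented; this is illustrative.
PLACEHOLDER_RESPONSE = "[database mode] prompt stored; no LLM call was made"


class InMemoryStore:
    """Stand-in for the PostgreSQL-backed client, for illustration only."""

    def __init__(self) -> None:
        self.rows: List[Dict[str, object]] = []

    def store_message(self, prompt: str, response: str,
                      company_name: str, analysis_type: str) -> None:
        # Record the prompt together with the metadata the document lists.
        self.rows.append({
            "timestamp": datetime.now(timezone.utc),
            "company_name": company_name,
            "analysis_type": analysis_type,
            "prompt": prompt,
            "response": response,
        })


def analyze(prompt: str, company_name: str, analysis_type: str, *,
            use_database: bool, store: InMemoryStore,
            call_api: Callable[[str], str]) -> str:
    """Store the prompt (database mode) or forward it to the API (API mode)."""
    if use_database:
        store.store_message(prompt, PLACEHOLDER_RESPONSE,
                            company_name, analysis_type)
        return PLACEHOLDER_RESPONSE
    return call_api(prompt)
```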
### Running in API Mode
Set `USE_DATABASE=false` in your `.env` file, then run the application normally:
```bash
python main.py
```
The application will send messages to OpenRouter and return real LLM responses.
### Hybrid Mode (Optional)
You can also enable database logging while still using the API by initializing the database client in your code. The `LLMAnalyzer` will automatically log all API calls to the database if a database client is available.
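A minimal sketch of that hybrid behaviour, assuming the analyzer receives an optional client: the API is always called, and the exchange is logged only when a client is present. The function name `analyze_with_logging` and the `call_api` callable are hypothetical; only the `store_message` keyword arguments mirror the `DatabaseClient` usage shown later in this document.

```python
from typing import Callable, Optional, Protocol


class MessageStore(Protocol):
    """Structural type for anything offering a store_message method."""

    def store_message(self, prompt: str, response: str,
                      company_name: str, analysis_type: str) -> None: ...


def analyze_with_logging(prompt: str, company_name: str, analysis_type: str,
                         call_api: Callable[[str], str],
                         db: Optional[MessageStore] = None) -> str:
    """Call the API and, if a database client is available, log the exchange."""
    response = call_api(prompt)
    if db is not None:
        db.store_message(prompt=prompt, response=response,
                         company_name=company_name,
                         analysis_type=analysis_type)
    return response
```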
## Viewing Analytics
### View Message Statistics
```bash
python scripts/view_analytics.py
```
Options:
- `--days N`: Analyze messages from the last N days (default: 30)
Example output:
```
SPARC Analytics - Last 30 days
======================================================================
Total Messages: 45
Messages by Company:
nvidia: 25
intel: 12
amd: 8
Messages by Analysis Type:
portfolio: 30
single_patent: 15
======================================================================
```
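The counts in this report amount to two group-by aggregations. The sketch below reproduces them in plain Python over a list of dictionaries; the `summarize` helper is hypothetical, and the real script may well compute the same figures in SQL instead.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def summarize(messages: Iterable[Mapping[str, str]]) -> Dict[str, object]:
    """Count messages in total, per company, and per analysis type."""
    rows = list(messages)
    return {
        "total": len(rows),
        "by_company": Counter(m["company_name"] for m in rows),
        "by_type": Counter(m["analysis_type"] for m in rows),
    }


if __name__ == "__main__":
    sample = [
        {"company_name": "nvidia", "analysis_type": "portfolio"},
        {"company_name": "nvidia", "analysis_type": "single_patent"},
        {"company_name": "intel", "analysis_type": "portfolio"},
    ]
    report = summarize(sample)
    print("Total Messages:", report["total"])
    # most_common() orders companies by descending count, as in the report.
    for company, count in report["by_company"].most_common():
        print(f"  {company}: {count}")
```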
### View Stored Messages
```bash
python scripts/view_messages.py
```
Options:
- `--company COMPANY`: Filter by company name
- `--type TYPE`: Filter by analysis type (single_patent or portfolio)
- `--limit N`: Maximum number of messages to display (default: 10)
Examples:
```bash
# View last 10 messages
python scripts/view_messages.py
# View all messages for nvidia
python scripts/view_messages.py --company nvidia --limit 100
# View portfolio analyses only
python scripts/view_messages.py --type portfolio
```
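For readers extending the script, the three options above map naturally onto `argparse`. This is a sketch of how such a parser could look, not the contents of `scripts/view_messages.py`; `build_parser` and the `analysis_type` destination name are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the --company, --type and --limit options."""
    parser = argparse.ArgumentParser(description="View stored LLM messages")
    parser.add_argument("--company", help="Filter by company name")
    parser.add_argument(
        "--type",
        dest="analysis_type",  # avoid shadowing the built-in name `type`
        choices=["single_patent", "portfolio"],
        help="Filter by analysis type",
    )
    parser.add_argument(
        "--limit", type=int, default=10,
        help="Maximum number of messages to display (default: 10)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.company, args.analysis_type, args.limit)
```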
## Database Schema
### llm_messages Table
| Column | Type | Description |
|--------|------|-------------|
| id | SERIAL | Primary key |
| timestamp | TIMESTAMP | When the message was created |
| company_name | VARCHAR(255) | Company being analyzed |
| analysis_type | VARCHAR(50) | Type of analysis (single_patent, portfolio) |
| model | VARCHAR(100) | LLM model identifier |
| prompt | TEXT | The full prompt sent to the LLM |
| response | TEXT | The response from the LLM |
| metadata | JSONB | Additional metadata (patent IDs, content length, etc.) |
| token_usage | JSONB | Token usage statistics (when available) |
| created_at | TIMESTAMP | Record creation timestamp |
### Indexes
- `idx_messages_timestamp`: Speeds up time-based queries
- `idx_messages_company`: Speeds up company-specific queries
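For orientation, the table and indexes described above could be created with statements along these lines. This is a reconstruction from the column table only: the real `scripts/init_database.py` may add defaults, `NOT NULL` constraints, or different index definitions, and the indexed columns are inferred from the index names.

```python
# Hypothetical DDL reconstructed from the column table above; the real
# initialization script may add defaults or constraints not shown here.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS llm_messages (
    id            SERIAL PRIMARY KEY,
    timestamp     TIMESTAMP,
    company_name  VARCHAR(255),
    analysis_type VARCHAR(50),
    model         VARCHAR(100),
    prompt        TEXT,
    response      TEXT,
    metadata      JSONB,
    token_usage   JSONB,
    created_at    TIMESTAMP
);
"""

# Assumption: each index covers the single column its name suggests.
CREATE_INDEX_SQL = [
    "CREATE INDEX IF NOT EXISTS idx_messages_timestamp ON llm_messages (timestamp);",
    "CREATE INDEX IF NOT EXISTS idx_messages_company ON llm_messages (company_name);",
]


def schema_statements() -> list:
    """Return the statements in the order they should be executed."""
    return [CREATE_TABLE_SQL.strip(), *CREATE_INDEX_SQL]
```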
## Docker Compose
The included `docker-compose.yml` provides:
1. **PostgreSQL Database**:
   - Image: `postgres:16-alpine`
   - Port: `5432`
   - Credentials: postgres/postgres
   - Database: sparc
   - Persistent storage via volume
2. **Application Container** (optional):
   - Builds from Dockerfile
   - Connects to PostgreSQL
   - Mounts current directory
### Start Services
```bash
# Start just the database
docker-compose up -d postgres
# Start everything
docker-compose up -d
# View logs
docker-compose logs -f
# Stop services
docker-compose down
# Stop and remove volumes (WARNING: deletes data)
docker-compose down -v
```
## Toggling Between Modes
Switch between modes by changing the `USE_DATABASE` environment variable:
### Quick Toggle (temporary, for testing)
```bash
# Run in database mode
USE_DATABASE=true python main.py
# Run in API mode
USE_DATABASE=false python main.py
```
### Persistent Toggle
Edit your `.env` file:
```env
# For testing/analytics
USE_DATABASE=true
# For production use, set this instead (keep only one USE_DATABASE line)
# USE_DATABASE=false
```
## Use Cases
### Testing Without API Costs
During development, enable database mode to test the full application flow without consuming API credits:
```bash
USE_DATABASE=true python main.py
```
### Collecting Usage Analytics
Enable database mode in a test environment to collect analytics on:
- Which companies are analyzed most frequently
- Types of analyses performed
- Prompt patterns and lengths
- Usage over time
### Development and Debugging
Database mode is useful for:
- Testing patent parsing logic without API calls
- Debugging the full pipeline end-to-end
- Collecting sample prompts for optimization
- Understanding token usage patterns (when in API mode with logging)
## Troubleshooting
### Connection Refused
If you get "connection refused" errors:
1. Ensure PostgreSQL is running: `docker-compose ps`
2. Check the DATABASE_URL in your `.env` file
3. Wait for the database to be healthy: `docker-compose logs postgres`
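When a script starts right after `docker-compose up -d postgres`, the port may not be accepting connections yet. One way to handle this is to poll the port before connecting, as in the sketch below. The `wait_for_port` helper is hypothetical and uses only the standard library; note that an open TCP port does not guarantee PostgreSQL has finished initializing, so treat it as a first check rather than a full health probe.

```python
import socket
import time


def wait_for_port(host: str, port: int,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers connection refused, timeouts, and unreachable hosts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port("localhost", 5432):
        print("PostgreSQL port is reachable")
    else:
        print("Timed out waiting for localhost:5432")
```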
### Schema Not Found
If you get "relation does not exist" errors:
1. Run the initialization script: `python scripts/init_database.py`
2. Verify tables were created: `docker-compose exec postgres psql -U postgres -d sparc -c "\dt"`
### Permission Denied
If you get permission errors:
1. Check your DATABASE_URL credentials match docker-compose.yml
2. Ensure the database container is running: `docker-compose up -d postgres`
## Advanced Usage
### Direct Database Access
You can access the database directly using psql:
```bash
docker-compose exec postgres psql -U postgres -d sparc
```
Example queries:
```sql
-- View all messages
SELECT id, company_name, analysis_type, timestamp FROM llm_messages ORDER BY timestamp DESC LIMIT 10;
-- Count messages by company
SELECT company_name, COUNT(*) FROM llm_messages GROUP BY company_name;
-- View recent prompts
SELECT prompt FROM llm_messages ORDER BY timestamp DESC LIMIT 5;
```
### Programmatic Access
You can use the `DatabaseClient` directly in your code:
```python
from SPARC.database import DatabaseClient
from SPARC import config
db = DatabaseClient(config.database_url)
# Get messages
messages = db.get_messages(company_name="nvidia", limit=10)
# Get analytics
analytics = db.get_analytics(days=7)
# Store a custom message
db.store_message(
    prompt="test prompt",
    response="test response",
    company_name="test",
    analysis_type="custom",
)
```