feat: add S3/MinIO object storage support for patent PDFs

Introduce a StorageBackend abstraction (local filesystem and S3) for
patent PDF storage. When STORAGE_BACKEND=s3, PDFs are read/written via
boto3 to an S3-compatible bucket instead of the local filesystem.

- Add SPARC/storage.py with LocalStorageBackend and S3StorageBackend
- Update serp_api.py save_patents and parse_patent_pdf to use storage
- Add storage config vars to config.py and .env.example
- Add optional MinIO service to docker-compose.yml (--profile s3)
- Add boto3 to requirements.txt

Closes leeworks-agents/SPARC#38

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
agent-company
2026-03-26 10:17:24 +00:00
parent 55c131cb32
commit 9a43f85259
6 changed files with 258 additions and 12 deletions
+7
View File
@@ -53,6 +53,13 @@ root_path = os.getenv("ROOT_PATH", "")
# Used for safety checks (e.g., refusing default JWT secret in production)
app_env = os.getenv("APP_ENV", "development")
# Storage backend: "local" (default) or "s3" for S3/MinIO object storage
storage_backend = os.getenv("STORAGE_BACKEND", "local")
s3_bucket = os.getenv("S3_BUCKET", "sparc-patents")
s3_endpoint_url = os.getenv("S3_ENDPOINT_URL", "")
s3_access_key = os.getenv("AWS_ACCESS_KEY_ID", "")
s3_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY", "")
# CORS allowed origins (comma-separated)
# Defaults to localhost dev origins when unset
_cors_origins_raw = os.getenv("CORS_ORIGINS", "")