Add POST /export/batch endpoint for multi-company ZIP download

Implements issue #1674: a new authenticated POST /export/batch endpoint that
accepts a list of company names and an optional format (csv or pdf), compiles
per-company exports into a ZIP archive using Python's zipfile module, and
returns it as a streaming download.

Key changes:
- Extract `_fetch_company_rows`, `_build_company_csv`, `_build_company_pdf`
  helpers to eliminate duplication between the single-company endpoints and
  the new batch endpoint
- Refactor `export_company_csv` and `export_company_pdf` to delegate to the
  new helpers
- Add `BatchExportRequest` Pydantic model (companies list + format field)
- Add `POST /export/batch` which iterates over companies, skips those with no
  data, writes per-company files into the ZIP, and always includes a
  `manifest.json` listing exported and skipped companies
- Response header: `Content-Disposition: attachment; filename=sparc-export-<date>.zip`
- 17 new tests covering: single company (CSV + PDF), multiple companies,
  all-missing, unauthenticated, invalid-token, manifest structure, input
  validation

Closes leeworks-agents/SPARC#1674

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
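
The batch flow described in the commit message (skip companies with no rows, write one file per company, always append `manifest.json`) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the endpoint's actual code: `build_export_zip` and its `rows_by_company` argument are hypothetical names, the sample row and date are made up, and only the CSV path is shown.

```python
import csv
import io
import json
import zipfile


def build_export_zip(rows_by_company: dict[str, list[tuple]], export_date: str) -> bytes:
    """Pack one CSV per company that has rows, plus manifest.json, into ZIP bytes.

    Hypothetical helper: the real endpoint fetches rows from the database
    instead of receiving them as a dict.
    """
    exported: list[str] = []
    skipped: list[str] = []
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for company, rows in rows_by_company.items():
            if not rows:
                # No data for this company: record it and move on.
                skipped.append(company)
                continue
            out = io.StringIO()
            writer = csv.writer(out)
            writer.writerow(["company_name", "analysis_type", "model", "analysis", "timestamp"])
            writer.writerows(rows)
            safe_name = company.replace(" ", "_").lower()
            zf.writestr(f"sparc_{safe_name}_export.csv", out.getvalue())
            exported.append(company)
        # The manifest is written even when every company was skipped.
        manifest = {
            "export_date": export_date,
            "format": "csv",
            "exported": exported,
            "skipped": skipped,
        }
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buffer.getvalue()


# Usage: one company with a (made-up) row, one with no data.
archive = build_export_zip(
    {
        "NVIDIA": [("NVIDIA", "company_analysis", "example/model", "Sample text.", "2025-06-15 10:30:00")],
        "UnknownCo": [],
    },
    export_date="2026-05-19",
)
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    names = sorted(zf.namelist())
    manifest = json.loads(zf.read("manifest.json"))
```

Building the archive in a `BytesIO` buffer keeps the sketch simple, but it holds the whole ZIP in memory before anything is sent, so the request's company limit is what bounds the response size.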
Merge pull request 'Add rate limit stats to admin panel' (#1682) from feature/1675-rate-limit-admin into main
2026-05-19 15:21:09 +00:00 · 2026-05-19 00:12:56 +00:00 · 2026-05-19 00:12:48 +00:00 · 2026-05-19 00:12:42 +00:00 · 2026-05-19 00:12:34 +00:00 · 2026-05-18 21:49:22 +00:00
6 changed files with 1007 additions and 114 deletions
@@ -81,57 +81,50 @@ Items that have been implemented and merged into main.
 - ~~OpenAPI client generation.~~ TypeScript API client auto-generated from
  FastAPI spec with CI freshness check.
 ### Resilience
 - ~~`_jobs` dict is in-memory only.~~ Database-backed job persistence
  implemented using `db.list_jobs()` and `mark_stale_jobs_failed()`. The
  in-memory `_jobs` dict has been removed.
 ### Test coverage (P1/P2)
 - ~~Export endpoint tests.~~ Tests added for CSV and PDF export endpoints.
 - ~~Tracked company admin endpoint tests.~~ Tests added for `/admin/tracked`
  CRUD endpoints and scheduler integration.
 - ~~Webhook integration tests.~~ Tests added for retry logic, Slack/Discord
  payload format, and multi-URL dispatch.
 - ~~S3/MinIO storage backend tests.~~ Unit tests added for the S3 backend
  (read, write, exists, delete, error handling).
 - ~~`analyze_single_patent` auto-download path tests.~~ Tests added for the
  auto-download fallback (cache lookup, PDF download, FileNotFoundError).
 ### Code quality
 - ~~Scheduler creates its own DatabaseClient.~~ Refactored to use the
  application-level pooled `get_db_client()`.
 ---
 ## P1 -- High Priority
-These items address correctness, reliability, and coverage gaps that should be
-resolved before broader production use.
+No outstanding P1 items. All previously listed items have been completed and
+moved to the Completed section above.
 ### Resilience
 - **`_jobs` dict is in-memory only.** Job state is lost on API restart.
  Persist job status in PostgreSQL or Redis so async batch results survive
  restarts.
 ### Test coverage gaps
 - **Export endpoint tests.** The CSV and PDF export endpoints (`/export/`)
  lack test coverage. Add tests covering auth, success, 404, and edge cases.
  *(Issue #1655)*
 - **Tracked company admin endpoint tests.** The `/admin/tracked` CRUD
  endpoints and scheduler integration lack test coverage. *(Issue #1656)*
 ---
 ## P2 -- Medium Priority
-Improvements to reliability, test coverage, and code quality.
+Improvements to the API surface.
 ### Test coverage
 - **Webhook integration tests.** The retry logic, Slack/Discord payload
  format, and multi-URL dispatch in `webhooks.py` need test coverage.
  *(Issue #1657)*
 - **S3/MinIO storage backend tests.** `storage.py` has local filesystem tests
  but no unit tests for the S3 backend (read, write, exists, delete,
  error handling). *(Issue #1660)*
 - **`analyze_single_patent` auto-download path tests.** The auto-download
  fallback (cache lookup, PDF download, FileNotFoundError) in
  `analyzer.py` lacks test coverage. *(Issue #1661)*
 ### Code quality
 - **Scheduler creates its own DatabaseClient.** `scheduler.py` bypasses the
  application-level pooled client, creating a new connection on every tick.
  Refactor to use `get_db_client()`. *(Issue #1658)*
 ### API improvements
-- **API pagination.** The `/analyze/batch` and `/jobs` endpoints could benefit
-  from cursor-based pagination for large result sets.
+- **API pagination.** The `/analyze/batch` endpoint needs cursor-based
+  pagination for large result sets. The `/jobs` endpoint already has cursor
  pagination. *(Issue #1669)*
 - **Request validation improvements.** Add stricter input validation for
  company names (disallow special characters, enforce length limits).
  *(Issue #1670)*
 ---
@@ -12,10 +12,10 @@ from typing import TYPE_CHECKING, Annotated, List
 if TYPE_CHECKING:
    from SPARC.database import DatabaseClient
-from fastapi import BackgroundTasks, Depends, FastAPI, HTTPException, Query, Request
+from fastapi import BackgroundTasks, Depends, FastAPI, HTTPException, Path, Query, Request
 from fastapi.middleware.cors import CORSMiddleware
 from fastapi.responses import JSONResponse, StreamingResponse
-from pydantic import BaseModel, EmailStr, Field
+from pydantic import BaseModel, EmailStr, Field, StringConstraints
 from slowapi import Limiter
 from slowapi.errors import RateLimitExceeded
 from slowapi.util import get_remote_address
@@ -36,6 +36,16 @@ from SPARC.auth import (
 )
 from SPARC.types import BatchAnalysisResult, CompanyAnalysisResult
 # Validated company name type: 2-100 chars, alphanumeric + spaces/hyphens/ampersands/periods only.
 CompanyName = Annotated[
    str,
    StringConstraints(
        min_length=2,
        max_length=100,
        pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$",
    ),
 ]
 # Pydantic models for API
 class CompanyAnalysisResponse(BaseModel):
@@ -72,7 +82,7 @@ class CompanyAnalysisRequest(BaseModel):
 class BatchAnalysisRequest(BaseModel):
    """Request model for batch company analysis."""
-    companies: list[str] = Field(
+    companies: list[CompanyName] = Field(
        ..., min_length=1, max_length=20, description="List of company names to analyze"
    )
    max_workers: int = Field(
@@ -96,6 +106,24 @@ class JobStatus(BaseModel):
    error: str | None = None
 class AnalysisRecord(BaseModel):
    """A single stored analysis result."""
    id: int
    company_name: str | None = None
    analysis_type: str | None = None
    model: str | None = None
    response: str | None = None
    timestamp: datetime | None = None
 class PaginatedAnalysisResponse(BaseModel):
    """Paginated response for analysis result listings."""
    items: list[AnalysisRecord]
    next_cursor: str | None = None
 class PaginatedJobsResponse(BaseModel):
    """Paginated response for job listings."""
@@ -434,7 +462,7 @@ async def delete_user(
 class TrackCompanyRequest(BaseModel):
    """Request to add a company to tracking."""
-    company_name: str = Field(..., min_length=1, max_length=255)
+    company_name: CompanyName = Field(...)
@app.get("/admin/tracked", tags=["Admin"])
@@ -461,7 +489,7 @@ async def add_tracked_company(
@app.delete("/admin/tracked/{company_name}", tags=["Admin"])
 async def remove_tracked_company(
-    company_name: str,
+    company_name: Annotated[str, Path(min_length=2, max_length=100, pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$")],
    _: UserResponse = Depends(get_current_admin),
 ):
    """Remove a company from the tracked list (admin only)."""
@@ -647,27 +675,25 @@ async def get_analytics_trends(
 # ============== Export Endpoints ==============
-@app.get("/export/{company_name}", tags=["Export"])
-async def export_company_csv(
-    company_name: str,
-    _: UserResponse = Depends(get_current_user),
-):
-    """Export analysis results for a company as a CSV file.
-    Returns all stored analysis records for the given company, including
-    analysis type, model used, response text, and timestamp.
-    Args:
-        company_name: Company name to export results for
-    Returns:
-        CSV file download
-    """
-    import csv
-    import io
-    db = get_db_client()
-    # Query all non-cached analysis results for this company
+class BatchExportRequest(BaseModel):
+    """Request model for batch ZIP export of analysis results."""
+    companies: list[CompanyName] = Field(
+        ..., min_length=1, max_length=50, description="List of company names to export"
+    )
+    format: str = Field(
+        default="csv",
+        pattern="^(csv|pdf)$",
+        description="Export format: 'csv' or 'pdf'",
+    )
+def _fetch_company_rows(db, company_name: str) -> list:
+    """Fetch all non-cached analysis rows for *company_name* from the DB.
+    Returns a list of tuples: (company_name, analysis_type, model, response, timestamp).
+    Returns an empty list when no results exist.
+    """
    with db.get_conn() as conn:
        with conn.cursor() as cur:
            cur.execute(
@@ -679,43 +705,24 @@ async def export_company_csv(
                """,
                (company_name,),
            )
-            rows = cur.fetchall()
-    if not rows:
-        raise HTTPException(status_code=404, detail=f"No analysis results found for '{company_name}'")
+            return cur.fetchall()
+def _build_company_csv(rows) -> bytes:
+    """Render *rows* as CSV bytes."""
+    import csv
+    import io
    output = io.StringIO()
    writer = csv.writer(output)
    writer.writerow(["company_name", "analysis_type", "model", "analysis", "timestamp"])
    for row in rows:
        writer.writerow(row)
-    output.seek(0)
-    safe_name = company_name.replace(" ", "_").lower()
-    return StreamingResponse(
-        iter([output.getvalue()]),
-        media_type="text/csv",
-        headers={"Content-Disposition": f'attachment; filename="sparc_{safe_name}_export.csv"'},
-    )
-@app.get("/export/{company_name}/pdf", tags=["Export"])
-async def export_company_pdf(
-    company_name: str,
-    _: UserResponse = Depends(get_current_user),
-):
-    """Export analysis results for a company as a formatted PDF report.
-    Returns all stored analysis records for the given company, including
-    analysis type, model used, response text, and timestamp, formatted
-    as a downloadable PDF document.
-    Args:
-        company_name: Company name to export results for
-    Returns:
-        PDF file download
-    """
+    return output.getvalue().encode("utf-8")
+def _build_company_pdf(rows, company_name: str) -> bytes:
+    """Render *rows* as PDF bytes using reportlab."""
    import io
    from reportlab.lib import colors
@@ -730,23 +737,6 @@ async def export_company_pdf(
        TableStyle,
    )
-    db = get_db_client()
-    with db.get_conn() as conn:
-        with conn.cursor() as cur:
-            cur.execute(
-                """
-                SELECT company_name, analysis_type, model, response, timestamp
-                FROM llm_messages
-                WHERE LOWER(company_name) = LOWER(%s) AND is_cached = FALSE
-                ORDER BY timestamp DESC
-                """,
-                (company_name,),
-            )
-            rows = cur.fetchall()
-    if not rows:
-        raise HTTPException(status_code=404, detail=f"No analysis results found for '{company_name}'")
    buffer = io.BytesIO()
    doc = SimpleDocTemplate(
        buffer,
@@ -789,13 +779,11 @@ async def export_company_pdf(
    elements = []
-    # Title and date
-    display_name = rows[0][0]  # Use the casing from the database
+    display_name = rows[0][0]
    analysis_date = datetime.now().strftime("%Y-%m-%d")
    elements.append(Paragraph(f"SPARC Analysis Report: {display_name}", title_style))
    elements.append(Paragraph(f"Generated on {analysis_date}", subtitle_style))
-    # Summary table
    summary_data = [
        ["Total Analyses", str(len(rows))],
        ["Analysis Types", ", ".join(sorted(set(r[1] for r in rows)))],
@@ -817,7 +805,6 @@ async def export_company_pdf(
    elements.append(summary_table)
    elements.append(Spacer(1, 16))
-    # Individual analysis sections
    for i, row in enumerate(rows, 1):
        _, analysis_type, model, response, timestamp = row
        ts_str = timestamp.strftime("%Y-%m-%d %H:%M:%S") if hasattr(timestamp, "strftime") else str(timestamp)
@@ -829,13 +816,11 @@ async def export_company_pdf(
            Paragraph(f"<i>Performed: {ts_str}</i>", body_style)
        )
-        # Wrap long response text into paragraphs, escaping XML special chars
        safe_response = (
            response.replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
        )
-        # Split into manageable paragraphs to avoid overflow
        for line in safe_response.split("\n"):
            if line.strip():
                elements.append(Paragraph(line, body_style))
@@ -846,11 +831,133 @@ async def export_company_pdf(
    doc.build(elements)
    buffer.seek(0)
    return buffer.getvalue()
@app.post("/export/batch", tags=["Export"])
 async def export_batch_zip(
    request: BatchExportRequest,
    _: UserResponse = Depends(get_current_user),
 ):
    """Export analysis results for multiple companies as a ZIP archive.
    For each company in the request, fetches all stored analysis records and
    adds a per-company file (CSV or PDF) to the archive. Companies with no
    stored results are skipped; a ``manifest.json`` inside the ZIP lists both
    the exported and skipped companies.
    Args:
        request: List of company names and desired export format ('csv' or 'pdf')
    Returns:
        ZIP archive download containing one file per found company plus a manifest
    """
    import io
    import json
    import zipfile
    db = get_db_client()
    export_date = datetime.now().strftime("%Y-%m-%d")
    fmt = request.format
    exported: list[str] = []
    skipped: list[str] = []
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for company_name in request.companies:
            rows = _fetch_company_rows(db, company_name)
            if not rows:
                skipped.append(company_name)
                continue
            safe_name = company_name.replace(" ", "_").lower()
            if fmt == "pdf":
                file_bytes = _build_company_pdf(rows, company_name)
                filename = f"{safe_name}-analysis-{export_date}.pdf"
            else:
                file_bytes = _build_company_csv(rows)
                filename = f"sparc_{safe_name}_export.csv"
            zf.writestr(filename, file_bytes)
            exported.append(company_name)
        # Always include a manifest
        manifest = {
            "export_date": export_date,
            "format": fmt,
            "exported": exported,
            "skipped": skipped,
        }
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    zip_buffer.seek(0)
    zip_filename = f"sparc-export-{export_date}.zip"
    return StreamingResponse(
        iter([zip_buffer.getvalue()]),
        media_type="application/zip",
        headers={"Content-Disposition": f'attachment; filename="{zip_filename}"'},
    )
@app.get("/export/{company_name}", tags=["Export"])
 async def export_company_csv(
    company_name: Annotated[str, Path(min_length=2, max_length=100, pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$")],
    _: UserResponse = Depends(get_current_user),
 ):
    """Export analysis results for a company as a CSV file.
    Returns all stored analysis records for the given company, including
    analysis type, model used, response text, and timestamp.
    Args:
        company_name: Company name to export results for
    Returns:
        CSV file download
    """
    db = get_db_client()
    rows = _fetch_company_rows(db, company_name)
    if not rows:
        raise HTTPException(status_code=404, detail=f"No analysis results found for '{company_name}'")
    safe_name = company_name.replace(" ", "_").lower()
    return StreamingResponse(
        iter([_build_company_csv(rows)]),
        media_type="text/csv",
        headers={"Content-Disposition": f'attachment; filename="sparc_{safe_name}_export.csv"'},
    )
@app.get("/export/{company_name}/pdf", tags=["Export"])
 async def export_company_pdf(
    company_name: Annotated[str, Path(min_length=2, max_length=100, pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$")],
    _: UserResponse = Depends(get_current_user),
 ):
    """Export analysis results for a company as a formatted PDF report.
    Returns all stored analysis records for the given company, including
    analysis type, model used, response text, and timestamp, formatted
    as a downloadable PDF document.
    Args:
        company_name: Company name to export results for
    Returns:
        PDF file download
    """
    db = get_db_client()
    rows = _fetch_company_rows(db, company_name)
    if not rows:
        raise HTTPException(status_code=404, detail=f"No analysis results found for '{company_name}'")
    safe_name = company_name.replace(" ", "_").lower()
    analysis_date = datetime.now().strftime("%Y-%m-%d")
    filename = f"{safe_name}-analysis-{analysis_date}.pdf"
    return StreamingResponse(
-        iter([buffer.getvalue()]),
+        iter([_build_company_pdf(rows, company_name)]),
        media_type="application/pdf",
        headers={"Content-Disposition": f'attachment; filename="{filename}"'},
    )
@@ -875,7 +982,7 @@ async def health_check():
    tags=["Analysis"],
 )
 async def analyze_company(
-    company_name: str,
+    company_name: Annotated[str, Path(min_length=2, max_length=100, pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$")],
    model: str | None = Query(default=None, description="LLM model to use (e.g. 'openai/gpt-4o'). Defaults to server config."),
    _: UserResponse = Depends(get_current_user),
 ):
@@ -905,7 +1012,7 @@ async def analyze_company(
 )
 async def analyze_single_patent(
    patent_id: str,
-    company_name: str = Query(description="Company name for analysis context"),
+    company_name: Annotated[str, Query(min_length=2, max_length=100, pattern=r"^[a-zA-Z0-9][a-zA-Z0-9 \-&.]*$", description="Company name for analysis context")],
    _: UserResponse = Depends(get_current_user),
 ):
    """Analyze a single patent by its publication ID.
@@ -931,6 +1038,58 @@ async def analyze_single_patent(
        raise HTTPException(status_code=404, detail=str(e))
@app.get(
    "/analyze/batch",
    response_model=PaginatedAnalysisResponse,
    tags=["Analysis"],
 )
 async def list_analysis_results(
    company_name: Annotated[
        str | None,
        Query(description="Filter results by company name"),
    ] = None,
    limit: Annotated[int, Query(ge=1, le=200)] = 50,
    cursor: Annotated[
        str | None,
        Query(description="Opaque cursor from a previous response's next_cursor field"),
    ] = None,
    _: UserResponse = Depends(get_current_user),
 ):
    """List stored analysis results with cursor-based pagination.
    Returns past analysis results ordered by timestamp descending. Use
    ``limit`` to control page size (default 50, max 200). The response
    includes a ``next_cursor`` field; pass it back as the ``cursor`` query
    parameter to fetch the next page. When ``next_cursor`` is ``null``,
    there are no more results.
    Args:
        company_name: Optional filter by company name
        limit: Maximum number of results to return (default 50, max 200)
        cursor: Opaque pagination cursor from a previous response
    Returns:
        Paginated list of analysis results
    """
    db = _get_job_db()
    rows = db.list_analyses(company_name=company_name, limit=limit + 1, cursor=cursor)
    has_next = len(rows) > limit
    if has_next:
        rows = rows[:limit]
    items = [AnalysisRecord(**row) for row in rows]
    next_cursor = None
    if has_next and rows:
        last = rows[-1]
        ts = last["timestamp"]
        ts_str = ts.isoformat() if hasattr(ts, "isoformat") else str(ts)
        next_cursor = f"{ts_str}|{last['id']}"
    return PaginatedAnalysisResponse(items=items, next_cursor=next_cursor)
@app.post(
    "/analyze/batch",
    response_model=BatchAnalysisResponse,
@@ -1106,7 +1265,7 @@ async def list_jobs(
        str | None,
        Query(description="Filter by status: pending, running, completed, failed"),
    ] = None,
-    limit: Annotated[int, Query(ge=1, le=100)] = 10,
+    limit: Annotated[int, Query(ge=1, le=200)] = 50,
    cursor: Annotated[
        str | None,
        Query(description="Opaque cursor from a previous response's next_cursor field"),
@@ -371,6 +371,48 @@ class DatabaseClient:
                cursor.execute(query, params)
                return [dict(row) for row in cursor.fetchall()]
    def list_analyses(
        self,
        company_name: Optional[str] = None,
        limit: int = 50,
        cursor: Optional[str] = None,
    ) -> List[Dict]:
        """List analysis results with cursor-based pagination.
        Args:
            company_name: Optional filter by company name.
            limit: Maximum number of records to return.
            cursor: Opaque cursor (``timestamp|id``) from a previous response.
        Returns:
            List of analysis dicts ordered by timestamp descending.
        """
        conditions: list[str] = ["is_cached = FALSE"]
        params: list = []
        if company_name:
            conditions.append("LOWER(company_name) = LOWER(%s)")
            params.append(company_name)
        if cursor:
            try:
                ts_str, cursor_id = cursor.rsplit("|", 1)
                cursor_id_int = int(cursor_id)  # Validate first so a bad cursor adds no dangling condition
                conditions.append("(timestamp, id) < (%s, %s)")
                params.extend([ts_str, cursor_id_int])
            except (ValueError, TypeError):
                pass  # Ignore malformed cursors; return from start
        query = "SELECT id, company_name, analysis_type, model, response, timestamp FROM llm_messages"
        if conditions:
            query += " WHERE " + " AND ".join(conditions)
        query += " ORDER BY timestamp DESC, id DESC LIMIT %s"
        params.append(limit)
        with self.get_conn() as conn:
            with conn.cursor(cursor_factory=RealDictCursor) as cur:
                cur.execute(query, params)
                return [dict(row) for row in cur.fetchall()]
    def get_analytics(self, days: int = 30) -> Dict:
        """Get analytics on message usage.
@@ -0,0 +1,373 @@
 """Tests for POST /export/batch endpoint (issue #1674).
 Covers:
 - Single company export (CSV + PDF)
 - Multiple company export
 - All-missing companies (every requested company is skipped)
 - Unauthenticated / invalid-token requests
 - Manifest content validation
 - Invalid format rejection
 """
 import io
 import json
 import zipfile
 from datetime import datetime, timezone
 from unittest.mock import MagicMock, patch
 import pytest
 from fastapi.testclient import TestClient
 from SPARC.api import app
 from SPARC.auth import create_access_token
@pytest.fixture
 def client():
    """Create a FastAPI test client."""
    return TestClient(app)
@pytest.fixture(autouse=True)
 def mock_db():
    """Mock database client for all tests in this module."""
    db = MagicMock()
    # Auth: user always exists
    db.get_user_by_id.return_value = {
        "id": 1,
        "email": "user@test.com",
        "role": "user",
        "created_at": datetime(2025, 1, 1, tzinfo=timezone.utc),
    }
    # Default cursor mock (overridden per-test via side_effect or return_value)
    mock_cursor = MagicMock()
    mock_conn = MagicMock()
    mock_conn.cursor.return_value.__enter__ = MagicMock(return_value=mock_cursor)
    mock_conn.cursor.return_value.__exit__ = MagicMock(return_value=False)
    db.get_conn.return_value.__enter__ = MagicMock(return_value=mock_conn)
    db.get_conn.return_value.__exit__ = MagicMock(return_value=False)
    db._mock_cursor = mock_cursor
    with patch("SPARC.api.get_db_client", return_value=db), \
         patch("SPARC.auth.get_db_client", return_value=db):
        yield db
 def _auth_header():
    token = create_access_token(1, "user@test.com", "user")
    return {"Authorization": f"Bearer {token}"}
 def _rows_for(company_name: str):
    """Return a single sample row for the given company."""
    return [
        (
            company_name,
            "company_analysis",
            "anthropic/claude-3.5-sonnet",
            f"Strong patent portfolio for {company_name}.",
            datetime(2025, 6, 15, 10, 30, 0),
        )
    ]
 def _open_zip(content: bytes) -> zipfile.ZipFile:
    """Helper: wrap response bytes as a ZipFile."""
    return zipfile.ZipFile(io.BytesIO(content))
 # ---------------------------------------------------------------------------
 # Authentication
 # ---------------------------------------------------------------------------
 class TestBatchExportAuth:
    """Unauthenticated and invalid-token requests must be rejected."""
    def test_unauthenticated_returns_401(self, client):
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
        )
        assert response.status_code == 401
    def test_invalid_token_returns_401(self, client):
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers={"Authorization": "Bearer totally.invalid.token"},
        )
        assert response.status_code == 401
 # ---------------------------------------------------------------------------
 # Single company
 # ---------------------------------------------------------------------------
 class TestBatchExportSingleCompany:
    """POST /export/batch with a single company name."""
    def test_single_company_csv_returns_zip(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        assert response.headers["content-type"] == "application/zip"
        assert "attachment" in response.headers["content-disposition"]
        assert "sparc-export-" in response.headers["content-disposition"]
        assert response.headers["content-disposition"].endswith('.zip"')
    def test_single_company_csv_zip_contains_csv_file(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        names = zf.namelist()
        csv_files = [n for n in names if n.endswith(".csv")]
        assert len(csv_files) == 1
        assert "nvidia" in csv_files[0]
    def test_single_company_csv_content_is_valid_csv(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        csv_name = [n for n in zf.namelist() if n.endswith(".csv")][0]
        csv_text = zf.read(csv_name).decode("utf-8")
        lines = csv_text.strip().split("\n")
        assert lines[0].strip() == "company_name,analysis_type,model,analysis,timestamp"
        assert "NVIDIA" in lines[1]
    def test_single_company_pdf_zip_contains_pdf_file(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "pdf"},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        zf = _open_zip(response.content)
        pdf_files = [n for n in zf.namelist() if n.endswith(".pdf")]
        assert len(pdf_files) == 1
        # Verify it is actually a PDF (starts with %PDF)
        pdf_bytes = zf.read(pdf_files[0])
        assert pdf_bytes[:4] == b"%PDF"
 # ---------------------------------------------------------------------------
 # Multiple companies
 # ---------------------------------------------------------------------------
 class TestBatchExportMultipleCompanies:
    """POST /export/batch with several companies."""
    def test_multiple_companies_each_gets_a_file(self, client, mock_db):
        companies = ["NVIDIA", "Intel", "AMD"]
        mock_db._mock_cursor.fetchall.side_effect = [
            _rows_for("NVIDIA"),
            _rows_for("Intel"),
            _rows_for("AMD"),
        ]
        response = client.post(
            "/export/batch",
            json={"companies": companies, "format": "csv"},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        zf = _open_zip(response.content)
        csv_files = [n for n in zf.namelist() if n.endswith(".csv")]
        assert len(csv_files) == 3
    def test_multiple_companies_manifest_lists_all_exported(self, client, mock_db):
        companies = ["NVIDIA", "Intel"]
        mock_db._mock_cursor.fetchall.side_effect = [
            _rows_for("NVIDIA"),
            _rows_for("Intel"),
        ]
        response = client.post(
            "/export/batch",
            json={"companies": companies, "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        manifest = json.loads(zf.read("manifest.json"))
        assert set(manifest["exported"]) == {"NVIDIA", "Intel"}
        assert manifest["skipped"] == []
        assert manifest["format"] == "csv"
    def test_partial_missing_companies_skipped(self, client, mock_db):
        """Companies with no data are skipped; others are exported."""
        mock_db._mock_cursor.fetchall.side_effect = [
            _rows_for("NVIDIA"),
            [],  # no data for "UnknownCo"
        ]
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA", "UnknownCo"], "format": "csv"},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        zf = _open_zip(response.content)
        manifest = json.loads(zf.read("manifest.json"))
        assert manifest["exported"] == ["NVIDIA"]
        assert manifest["skipped"] == ["UnknownCo"]
        csv_files = [n for n in zf.namelist() if n.endswith(".csv")]
        assert len(csv_files) == 1
 # ---------------------------------------------------------------------------
 # All-missing companies
 # ---------------------------------------------------------------------------
 class TestBatchExportAllMissing:
    """When every requested company has no data, the ZIP still returns 200
    with only a manifest (no per-company files, all listed in skipped)."""
    def test_all_missing_returns_200_with_manifest_only(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = []
        response = client.post(
            "/export/batch",
            json={"companies": ["GhostCo", "PhantomInc"], "format": "csv"},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        zf = _open_zip(response.content)
        assert "manifest.json" in zf.namelist()
        manifest = json.loads(zf.read("manifest.json"))
        assert manifest["exported"] == []
        assert set(manifest["skipped"]) == {"GhostCo", "PhantomInc"}
    def test_all_missing_zip_has_no_data_files(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = []
        response = client.post(
            "/export/batch",
            json={"companies": ["GhostCo"], "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        data_files = [n for n in zf.namelist() if n != "manifest.json"]
        assert data_files == []
 # ---------------------------------------------------------------------------
 # Manifest validation
 # ---------------------------------------------------------------------------
 class TestBatchExportManifest:
    """The manifest.json inside every ZIP must be well-formed."""
    def test_manifest_always_present(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        assert "manifest.json" in zf.namelist()
    def test_manifest_contains_required_keys(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "csv"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        manifest = json.loads(zf.read("manifest.json"))
        assert "export_date" in manifest
        assert "format" in manifest
        assert "exported" in manifest
        assert "skipped" in manifest
    def test_manifest_format_field_matches_request(self, client, mock_db):
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "pdf"},
            headers=_auth_header(),
        )
        zf = _open_zip(response.content)
        manifest = json.loads(zf.read("manifest.json"))
        assert manifest["format"] == "pdf"
 # ---------------------------------------------------------------------------
 # Input validation
 # ---------------------------------------------------------------------------
 class TestBatchExportInputValidation:
    """Invalid request bodies must return 422."""
    def test_invalid_format_returns_422(self, client):
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"], "format": "xlsx"},
            headers=_auth_header(),
        )
        assert response.status_code == 422
    def test_empty_companies_list_returns_422(self, client):
        response = client.post(
            "/export/batch",
            json={"companies": [], "format": "csv"},
            headers=_auth_header(),
        )
        assert response.status_code == 422
    def test_default_format_is_csv(self, client, mock_db):
        """Omitting `format` should default to CSV."""
        mock_db._mock_cursor.fetchall.return_value = _rows_for("NVIDIA")
        response = client.post(
            "/export/batch",
            json={"companies": ["NVIDIA"]},
            headers=_auth_header(),
        )
        assert response.status_code == 200
        zf = _open_zip(response.content)
        manifest = json.loads(zf.read("manifest.json"))
        assert manifest["format"] == "csv"
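
The tests above pin down the shape of the archive: one file per company that has data, plus a `manifest.json` with `export_date`, `format`, `exported`, and `skipped`. A minimal in-memory sketch of that behaviour for the CSV case follows. It is not the endpoint's actual implementation: the function name `build_csv_zip`, the `<company>.csv` file naming, and the row shape are assumptions made for illustration.

```python
import csv
import io
import json
import zipfile
from datetime import date


def build_csv_zip(rows_by_company):
    """Return ZIP bytes: one CSV per company with rows, plus manifest.json.

    `rows_by_company` maps a company name to a list of row dicts
    (hypothetical shape; the real query result may differ).
    """
    exported, skipped = [], []
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for company, rows in rows_by_company.items():
            if not rows:
                # No data: record the company as skipped and write no file.
                skipped.append(company)
                continue
            out = io.StringIO()
            writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
            zf.writestr(f"{company}.csv", out.getvalue())  # assumed naming
            exported.append(company)
        # The manifest is written even when nothing was exported.
        manifest = {
            "export_date": date.today().isoformat(),
            "format": "csv",
            "exported": exported,
            "skipped": skipped,
        }
        zf.writestr("manifest.json", json.dumps(manifest, indent=2))
    return buf.getvalue()
```

Building the archive in a `BytesIO` buffer keeps the sketch free of temporary files; a streaming response would then send those bytes to the client.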
@@ -0,0 +1,157 @@
 """Tests for company name input validation on analysis endpoints."""
 from datetime import datetime
 from unittest.mock import Mock
 import pytest
 from fastapi.testclient import TestClient
 from SPARC.api import app
 from SPARC.types import CompanyAnalysisResult
@pytest.fixture
 def client():
    """Create test client."""
    return TestClient(app)
@pytest.fixture
 def mock_analyzer(mocker):
    """Mock the global analyzer so valid requests succeed."""
    mock = Mock()
    mock._analyze_company_safe.return_value = CompanyAnalysisResult(
        company_name="nvidia",
        analysis="Test analysis",
        patent_count=1,
        success=True,
        timestamp=datetime.now(),
    )
    mocker.patch("SPARC.api._analyzer", mock)
    return mock
 class TestCompanyNameValidation:
    """Test that company names are validated on analysis endpoints."""
    # --- Too short ---
    def test_single_char_rejected(self, client, mock_analyzer):
        """A one-character company name should be rejected."""
        response = client.get("/analyze/X")
        assert response.status_code == 422
    # --- Too long ---
    def test_over_100_chars_rejected(self, client, mock_analyzer):
        """A company name longer than 100 characters should be rejected."""
        long_name = "A" * 101
        response = client.get(f"/analyze/{long_name}")
        assert response.status_code == 422
    # --- Special characters ---
    @pytest.mark.parametrize(
        "bad_name",
        [
            "nvidia!",
            "intel@corp",
            "test#company",
            "foo$bar",
            "a%b",
            "x^y",
            "semi;colon",
            "drop'table",
            'say"hello',
            "path/traversal",
            "back\\slash",
            "pipe|char",
            "star*glob",
            "question?mark",
            "<script>",
            "curly{brace}",
            "equal=sign",
            "plus+plus",
            "comma,separated",
        ],
    )
    def test_special_chars_rejected(self, client, mock_analyzer, bad_name):
        """Company names with disallowed special characters should be rejected."""
        response = client.get(f"/analyze/{bad_name}")
        assert response.status_code == 422
    # --- Valid names ---
    @pytest.mark.parametrize(
        "valid_name",
        [
            "nvidia",
            "Intel",
            "TSMC",
            "Texas Instruments",
            "Johnson-Johnson",
            "AT&T",
            "St. Jude Medical",
            "3M",
            "21st Century Fox",
            "ab",  # minimum length
            "A" * 100,  # maximum length
        ],
    )
    def test_valid_names_accepted(self, client, mock_analyzer, valid_name):
        """Valid company names should be accepted (200, not 422)."""
        response = client.get(f"/analyze/{valid_name}")
        # Should not be a validation error; 200 or other non-422 status is fine
        assert response.status_code != 422
    # --- Batch endpoint validation ---
    def test_batch_too_short_rejected(self, client, mock_analyzer):
        """Batch endpoint should reject company names that are too short."""
        response = client.post(
            "/analyze/batch",
            json={"companies": ["X"]},
        )
        assert response.status_code == 422
    def test_batch_too_long_rejected(self, client, mock_analyzer):
        """Batch endpoint should reject company names that are too long."""
        response = client.post(
            "/analyze/batch",
            json={"companies": ["A" * 101]},
        )
        assert response.status_code == 422
    def test_batch_special_chars_rejected(self, client, mock_analyzer):
        """Batch endpoint should reject company names with special chars."""
        response = client.post(
            "/analyze/batch",
            json={"companies": ["nvidia!", "intel"]},
        )
        assert response.status_code == 422
    def test_batch_valid_names_accepted(self, client, mock_analyzer):
        """Batch endpoint should accept valid company names."""
        response = client.post(
            "/analyze/batch",
            json={"companies": ["nvidia", "Intel", "AT&T"]},
        )
        assert response.status_code != 422
    # --- Name must start with alphanumeric ---
    def test_leading_space_rejected(self, client, mock_analyzer):
        """Company name starting with a space should be rejected."""
        response = client.post(
            "/analyze/batch",
            json={"companies": [" nvidia"]},
        )
        assert response.status_code == 422
    def test_leading_hyphen_rejected(self, client, mock_analyzer):
        """Company name starting with a hyphen should be rejected."""
        response = client.post(
            "/analyze/batch",
            json={"companies": ["-nvidia"]},
        )
        assert response.status_code == 422
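
The rules these tests exercise (2 to 100 characters; letters, digits, spaces, hyphens, ampersands, and periods only; an alphanumeric first character) can be expressed as a single regular expression. The sketch below is one possible encoding, not necessarily how the `CompanyName` type is implemented; the names `COMPANY_NAME_RE` and `is_valid_company_name` are hypothetical.

```python
import re

# First character: alphanumeric. Remaining 1-99 characters: alphanumerics,
# space, period, ampersand, or hyphen. Total length is therefore 2-100.
COMPANY_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9 .&-]{1,99}")


def is_valid_company_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed validation rules."""
    # fullmatch requires the whole string to match, so no anchors are needed.
    return COMPANY_NAME_RE.fullmatch(name) is not None
```

In a Pydantic model, a pattern like this could be attached to a constrained string type so that violations surface as 422 responses; the sketch leaves that wiring out.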
@@ -0,0 +1,169 @@
 """Tests for cursor-based pagination on the GET /analyze/batch and /jobs endpoints."""
 from datetime import datetime, timedelta
 from unittest.mock import Mock, patch
 import pytest
 from fastapi.testclient import TestClient
 from SPARC.api import app
@pytest.fixture
 def client():
    """Create test client."""
    return TestClient(app)
 def _make_analysis_row(id_: int, minutes_ago: int = 0, company: str = "nvidia"):
    """Create a fake analysis row dict."""
    ts = datetime.now() - timedelta(minutes=minutes_ago)
    return {
        "id": id_,
        "company_name": company,
        "analysis_type": "patent_portfolio",
        "model": "openai/gpt-4o",
        "response": f"Analysis for {company}",
        "timestamp": ts,
    }
 def _make_job_row(job_id: str, minutes_ago: int = 0, status: str = "completed"):
    """Create a fake job row dict."""
    ts = datetime.now() - timedelta(minutes=minutes_ago)
    return {
        "job_id": job_id,
        "status": status,
        "progress": 100 if status == "completed" else 0,
        "total_companies": 1,
        "completed_companies": 1 if status == "completed" else 0,
        "result": None,
        "error": None,
        "created_at": ts,
    }
 class TestAnalyzeBatchGetPagination:
    """Test cursor-based pagination on GET /analyze/batch."""
    @patch("SPARC.api._get_job_db")
     def test_returns_items_and_no_cursor_when_less_than_limit(self, mock_get_db, client):
         """When there are fewer results than the limit, next_cursor should be null."""
        db = Mock()
        db.list_analyses.return_value = [
            _make_analysis_row(1, minutes_ago=10),
            _make_analysis_row(2, minutes_ago=20),
        ]
        mock_get_db.return_value = db
        response = client.get("/analyze/batch?limit=10")
        assert response.status_code == 200
        data = response.json()
        assert len(data["items"]) == 2
        assert data["next_cursor"] is None
    @patch("SPARC.api._get_job_db")
    def test_returns_cursor_when_more_results_exist(self, mock_get_db, client):
        """When more results exist than limit, next_cursor should be set."""
        db = Mock()
        # Return limit+1 rows to simulate more data
        rows = [_make_analysis_row(i, minutes_ago=i) for i in range(4)]
        db.list_analyses.return_value = rows
        mock_get_db.return_value = db
        response = client.get("/analyze/batch?limit=3")
        assert response.status_code == 200
        data = response.json()
        assert len(data["items"]) == 3
        assert data["next_cursor"] is not None
    @patch("SPARC.api._get_job_db")
    def test_cursor_passed_to_db(self, mock_get_db, client):
        """The cursor query param should be forwarded to the database layer."""
        db = Mock()
        db.list_analyses.return_value = []
        mock_get_db.return_value = db
        client.get("/analyze/batch?cursor=2025-01-01T00:00:00|42")
        db.list_analyses.assert_called_once()
        call_kwargs = db.list_analyses.call_args
         # call_args[1] is the same mapping as call_args.kwargs, so one check suffices.
         assert call_kwargs.kwargs.get("cursor") == "2025-01-01T00:00:00|42"
    @patch("SPARC.api._get_job_db")
    def test_default_limit_is_50(self, mock_get_db, client):
        """Default limit should be 50."""
        db = Mock()
        db.list_analyses.return_value = []
        mock_get_db.return_value = db
        client.get("/analyze/batch")
        call_kwargs = db.list_analyses.call_args
        # The endpoint requests limit+1 from DB, so 51
        assert 51 in call_kwargs.args or call_kwargs.kwargs.get("limit") == 51
    def test_limit_over_200_rejected(self, client):
        """Limit > 200 should be rejected with 422."""
        response = client.get("/analyze/batch?limit=201")
        assert response.status_code == 422
    def test_limit_zero_rejected(self, client):
        """Limit < 1 should be rejected with 422."""
        response = client.get("/analyze/batch?limit=0")
        assert response.status_code == 422
    @patch("SPARC.api._get_job_db")
    def test_company_name_filter(self, mock_get_db, client):
        """The company_name filter should be forwarded to the database."""
        db = Mock()
        db.list_analyses.return_value = []
        mock_get_db.return_value = db
        client.get("/analyze/batch?company_name=intel")
        call_kwargs = db.list_analyses.call_args
        assert call_kwargs.kwargs.get("company_name") == "intel" or \
            "intel" in (call_kwargs.args if call_kwargs.args else [])
    @patch("SPARC.api._get_job_db")
    def test_empty_result_set(self, mock_get_db, client):
        """Empty result set returns empty items and null cursor."""
        db = Mock()
        db.list_analyses.return_value = []
        mock_get_db.return_value = db
        response = client.get("/analyze/batch")
        assert response.status_code == 200
        data = response.json()
        assert data["items"] == []
        assert data["next_cursor"] is None
 class TestJobsPaginationDefaults:
    """Test that /jobs endpoint uses updated defaults."""
    @patch("SPARC.api._get_job_db")
    def test_default_limit_is_50(self, mock_get_db, client):
        """Default limit should now be 50."""
        db = Mock()
        db.list_jobs.return_value = []
        mock_get_db.return_value = db
        client.get("/jobs")
        call_kwargs = db.list_jobs.call_args
        # Endpoint requests limit+1 from DB, so 51
        assert 51 in call_kwargs.args or call_kwargs.kwargs.get("limit") == 51
    def test_limit_over_200_rejected(self, client):
        """Limit > 200 should be rejected with 422."""
        response = client.get("/jobs?limit=201")
        assert response.status_code == 422
    @patch("SPARC.api._get_job_db")
    def test_limit_200_accepted(self, mock_get_db, client):
        """Limit of exactly 200 should be accepted."""
        db = Mock()
        db.list_jobs.return_value = []
        mock_get_db.return_value = db
        response = client.get("/jobs?limit=200")
        assert response.status_code == 200
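
The pagination tests rely on a "fetch `limit + 1`" pattern: the endpoint asks the database for one row more than the page size, and the presence of that extra row signals that a further page exists. A minimal sketch of that logic follows. The helper name `paginate` is hypothetical, and the `timestamp|id` cursor layout is inferred from the cursor string used in `test_cursor_passed_to_db` rather than confirmed by the source.

```python
from datetime import datetime


def paginate(rows, limit):
    """Split `limit + 1` fetched rows into a page and an optional cursor.

    `rows` is assumed to be ordered by (timestamp, id) and to contain at
    most `limit + 1` entries, each a dict with "timestamp" and "id" keys.
    """
    has_more = len(rows) > limit
    items = rows[:limit]  # drop the look-ahead row, if any
    next_cursor = None
    if has_more:
        last = items[-1]
        # Assumed cursor format: ISO timestamp and id joined by "|".
        next_cursor = f"{last['timestamp'].isoformat()}|{last['id']}"
    return items, next_cursor


# Four rows fetched for limit=3: a full page plus one look-ahead row.
rows = [{"id": i, "timestamp": datetime(2025, 1, 1, 0, 0, i)} for i in range(4)]
page, cursor = paginate(rows, limit=3)
```

Compared with a separate `COUNT` query, the extra row costs almost nothing and avoids a second round trip to the database.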
Author	SHA1	Message	Date
agent-company	8f40109272	Add POST /export/batch endpoint for multi-company ZIP download Implements issue #1674: a new authenticated POST /export/batch endpoint that accepts a list of company names and an optional format (csv or pdf), compiles per-company exports into a ZIP archive using Python's zipfile module, and returns it as a streaming download. Key changes: - Extract `_fetch_company_rows`, `_build_company_csv`, `_build_company_pdf` helpers to eliminate duplication between the single-company endpoints and the new batch endpoint - Refactor `export_company_csv` and `export_company_pdf` to delegate to the new helpers - Add `BatchExportRequest` Pydantic model (companies list + format field) - Add `POST /export/batch` which iterates over companies, skips those with no data, writes per-company files into the ZIP, and always includes a `manifest.json` listing exported and skipped companies - Response header: `Content-Disposition: attachment; filename=sparc-export-<date>.zip` - 17 new tests covering: single company (CSV + PDF), multiple companies, all-missing, unauthenticated, invalid-token, manifest structure, input validation Closes leeworks-agents/SPARC#1674 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-19 15:21:09 +00:00
AI-Manager	313800215c	Merge pull request 'Add rate limit stats to admin panel' (#1682) from feature/1675-rate-limit-admin into main Merge PR #1682	2026-05-19 00:12:56 +00:00
AI-Manager	222f29deb1	Merge pull request 'Add cursor-based pagination to /analyze/batch and /jobs' (#1681) from feature/1669-cursor-pagination into main Merge PR #1681	2026-05-19 00:12:48 +00:00
AI-Manager	e6d95bbf57	Merge pull request 'Add stricter input validation for company names' (#1680) from feature/1670-company-name-validation into main Merge PR #1680	2026-05-19 00:12:42 +00:00
AI-Manager	68484ef4b1	Merge pull request 'Update ROADMAP.md: mark completed P1 and P2 items as done' (#1679) from feature/1678-update-roadmap into main Merge PR #1679	2026-05-19 00:12:34 +00:00
agent-company	857b3444df	Add cursor-based pagination to GET /analyze/batch and update /jobs defaults Add a new GET /analyze/batch endpoint that returns stored analysis results with cursor-based pagination (default limit 50, max 200). Also update the existing /jobs endpoint defaults from limit=10/max=100 to limit=50/max=200 for consistency. The database layer gains a list_analyses() method with cursor support using (timestamp, id) ordering, matching the existing list_jobs() pattern. Includes tests for pagination behavior, boundary limits, cursor forwarding, company name filtering, and empty result sets. Closes leeworks-agents/SPARC#1669 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 21:49:22 +00:00
agent-company	a95129904e	Add stricter input validation for company names on analysis endpoints Add a CompanyName validated type enforcing 2-100 character length and allowing only alphanumeric characters, spaces, hyphens, ampersands, and periods. Applied to all endpoints accepting company names: /analyze, /analyze/patent, /analyze/batch, /admin/tracked, and /export. Includes unit tests covering too-short, too-long, special character, leading-character, and valid edge cases for both single and batch endpoints. Closes leeworks-agents/SPARC#1670 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 21:38:44 +00:00
agent-company	7c6eed8d72	Update ROADMAP.md to mark completed P1 and P2 items as done Move seven completed items from the P1 and P2 sections into the Completed section: in-memory jobs persistence, export endpoint tests, tracked company admin tests, webhook integration tests, S3 storage tests, auto-download path tests, and scheduler DatabaseClient refactor. The P2 section now only lists the two genuinely open items: cursor-based pagination (Issue #1669) and request validation (Issue #1670). Closes leeworks-agents/SPARC#1678 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-05-18 21:29:14 +00:00