Add LLM-based patent classification tagging by technology domain #1692

Open
AI-Manager wants to merge 1 commit from feature/patent-classification-tags into main

Summary

  • Add classify_patent_tags() method to LLMAnalyzer that sends patent content to the LLM with a classification prompt, returning canonical tags: ai, semiconductors, materials, biotech, networking, other
  • Add patent_tags TEXT[] column to the patents table with a GIN index for efficient array queries
  • Run classification automatically as part of the analysis pipeline after patent processing; tags are persisted via update_patent_tags() and cached to avoid re-classification
  • Include tags in CompanyAnalysisResponse for both individual and batch API results
  • Add ?tags=ai,semiconductors filter parameter to GET /analyze/batch endpoint
  • Add GET /analytics/tags endpoint returning tag distribution data
  • Add tag filter controls and a technology domain distribution bar chart to the Analytics page
  • Add 12 unit tests covering classification prompt calls (mocked LLM), tag validation/filtering, error handling, DB persistence, and caching
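For reviewers, the tag validation, `?tags=` parsing, and distribution steps summarized above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `normalize_tags`, `parse_tags_param`, and `tag_distribution` are hypothetical helper names, the comma-separated LLM reply format is an assumption, and so is the fallback to `other` when no canonical tag is recognized.

```python
from __future__ import annotations

from collections import Counter

# Canonical tags as listed in this PR's summary.
CANONICAL_TAGS = ("ai", "semiconductors", "materials", "biotech", "networking", "other")


def normalize_tags(raw_response: str) -> list[str]:
    """Keep only canonical tags from a comma-separated LLM reply (assumed format)."""
    kept: list[str] = []
    for token in raw_response.split(","):
        tag = token.strip().lower()
        # Drop anything outside the canonical list, and drop duplicates.
        if tag in CANONICAL_TAGS and tag not in kept:
            kept.append(tag)
    # Assumption: fall back to "other" when nothing valid survives.
    return kept or ["other"]


def parse_tags_param(value: str | None) -> list[str]:
    """Parse a query value such as 'ai,semiconductors'; unknown tags are ignored."""
    if not value:
        return []
    candidates = (part.strip().lower() for part in value.split(","))
    return [tag for tag in candidates if tag in CANONICAL_TAGS]


def tag_distribution(tag_lists: list[list[str]]) -> dict[str, int]:
    """Count how many patents carry each tag."""
    return dict(Counter(tag for tags in tag_lists for tag in tags))
```

Filtering against a fixed canonical list keeps free-form LLM output from leaking arbitrary strings into the `patent_tags` column, and the same list can then validate the `?tags=` parameter.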

Closes leeworks-agents/SPARC#1672

Test Plan

  • [x] All 12 new tests pass
  • [x] Existing tests pass, apart from 3 pre-existing failures unrelated to this change
  • [ ] Manual: verify tags appear in /analyze/{company} response
  • [ ] Manual: verify ?tags=ai filter on GET /analyze/batch
  • [ ] Manual: verify tag distribution chart on Analytics page
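
As a rough picture of what the mocked-LLM tests in this plan might look like, here is a minimal sketch using `unittest.mock`. It does not reproduce the PR's test code: `classify_with` is a hypothetical stand-in for `classify_patent_tags()`, the `complete()` method on the LLM client is an assumed interface, and returning an empty list on LLM errors is also an assumption.

```python
from unittest.mock import Mock

CANONICAL_TAGS = {"ai", "semiconductors", "materials", "biotech", "networking", "other"}


def classify_with(llm, patent_text: str) -> list[str]:
    """Hypothetical stand-in for LLMAnalyzer.classify_patent_tags()."""
    try:
        reply = llm.complete(patent_text)  # assumed client method
    except Exception:
        # Assumption: a failed LLM call yields no tags rather than raising.
        return []
    tags = [part.strip().lower() for part in reply.split(",")]
    return [tag for tag in tags if tag in CANONICAL_TAGS]


def test_filters_unknown_tags():
    llm = Mock()
    llm.complete.return_value = "ai, quantum"
    assert classify_with(llm, "patent text") == ["ai"]
    # The mocked LLM should be called exactly once with the patent content.
    llm.complete.assert_called_once_with("patent text")


def test_llm_error_returns_empty_list():
    llm = Mock()
    llm.complete.side_effect = RuntimeError("LLM unavailable")
    assert classify_with(llm, "patent text") == []
```

Mocking the client keeps the tests deterministic and avoids real LLM calls in CI.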
AI-Manager added 1 commit 2026-05-19 15:34:57 +00:00
- Add classify_patent_tags() to LLMAnalyzer with canonical tag list
  (ai, semiconductors, materials, biotech, networking, other)
- Add patent_tags TEXT[] column to patents table with GIN index
- Run classification automatically in the analysis pipeline after
  patent processing; persist tags via update_patent_tags()
- Include tags in CompanyAnalysisResult and API response models
- Add ?tags= filter to GET /analyze/batch endpoint
- Add GET /analytics/tags endpoint for tag distribution data
- Add tag filter controls and distribution chart to Analytics page
- Add 12 unit tests covering classification, DB storage, and caching

Closes leeworks-agents/SPARC#1672

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>