forked from 0xWheatyz/SPARC
Fix analyze_single_patent to download PDF before attempting to read it #1411
Context
Roadmap item: P2 -- Backend -- analyze_single_patent assumes local file path
analyze_single_patent constructs a path patents/{patent_id}.pdf and reads from disk without first ensuring the file exists. If the patent has not been previously downloaded, the method fails with a misleading file-not-found error.
What to do
Choose one of the following approaches and implement it:
Option A (preferred): Integrate the PDF download step at the start of analyze_single_patent. If the file already exists, skip the download.
Option B: Raise a clear, descriptive exception (e.g., PatentPDFNotFoundError) with a message explaining that the patent must be downloaded first, and document the prerequisite in the docstring.
Acceptance criteria
Calling analyze_single_patent on a patent whose PDF is not on disk either downloads it automatically (Option A) or raises a descriptive error (Option B).
Triage: Already resolved in main.
analyze_single_patent() in SPARC/analyzer.py (lines 109-158) already checks if the PDF exists on disk, looks up the cached download link from the database, and calls SERP.save_patents() to download the PDF before reading it. A clear error message is reported when no cached link exists. Closing as complete.
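For reference, the check-then-download flow described in the triage note can be sketched roughly as below. This is not the code in SPARC/analyzer.py. The helper name ensure_patent_pdf and its lookup_link and download parameters are hypothetical stand-ins for the database lookup and for SERP.save_patents(), whose real signatures are not shown in this issue. PatentPDFNotFoundError is the name suggested under Option B, and making it a subclass of FileNotFoundError is an assumption.

```python
from pathlib import Path
from typing import Callable, Optional


class PatentPDFNotFoundError(FileNotFoundError):
    """Raised when a patent PDF is not on disk and cannot be downloaded."""


def ensure_patent_pdf(
    patent_id: str,
    lookup_link: Callable[[str], Optional[str]],  # hypothetical: cached-link lookup
    download: Callable[[str, Path], None],        # hypothetical: wraps the real downloader
    patents_dir: Path = Path("patents"),
) -> Path:
    """Return the local PDF path for patent_id, downloading it first if needed."""
    pdf_path = patents_dir / f"{patent_id}.pdf"

    # Skip the download when the file is already on disk.
    if pdf_path.exists():
        return pdf_path

    # No local copy: a cached download link is required to fetch it.
    link = lookup_link(patent_id)
    if link is None:
        raise PatentPDFNotFoundError(
            f"No PDF on disk for patent {patent_id} and no cached download "
            f"link was found; download the patent first."
        )

    patents_dir.mkdir(parents=True, exist_ok=True)
    download(link, pdf_path)

    # Guard against a downloader that returns without writing the file.
    if not pdf_path.exists():
        raise PatentPDFNotFoundError(
            f"Download for patent {patent_id} did not produce {pdf_path}."
        )
    return pdf_path
```

Passing the lookup and download steps in as callables keeps the sketch independent of the actual database and SERP code, and makes both branches easy to exercise with fakes in a unit test.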