forked from 0xWheatyz/SPARC
Fix analyze_single_patent to download PDF before reading from disk, or document the prerequisite #745
Context
Roadmap reference: P2 - analyze_single_patent assumes local file path
analyze_single_patent constructs the path patents/{patent_id}.pdf and reads from disk, but it does not download the PDF first. Calling this method on a patent that has not been previously fetched will silently fail or raise a file-not-found error.
What to do
Option A (preferred): Integrate the PDF download step directly into analyze_single_patent before attempting to read the file. If the file already exists locally, skip the download.
Option B: Add a clear FileNotFoundError with an explanatory message and update the docstring to document that the patent PDF must be downloaded first.
Acceptance criteria
Calling analyze_single_patent on a patent with no cached PDF either downloads it first (Option A) or raises a descriptive error (Option B).
Resolved.
analyze_single_patent in analyzer.py checks whether the PDF exists on disk and, if not, attempts to download it from a cached link. It raises a descriptive FileNotFoundError if no link is cached. The docstring documents this behavior.
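For illustration, the check-then-download behavior described in the resolution could look roughly like the sketch below. This is not the code in analyzer.py. The helper name ensure_patent_pdf, the cached_links mapping, and the injected download callable are all hypothetical, chosen so the example runs without network access.

```python
from pathlib import Path
from typing import Callable, Mapping


def ensure_patent_pdf(
    patent_id: str,
    cached_links: Mapping[str, str],          # hypothetical: patent_id -> PDF URL
    download: Callable[[str, Path], None],    # hypothetical: fetches URL to a path
    patents_dir: Path = Path("patents"),
) -> Path:
    """Return the local PDF path for patent_id, downloading it if missing.

    Raises FileNotFoundError when the PDF is not on disk and no download
    link is cached for the patent.
    """
    pdf_path = patents_dir / f"{patent_id}.pdf"

    # The file is already on disk: skip the download entirely.
    if pdf_path.exists():
        return pdf_path

    link = cached_links.get(patent_id)
    if link is None:
        raise FileNotFoundError(
            f"No PDF at {pdf_path} and no cached download link for patent "
            f"{patent_id!r}; fetch the patent before analyzing it."
        )

    # Make sure the target directory exists before the download writes to it.
    patents_dir.mkdir(parents=True, exist_ok=True)
    download(link, pdf_path)

    # Guard against a download step that returned without writing the file.
    if not pdf_path.exists():
        raise FileNotFoundError(f"Download from {link} did not produce {pdf_path}.")
    return pdf_path
```

Under these assumptions, analyze_single_patent would call a helper like this first and then read from the returned path. Passing the download step in as a parameter keeps the existence check and the error path testable in isolation.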