forked from 0xWheatyz/SPARC
Fix analyze_single_patent to download PDF before reading from disk #860
Context
Roadmap item: P2 - Backend - analyze_single_patent assumes local file path
analyze_single_patent constructs a path patents/{patent_id}.pdf and reads it from disk, but does not download the PDF first. Calling this method on a patent whose PDF has not been pre-fetched results in a file-not-found error with no helpful message.
Work to do
In analyze_single_patent, check whether patents/{patent_id}.pdf exists before attempting to read it.
Acceptance criteria
Calling analyze_single_patent for a patent whose PDF is not cached triggers a download automatically.
... FileNotFoundError).
Resolved in codebase. SPARC/analyzer.py analyze_single_patent() (lines 109-164) now checks whether the PDF exists on disk and, if it does not, looks up the PDF link from the database cache and downloads it automatically before parsing. Closing as implemented.
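For readers without the repository at hand, the check-then-download flow described in the closing comment can be sketched roughly as follows. This is a minimal illustration, not the code in SPARC/analyzer.py: the names ensure_patent_pdf, lookup_pdf_link, download, and PatentPDFUnavailableError are hypothetical, and the database lookup and HTTP fetch are passed in as callables so the sketch stays self-contained.

```python
from pathlib import Path
from typing import Callable, Optional


class PatentPDFUnavailableError(RuntimeError):
    """Hypothetical error: the PDF is neither cached nor downloadable."""


def ensure_patent_pdf(
    patent_id: str,
    lookup_pdf_link: Callable[[str], Optional[str]],  # assumed DB-cache lookup
    download: Callable[[str], bytes],                 # assumed HTTP fetch
    patents_dir: Path = Path("patents"),
) -> Path:
    """Return the local PDF path, downloading the file first if it is missing."""
    pdf_path = patents_dir / f"{patent_id}.pdf"

    # Fast path: the PDF was already fetched, so there is nothing to do.
    if pdf_path.exists():
        return pdf_path

    # Otherwise resolve the download link from the (assumed) database cache.
    link = lookup_pdf_link(patent_id)
    if link is None:
        raise PatentPDFUnavailableError(
            f"No cached PDF and no known PDF link for patent {patent_id!r}"
        )

    # Wrap download failures so the caller sees a message naming the patent
    # rather than an unrelated low-level error.
    try:
        data = download(link)
    except Exception as exc:
        raise PatentPDFUnavailableError(
            f"Could not download PDF for patent {patent_id!r} from {link}"
        ) from exc

    patents_dir.mkdir(parents=True, exist_ok=True)
    pdf_path.write_bytes(data)
    return pdf_path
```

Under these assumptions, analyze_single_patent would call such a helper before parsing, so a missing file surfaces as a descriptive error that names the patent instead of a bare FileNotFoundError.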