Bug 707843

Summary: Text ignored in extraction
Product: MuPDF
Reporter: Jorj <jorj.x.mckie>
Component: mupdf
Assignee: MuPDF bugs <mupdf-bugs>
Status: UNCONFIRMED ---    
Severity: major    
Priority: P2    
Version: 1.24.4   
Hardware: All   
OS: All   
Customer:
Word Size: ---

Description Jorj 2024-06-26 08:49:18 UTC
PyMuPDF issue: https://github.com/pymupdf/PyMuPDF/issues/3620

File link: https://github.com/user-attachments/files/15982045/mscbookin.pdf

Reproducer:

mutool draw -o mscbookin.txt mscbookin.pdf


Observations:

While MuPDF 1.23.11 extracts the complete text as expected, MuPDF 1.24.4 omits major parts of it, and the text it does extract also differs from the 1.23.11 output.

For convenience, here are the links to the respective text files:
https://github.com/user-attachments/files/15985103/mutool-12311.txt
https://github.com/user-attachments/files/15985105/mutool-12404.txt
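
To quantify what went missing between the two outputs, a small comparison script may help. The following is only a sketch using Python's standard difflib; the function names missing_or_changed_lines and compare_files are hypothetical helpers, and it assumes the two text files linked above have been downloaded locally. It lists the non-blank lines of the older output that are absent from, or altered in, the newer output.

```python
import difflib
import sys


def missing_or_changed_lines(old_text, new_text):
    """Return non-blank lines of old_text that are dropped or altered in new_text.

    Hypothetical helper for this report; lines are compared after stripping
    surrounding whitespace so that pure indentation changes are ignored.
    """
    old_lines = [ln.strip() for ln in old_text.splitlines()]
    new_lines = [ln.strip() for ln in new_text.splitlines()]
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    result = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": lines only in the old output; "replace": lines that changed.
        if tag in ("delete", "replace"):
            result.extend(ln for ln in old_lines[i1:i2] if ln)
    return result


def compare_files(old_path, new_path):
    """Read both extraction outputs (assumed UTF-8) and compare them."""
    with open(old_path, encoding="utf-8", errors="replace") as f:
        old_text = f.read()
    with open(new_path, encoding="utf-8", errors="replace") as f:
        new_text = f.read()
    return missing_or_changed_lines(old_text, new_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python compare.py mutool-12311.txt mutool-12404.txt
    for line in compare_files(sys.argv[1], sys.argv[2]):
        print(line)
```

Run with the 1.23.11 output as the first argument and the 1.24.4 output as the second; each printed line is text that the newer version either dropped or extracted differently.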