Bug 707843 - Text ignored in extraction
Summary: Text ignored in extraction
Status: UNCONFIRMED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: 1.24.4
Hardware: All All
Importance: P2 major
Assignee: MuPDF bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2024-06-26 08:49 UTC by Jorj
Modified: 2024-06-26 08:49 UTC
CC List: 0 users

See Also:
Customer:
Word Size: ---



Description Jorj 2024-06-26 08:49:18 UTC
PyMuPDF issue: https://github.com/pymupdf/PyMuPDF/issues/3620

File link: https://github.com/user-attachments/files/15982045/mscbookin.pdf

Reproducer:

mutool draw -o mscbookin.txt mscbookin.pdf


Observations:

MuPDF 1.23.11 extracts the complete text as expected, but MuPDF 1.24.4 omits large parts of it and also produces differences in the text it does extract.

For convenience, here are the links to the respective text files:
https://github.com/user-attachments/files/15985103/mutool-12311.txt
https://github.com/user-attachments/files/15985105/mutool-12404.txt
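To put a rough number on how much text the newer version drops, the two outputs can be compared from the shell. The following is a minimal sketch, assuming both text files have been downloaded locally; compare_extraction is a hypothetical helper, not part of MuPDF. It compares word counts and counts the distinct lines that appear in the first file but not in the second. Because the comparison is line-based, it will also flag lines that were merely re-wrapped or reordered between versions, so it gives an upper bound on missing lines rather than an exact diff.

```shell
#!/bin/sh
# Hypothetical helper (not part of MuPDF): compare two text extractions.
# Usage: compare_extraction OLD.txt NEW.txt
compare_extraction() {
    old=$1
    new=$2
    tmp_old=$(mktemp)
    tmp_new=$(mktemp)
    # comm requires sorted input; force the C locale so sort and comm agree.
    LC_ALL=C sort -u "$old" > "$tmp_old"
    LC_ALL=C sort -u "$new" > "$tmp_new"
    # Distinct lines present in OLD but absent from NEW.
    missing=$(LC_ALL=C comm -23 "$tmp_old" "$tmp_new" | wc -l)
    rm -f "$tmp_old" "$tmp_new"
    # Arithmetic expansion strips the padding some wc implementations add.
    echo "words_old=$(( $(wc -w < "$old") )) words_new=$(( $(wc -w < "$new") )) missing_lines=$(( missing ))"
}
```

For this report, the call would be `compare_extraction mutool-12311.txt mutool-12404.txt`.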