Summary: | Mutool clean with page selection yields text not extractable | ||
---|---|---|---|
Product: | MuPDF | Reporter: | Jorj <jorj.x.mckie> |
Component: | mupdf | Assignee: | MuPDF bugs <mupdf-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | robin.watts |
Priority: | P2 | ||
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Customer: | | Word Size: | --- |
Description
Jorj
2024-07-19 14:31:33 UTC
Fixed with:

commit cbe65e8144782a684e1fec56e5dd3dd26beaf65b (golden/master)
Author: Robin Watts <Robin.Watts@artifex.com>
Date: Fri Jul 19 17:41:22 2024 +0100

Bug 707890: Carry over structparent information when cleaning.

We were completely omitting the structure tree when copying. This meant that information like "ActualText" was missing, resulting in problems when doing text extraction.

Here we copy the entirety of the Structure Tree across, and regenerate the ParentTree so that the Page StructParents still point to the right thing.

We do NOT cut the actual Structure Tree down, so the file remains larger than it maybe needs to be - but it is at least correct now.
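The commit message describes the fix only in outline. As a rough illustration of what "regenerate the ParentTree" involves, here is a minimal sketch that models PDF objects as plain Python dicts. This is an assumption for illustration, not MuPDF's implementation, and the names `regenerate_parent_tree` and `lookup_struct_element` are hypothetical. Per the PDF specification, a page's /StructParents integer is a key into the /ParentTree number tree of the structure tree root. The value under that key is an array that maps each marked-content ID (MCID) on the page to its parent structure element, and that element is where attributes such as /ActualText live. The sketch gives each kept page a fresh consecutive key and builds a new flat /Nums array. It leaves the structure elements themselves untouched, in line with the commit's choice not to prune the tree. Annotation /StructParent keys and nested /Kids number-tree nodes are ignored for brevity.

```python
"""Hypothetical sketch: rebuilding a PDF ParentTree after page selection.

PDF objects are modelled as plain dicts; this is NOT MuPDF's actual code.
"""
from typing import Any, Dict, List, Optional, Tuple

Page = Dict[str, Any]


def regenerate_parent_tree(
    pages: List[Page],
    old_parent_tree: Dict[int, List[Any]],
    keep: List[int],
) -> Tuple[List[Page], Dict[str, List[Any]], int]:
    """Keep the pages at indices `keep` (in that order), renumbering keys.

    Returns the new page list, a ParentTree dict with a flat /Nums array,
    and the value to store as /ParentTreeNextKey in the StructTreeRoot.
    """
    new_pages: List[Page] = []
    nums: List[Any] = []
    next_key = 0
    for index in keep:
        page = dict(pages[index])  # shallow copy so the input stays untouched
        old_key = page.get("StructParents")
        if old_key is not None and old_key in old_parent_tree:
            page["StructParents"] = next_key
            # /Nums is a flat [key1 value1 key2 value2 ...] array sorted by
            # key; keys are assigned in increasing order, so it stays sorted.
            nums.extend([next_key, old_parent_tree[old_key]])
            next_key += 1
        else:
            # No usable ParentTree entry: drop the dangling key.
            page.pop("StructParents", None)
        new_pages.append(page)
    return new_pages, {"Nums": nums}, next_key


def lookup_struct_element(
    parent_tree: Dict[str, List[Any]], key: int, mcid: int
) -> Optional[Dict[str, Any]]:
    """Find the structure element that owns marked content `mcid` on a page."""
    nums = parent_tree["Nums"]
    for i in range(0, len(nums), 2):
        if nums[i] == key:
            return nums[i + 1][mcid]
    return None


if __name__ == "__main__":
    pages = [{"StructParents": 0}, {"StructParents": 1}, {"StructParents": 2}]
    elems = [{"S": "Span", "ActualText": t} for t in ("one", "two", "three")]
    old_tree = {0: [elems[0]], 1: [elems[1]], 2: [elems[2]]}
    # Keep the third page, then the first, like a page range of "3,1".
    new_pages, tree, next_key = regenerate_parent_tree(pages, old_tree, [2, 0])
    first = lookup_struct_element(tree, new_pages[0]["StructParents"], 0)
    print(first["ActualText"], next_key)
```

Running the module prints `three 2`: after selection, the first output page still resolves MCID 0 to the element carrying its /ActualText, which is the link the bug report says was lost when the structure tree was omitted.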