Summary: | CMaps are not properly mapped | ||
---|---|---|---|
Product: | MuPDF | Reporter: | mrvauxs |
Component: | mupdf | Assignee: | MuPDF bugs <mupdf-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | avorobey, robin.watts |
Priority: | P4 | ||
Version: | unspecified | ||
Hardware: | PC | ||
OS: | Windows 10 | ||
URL: | https://github.com/sumatrapdfreader/sumatrapdf/issues/3307 | ||
Customer: | Word Size: | --- | |
Attachments: | example pdf file, proposed fix | ||
Description
mrvauxs
2023-03-23 14:54:53 UTC
Created attachment 23889
example pdf file
This file exhibits the bug. Five character codes are mapped to long strings such as "[free-action]" in the PDF's CMaps, but MuPDF does not use these mappings when extracting text.
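The string "[free-action]" is 13 characters long, which exceeds the previous limit of 8. Assuming the mapping is written as a `bfchar` entry in a ToUnicode CMap, whose destination is a hex string of UTF-16BE code units (four hex digits per unit), the hypothetical helper `count_utf16_units` below decodes such a string and counts its units. It is only an illustration of the length arithmetic, not MuPDF's CMap parser.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode a UTF-16BE hex string (4 hex digits per
 * code unit), store at most `cap` units in `out`, and return the total
 * number of units in the string, or -1 if the string is malformed.
 * A return value larger than `cap` means the mapping did not fit. */
static int count_utf16_units(const char *hex, unsigned short *out, int cap)
{
    size_t len = strlen(hex);
    int n = 0;

    assert(cap >= 0 && (cap == 0 || out != NULL));
    if (len % 4 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 4) {
        unsigned v = 0;
        for (int k = 0; k < 4; k++) {
            int d = hexval((unsigned char)hex[i + k]);
            if (d < 0)
                return -1;
            v = (v << 4) | (unsigned)d;
        }
        if (n < cap)
            out[n] = (unsigned short)v; /* keep only what fits */
        n++;
    }
    return n;
}
```

For the 52-digit destination that encodes "[free-action]", `count_utf16_units` returns 13, while only the first 8 units fit in an 8-element buffer.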
Created attachment 23891
proposed fix
Increasing the length limit (PDF_MRANGE_CAP) from 8 to 32 seems benign. In addition, several local variables used a hardcoded limit of 8 instead of PDF_MRANGE_CAP, which is the longest value they need to store; the patch fixes that too.
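The idea behind the patch, a single cap constant used both for stored entries and for the local buffers that copy them, can be sketched as follows. The names `MRANGE_CAP`, `mrange_entry`, `add_mrange` and `lookup_mrange` are hypothetical and do not reproduce MuPDF's actual code. The report does not say whether over-long entries were dropped or truncated before the fix; this sketch simply rejects them.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constant playing the role PDF_MRANGE_CAP plays in the patch:
 * the maximum number of output characters one source code may map to. */
enum { MRANGE_CAP = 32 };

/* Hypothetical one-to-many mapping entry (not MuPDF's actual struct). */
typedef struct {
    unsigned int code;     /* source character code */
    int len;               /* number of output characters in use */
    int out[MRANGE_CAP];   /* output characters */
} mrange_entry;

/* Store a mapping. Over-long destinations are rejected (returns -1) and
 * the entry is left unchanged. Returns 0 on success. */
static int add_mrange(mrange_entry *e, unsigned int code, const int *dst, int len)
{
    assert(e != NULL && dst != NULL);
    if (len < 0 || len > MRANGE_CAP)
        return -1;
    e->code = code;
    e->len = len;
    memcpy(e->out, dst, (size_t)len * sizeof dst[0]);
    return 0;
}

/* Copy a stored mapping into a caller buffer and return its length.
 * The buffer is declared with the same constant rather than a literal 8,
 * so raising the cap keeps caller buffers large enough. */
static int lookup_mrange(const mrange_entry *e, int out[MRANGE_CAP])
{
    assert(e != NULL && out != NULL);
    memcpy(out, e->out, (size_t)e->len * sizeof out[0]);
    return e->len;
}
```

With `MRANGE_CAP` set to 8 instead, `add_mrange` would reject the 13-character "[free-action]" mapping; at 32 it is stored and returned intact.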
Fixed in:

commit 9903ec28de5124df8b6f84c4c21d838c1876fc20
Author: Robin Watts <Robin.Watts@artifex.com>
Date: Thu Mar 23 16:12:24 2023 +0000

Bug 706498: Increase maximum number of chars in an MRange CMAP entry.

This is to cope with PDF files that map single font chars to long strings, like "[free-action]". Someone will undoubtedly complain that 32 is not large enough at some point in future...

Thanks to Anatoly Vorobey for the report and patch.

Thanks again!