PDFs can contain special characters that, when copied and pasted from a browser such as Chrome, come out as a special string. According to the SumatraPDF discussion, MuPDF does not handle these characters properly. More information can be found here: https://github.com/sumatrapdfreader/sumatrapdf/issues/3307
Created attachment 23889: example pdf file

This file exhibits the bug. Five character codes are mapped to longish strings such as "[free-action]" in the PDF's CMap section. MuPDF doesn't use these mappings when extracting the text.
Created attachment 23891: proposed fix

Increasing the length limit from 8 to 32 seems benign. In addition, several local variables were sized with a hardcoded limit of 8 instead of PDF_MRANGE_CAP, which is the longest value they need to be able to store; the patch fixes that, too.
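To illustrate why the limit matters, here is a minimal standalone sketch; it is not MuPDF's code. The names OLD_MRANGE_CAP, NEW_MRANGE_CAP, store_mapping and extracted_length are hypothetical, and the choice to reject (rather than truncate) an over-long mapping is an assumption made only so the effect is visible. It shows a one-to-many target such as "[free-action]" (13 characters) failing to fit under a cap of 8 and fitting under a cap of 32, with the local buffer sized by the named constant instead of a literal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names for illustration; not MuPDF's actual declarations. */
#define OLD_MRANGE_CAP 8   /* limit before the fix, per the report */
#define NEW_MRANGE_CAP 32  /* limit after the fix, per the report */

/*
 * Copy the target string of a one-to-many mapping into dst, one code
 * point per byte of text. Returns the number of code points stored,
 * or 0 if the string does not fit in cap entries (assumed behaviour:
 * the mapping is skipped rather than truncated).
 */
static size_t store_mapping(int *dst, size_t cap, const char *text)
{
    size_t len, i;

    assert(dst != NULL && text != NULL);
    len = strlen(text);
    if (len > cap)
        return 0; /* too long for this cap: mapping is not used */
    for (i = 0; i < len; i++)
        dst[i] = (unsigned char)text[i];
    return len;
}

/*
 * Local buffer sized by the named constant, not a literal 8, so it
 * always holds the longest value the cap allows.
 */
static size_t extracted_length(const char *text, size_t cap)
{
    int buf[NEW_MRANGE_CAP];

    assert(cap <= NEW_MRANGE_CAP);
    return store_mapping(buf, cap, text);
}
```

Under these assumptions, extracted_length("[free-action]", OLD_MRANGE_CAP) returns 0 and extracted_length("[free-action]", NEW_MRANGE_CAP) returns 13. Sizing the local buffer with the constant means a future change to the cap cannot silently overflow it.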
Fixed in:

commit 9903ec28de5124df8b6f84c4c21d838c1876fc20
Author: Robin Watts <Robin.Watts@artifex.com>
Date:   Thu Mar 23 16:12:24 2023 +0000

    Bug 706498: Increase maximum number of chars in an MRange CMAP entry.

    This is to cope with PDF files that map single font chars to
    long strings, like "[free-action]".

    Someone will undoutably complain that 32 is not large enough
    at some point in future...

    Thanks to Anatoly Vorobey for the report and patch.

Thanks again!