Bug 706498

Summary: CMaps are not properly mapped
Product: MuPDF
Reporter: mrvauxs
Component: mupdf
Assignee: MuPDF bugs <mupdf-bugs>
Status: RESOLVED FIXED
Severity: normal
CC: avorobey, robin.watts
Priority: P4
Version: unspecified
Hardware: PC
OS: Windows 10
URL: https://github.com/sumatrapdfreader/sumatrapdf/issues/3307
Customer:
Word Size: ---
Attachments: example pdf file
             proposed fix

Description mrvauxs 2023-03-23 14:54:53 UTC
PDFs can contain special characters that yield a special string when copied and pasted from a browser such as Chrome. According to the SumatraPDF discussion, MuPDF does not handle these characters properly.

More information can be found here: https://github.com/sumatrapdfreader/sumatrapdf/issues/3307
Comment 1 Anatoly Vorobey 2023-03-23 15:28:01 UTC
Created attachment 23889 [details]
example pdf file

This file exhibits the bug. Five character codes are mapped into longish strings like "[free-action]" in the PDF's CMaps section. MuPDF doesn't use them when extracting the text.
Comment 2 Anatoly Vorobey 2023-03-23 15:36:15 UTC
Created attachment 23891 [details]
proposed fix

Increasing the length limit from 8 to 32 seems benign. At the same time, several local variables were using a hardcoded 8 limit instead of PDF_MRANGE_CAP, which is the longest value they need to be able to store; the patch fixes that, too.
Comment 3 Robin Watts 2023-03-23 17:59:38 UTC
Fixed in:

commit 9903ec28de5124df8b6f84c4c21d838c1876fc20
Author: Robin Watts <Robin.Watts@artifex.com>
Date:   Thu Mar 23 16:12:24 2023 +0000

    Bug 706498: Increase maximum number of chars in an MRange CMAP entry.

    This is to cope with PDF files that map single font chars to
    long strings, like "[free-action]".

    Someone will undoubtedly complain that 32 is not large enough at some
    point in future...

    Thanks to Anatoly Vorobey for the report and patch.

Thanks again!