Bug 706498 - CMaps are not properly mapped
Summary: CMaps are not properly mapped
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: unspecified
Hardware: PC Windows 10
Importance: P4 normal
Assignee: MuPDF bugs
URL: https://github.com/sumatrapdfreader/s...
Reported: 2023-03-23 14:54 UTC by mrvauxs
Modified: 2023-03-23 17:59 UTC

Attachments
example pdf file (841.95 KB, application/pdf)
2023-03-23 15:28 UTC, Anatoly Vorobey
proposed fix (1.59 KB, patch)
2023-03-23 15:36 UTC, Anatoly Vorobey

Description mrvauxs 2023-03-23 14:54:53 UTC
PDFs can contain special characters that, when copy-pasted in browsers like Chrome, return a special string. According to SumatraPDF discussions, these characters do not work properly in MuPDF.

More information can be found here: https://github.com/sumatrapdfreader/sumatrapdf/issues/3307
Comment 1 Anatoly Vorobey 2023-03-23 15:28:01 UTC
Created attachment 23889
example pdf file

This file exhibits the bug. Five character codes are mapped into longish strings like "[free-action]" in the PDF's CMaps section. MuPDF doesn't use them when extracting the text.
Comment 2 Anatoly Vorobey 2023-03-23 15:36:15 UTC
Created attachment 23891
proposed fix

Increasing the length limit from 8 to 32 seems benign. At the same time, several local variables were using a hardcoded 8 limit instead of PDF_MRANGE_CAP, which is the longest value they need to be able to store; the patch fixes that, too.
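
To illustrate the point about hardcoded limits, here is a minimal sketch, not MuPDF's actual code. The names EXAMPLE_MRANGE_CAP, example_mrange and example_map_one_to_many are hypothetical and only stand in for PDF_MRANGE_CAP and the structures the patch touches. It shows why sizing buffers and bounds from one named constant matters: with a cap of 8, the 13-character "[free-action]" does not fit, while with 32 it does. The sketch truncates over-long strings. How MuPDF itself handles them is not stated in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PDF_MRANGE_CAP; the value 32 follows the patch description. */
#define EXAMPLE_MRANGE_CAP 32

/* Hypothetical one-to-many entry: one character code mapped to several code points. */
typedef struct {
    unsigned int code;
    int len;                        /* number of code points stored in out[] */
    int out[EXAMPLE_MRANGE_CAP];    /* sized by the named cap, never a literal 8 */
} example_mrange;

/*
 * Store the mapping code -> text (treated as one code point per byte, an
 * assumption that only holds for ASCII). The bound is derived from the
 * buffer itself, so raising the cap cannot leave a stale limit behind.
 * Returns the number of code points kept; over-long input is truncated.
 */
int example_map_one_to_many(example_mrange *m, unsigned int code, const char *text)
{
    size_t cap = sizeof m->out / sizeof m->out[0];
    size_t n = strlen(text);
    size_t i;

    if (n > cap)
        n = cap;

    m->code = code;
    for (i = 0; i < n; i++)
        m->out[i] = (unsigned char)text[i];
    m->len = (int)n;
    return m->len;
}
```

Any local buffer that has to hold one such entry would be declared as int buf[EXAMPLE_MRANGE_CAP] in the same way, so a later change to the cap reaches every user at once.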
Comment 3 Robin Watts 2023-03-23 17:59:38 UTC
Fixed in:

commit 9903ec28de5124df8b6f84c4c21d838c1876fc20
Author: Robin Watts <Robin.Watts@artifex.com>
Date:   Thu Mar 23 16:12:24 2023 +0000

    Bug 706498: Increase maximum number of chars in an MRange CMAP entry.

    This is to cope with PDF files that map single font chars to
    long strings, like "[free-action]".

    Someone will undoubtedly complain that 32 is not large enough at some
    point in future...

    Thanks to Anatoly Vorobey for the report and patch.

Thanks again!