Bug 703274

Summary: Whiteout not masking as it does in Acrobat (old regression)
Product: MuPDF
Reporter: spambin
Component: mupdf
Assignee: MuPDF bugs <mupdf-bugs>
Status: RESOLVED INVALID
Severity: normal
CC: robin.watts
Priority: P4
Version: 1.18.0
Hardware: PC
OS: Windows 10
Customer:
Word Size: ---
Attachments: table with whiteouts, todo2.pdf

Description spambin 2020-12-15 01:37:33 UTC
Created attachment 20370
table with whiteouts

There are two issues with the attached file.
One is that the Hebrew text appears as different visible characters. However, that is not the reason for this bug report. The reason is that MuPDF does not apply a white cover over the text, as Acrobat and other renderers did/do.

This is a regression from the behaviour of MuPDF version 1.9 and earlier, which DID mask the bad lettering.

For the visible differences, see
https://github.com/sumatrapdfreader/sumatrapdf/issues/1820
Comment 1 Robin Watts 2021-02-25 15:09:43 UTC
Created attachment 20664
todo2.pdf

Simplified version
Comment 2 Robin Watts 2021-02-25 15:11:56 UTC
There is no whiteout over the "bad lettering".

This simplified version simply displays 10 chars in a font.

Acrobat and Ghostscript display nothing. MuPDF shows the chars.

Ghostscript does note that it's using a substitute font.
Comment 3 Tor Andersson 2021-02-25 15:19:59 UTC
I also cannot find anything in this file that draws a white cover over anything.

The content stream draws the text second-to-last followed by a black stroked line. There's nothing following the text that would indicate a white cover.

Acrobat and GS draw nothing, because there's garbage encoding information and the font is not embedded. MuPDF draws the garbage encoded text anyway. Both MuPDF and Acrobat can copy the Hebrew text because there is a ToUnicode mapping, but that is only used for copying the text, not for rendering.
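
The ordering argument above (nothing painted after the text that could cover it) can be checked mechanically. What follows is only a rough sketch in Python, not how MuPDF or any other viewer inspects a file. The sample stream is hypothetical and merely mirrors the order described in this comment, and the tokenizer is deliberately naive: it assumes no string literals containing whitespace, no inline images, and no `true`/`false`/`null` operands.

```python
import re

# Hypothetical, heavily simplified content stream: a text object followed
# by a black stroked line, mirroring the order described in the comment.
SAMPLE_STREAM = b"""
q
BT /F1 12 Tf 72 700 Td <0102030405060708090A> Tj ET
0 0 0 RG
72 690 m 300 690 l S
Q
"""

# Path-painting operators that fill an area and so could act as a cover.
# Not exhaustive: images (Do) and shadings (sh) can also paint over text.
FILL_OPERATORS = {"f", "F", "f*", "B", "B*", "b", "b*"}

# Naive operator pattern: letters/digits with an optional trailing '*',
# or the single-character text operators ' and ".
OPERATOR_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*\*?|['\"]")


def operators_after_last_text(stream):
    """Return the operators that follow the last ET (end of text object)."""
    tokens = stream.decode("latin-1").split()
    ops = [t for t in tokens if OPERATOR_RE.fullmatch(t)]
    if "ET" not in ops:
        return []
    last_et = len(ops) - 1 - ops[::-1].index("ET")
    return ops[last_et + 1:]


def has_fill_after_text(stream):
    """True if any fill operator is issued after the last text object."""
    return any(op in FILL_OPERATORS for op in operators_after_last_text(stream))


print(operators_after_last_text(SAMPLE_STREAM))
print(has_fill_after_text(SAMPLE_STREAM))
```

For the sample stream only stroke-related operators follow the text, so no fill is reported. A real check would also need to track the fill colour and the painted area before calling anything a whiteout.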
Comment 4 Tor Andersson 2021-02-25 16:20:47 UTC
The text appears different in each viewer because each one handles broken files with insufficient encoding information differently. In GS and Acrobat the text disappears, while MuPDF tries to show something by falling back to a WinANSI encoding if the named glyphs are not present in the font.

If you want to show Hebrew text in a PDF, you have to embed the fonts.
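
To illustrate the difference in behaviour described in this comment, here is a minimal Python sketch of a glyph lookup with and without a WinANSI fallback. It is an illustration under stated assumptions, not MuPDF's actual code: the function `pick_glyph`, its parameters, and the glyph name `g_alef` are all made up for the example, and the WinANSI table is a three-entry excerpt.

```python
# Tiny excerpt of WinAnsiEncoding (character code -> glyph name).
# The real table has many more entries.
WIN_ANSI_EXCERPT = {0x41: "A", 0x42: "B", 0x61: "a"}


def pick_glyph(code, differences, font_glyphs, fallback=True):
    """Pick a glyph name for a character code (hypothetical helper).

    differences: code -> glyph name, as named by the PDF's encoding info.
    font_glyphs: glyph names actually present in the font being used,
                 for example a substitute for a font that is not embedded.
    fallback:    if True, fall back to the WinANSI name for the code when
                 the named glyph is missing; if False, give up.
    Returns a glyph name, or None if nothing should be drawn.
    """
    name = differences.get(code)
    if name is not None and name in font_glyphs:
        return name
    if not fallback:
        return None  # models a viewer that draws nothing
    ansi_name = WIN_ANSI_EXCERPT.get(code)
    if ansi_name in font_glyphs:
        return ansi_name  # wrong-looking but visible character
    return None


# Hypothetical setup: the PDF names a Hebrew glyph that a Latin-only
# substitute font does not contain.
differences = {0x41: "g_alef"}
latin_only_font = {"A", "B", "a"}

print(pick_glyph(0x41, differences, latin_only_font))
print(pick_glyph(0x41, differences, latin_only_font, fallback=False))
```

With the fallback enabled the lookup yields a Latin glyph for a code that was meant to be Hebrew, which matches the "different visible characters" in the report. With the fallback disabled nothing is drawn. Once the font is embedded and contains the named glyph, both paths return the same result.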