Bug 703274

Summary:	Whiteout not masking as it does in Acrobat (old regression)
Product:	MuPDF	Reporter:	spambin
Component:	mupdf	Assignee:	MuPDF bugs <mupdf-bugs>
Status:	RESOLVED INVALID
Severity:	normal	CC:	robin.watts
Priority:	P4
Version:	1.18.0
Hardware:	PC
OS:	Windows 10
Customer:		Word Size:	---
Attachments:	table with whiteouts todo2.pdf

Description spambin 2020-12-15 01:37:33 UTC

Created attachment 20370
table with whiteouts

There are two issues with the attached file. One is that the Hebrew text appears as different visible characters. However, that is not the reason for this bug report, which is to show that MuPDF does not apply a white cover over it as Acrobat and other renderers did/do.

This is a regression from the behaviour of MuPDF 1.9 and earlier, which DID mask the bad lettering.

For the visible differences, see:
https://github.com/sumatrapdfreader/sumatrapdf/issues/1820

Comment 1 Robin Watts 2021-02-25 15:09:43 UTC

Created attachment 20664
todo2.pdf

Simplified version

Comment 2 Robin Watts 2021-02-25 15:11:56 UTC

There is no whiteout over the "bad lettering".

This simplified version just displays 10 characters in a font.

Acrobat and Ghostscript display nothing. MuPDF shows the chars.

Ghostscript does note that it's using a substitute font.

Comment 3 Tor Andersson 2021-02-25 15:19:59 UTC

I also cannot find anything in this file that draws a white cover over anything.

The content stream draws the text second-to-last followed by a black stroked line. There's nothing following the text that would indicate a white cover.

Acrobat and GS draw nothing, because there's garbage encoding information and the font is not embedded. MuPDF draws the garbage-encoded text anyway. Both MuPDF and Acrobat can copy the Hebrew text because there is a ToUnicode mapping, but that is only used for copying the text, not for rendering.

Comment 4 Tor Andersson 2021-02-25 16:20:47 UTC

The text appearing different in all the viewers is down to how they all handle broken cases with insufficient encoding information differently. In GS and Acrobat the text disappears, while MuPDF tries to show something based on falling back to a WinANSI encoding if the named glyphs are not present in the font.

If you want to show Hebrew text in a PDF, you have to embed the fonts.
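A minimal, hypothetical sketch of the two separate code paths described in Comments 3 and 4. This is not MuPDF's, Acrobat's or Ghostscript's actual code. It assumes a simple (one-byte) non-embedded font whose /Differences entries name glyphs the substitute font lacks, and it approximates WinAnsiEncoding with Python's cp1252 codec. The function names, the `fallback` flag and the sample data are illustrative only. Rendering consults the encoding and the font's glyph names, while copying consults only the ToUnicode map, so the copied text can be correct Hebrew even when the drawn glyphs are wrong or missing.

```python
def winansi_char(code: int) -> str:
    # Assumption: cp1252 is close enough to WinAnsiEncoding for this sketch.
    return bytes([code]).decode("cp1252", errors="replace")

def choose_glyph(code, differences, font_glyph_names, fallback=True):
    """Decide what gets drawn for one character code.

    differences: code -> glyph name, as from /Encoding /Differences
    font_glyph_names: glyph names present in the (substitute) font
    fallback=True  mimics the WinANSI fallback described for MuPDF;
    fallback=False mimics the observed "draw nothing" result.
    """
    name = differences.get(code)
    if name is not None and name in font_glyph_names:
        return ("glyph", name)                # named glyph exists: draw it
    if fallback:
        return ("winansi", winansi_char(code))  # draw the WinANSI character
    return None                               # draw nothing for this code

def render_string(data, differences, font_glyph_names, fallback=True):
    return [choose_glyph(b, differences, font_glyph_names, fallback) for b in data]

def extract_text(data, to_unicode):
    # Copy/paste path: uses only the ToUnicode map, never the font glyphs.
    return "".join(to_unicode.get(b, "\ufffd") for b in data)

if __name__ == "__main__":
    # Hypothetical data: afii57664/afii57665 are the Adobe Glyph List names
    # for Hebrew alef/bet, which a Latin substitute font does not contain.
    differences = {0xE0: "afii57664", 0xE1: "afii57665"}
    latin_font = {"space", "A", "agrave", "aacute"}
    to_unicode = {0xE0: "\u05d0", 0xE1: "\u05d1"}
    data = b"\xe0\xe1"
    print(render_string(data, differences, latin_font))
    print(render_string(data, differences, latin_font, fallback=False))
    print(extract_text(data, to_unicode))
```

With the fallback, the two Hebrew codes come out as the accented Latin letters that cp1252 assigns to 0xE0 and 0xE1. Without it, nothing is drawn. In both cases, the ToUnicode path still yields alef and bet when the text is copied.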