703274 – Whiteout not masking as it does in Acrobat (old regression)

Status:	RESOLVED INVALID

Alias:	None

Product:	MuPDF
Classification:	Unclassified
Component:	mupdf
Version:	1.18.0
Hardware:	PC Windows 10

Importance:	P4 normal
Assignee:	MuPDF bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2020-12-15 01:37 UTC by spambin
Modified:	2021-02-25 16:20 UTC
CC List:	1 user

See Also:
Customer:
Word Size:	---

Attachments
table with whiteouts (25.29 KB, application/pdf) 2020-12-15 01:37 UTC, spambin
todo2.pdf (3.77 KB, application/pdf) 2021-02-25 15:09 UTC, Robin Watts


Description spambin 2020-12-15 01:37:33 UTC

Created attachment 20370 [details]
table with whiteouts

There are two issues with the attached file.
One is that the Hebrew text appears as different visible characters. However, that is not the reason for this bug report, which is that MuPDF does not apply a white cover over the text as Acrobat and other renderers did or do.

This is a regression from MuPDF version 1.9 and earlier, which DID mask the bad lettering.

For the visible differences, see
https://github.com/sumatrapdfreader/sumatrapdf/issues/1820

Comment 1 Robin Watts 2021-02-25 15:09:43 UTC

Created attachment 20664 [details]
todo2.pdf

Simplified version

Comment 2 Robin Watts 2021-02-25 15:11:56 UTC

There is no whiteout over the "bad lettering".

This simplified version simply displays 10 chars in a font.

Acrobat and Ghostscript display nothing. MuPDF shows the chars.

Ghostscript does note that it's using a substitute font.

Comment 3 Tor Andersson 2021-02-25 15:19:59 UTC

I also cannot find anything in this file that draws a white cover over anything.

The content stream draws the text second-to-last followed by a black stroked line. There's nothing following the text that would indicate a white cover.

Acrobat and GS draw nothing, because there's garbage encoding information and the font is not embedded. MuPDF draws the garbage encoded text anyway. Both MuPDF and Acrobat can copy the Hebrew text because there is a ToUnicode mapping, but that is only used for copying the text, not for rendering.

Comment 4 Tor Andersson 2021-02-25 16:20:47 UTC

The text appears different in each viewer because each one handles broken files with insufficient encoding information differently. In GS and Acrobat the text disappears, while MuPDF tries to show something by falling back to a WinANSI encoding if the named glyphs are not present in the font.

If you want to show Hebrew text in a PDF, you have to embed the fonts.
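
To illustrate the difference described above, here is a toy model in Python. It is not code from MuPDF, Ghostscript, or Acrobat; the glyph names, the `render` function, and its `fallback` flag are all made up for the sketch. It assumes a non-embedded font replaced by a Latin substitute, and an encoding that names glyphs the substitute does not contain. Windows-1252 (`cp1252`) stands in for WinANSI.

```python
# Toy model only: not MuPDF, Ghostscript, or Acrobat source code.

def winansi_char(code: int) -> str:
    """Decode one byte as Windows-1252, used here as a stand-in for WinANSI."""
    return bytes([code]).decode("cp1252", errors="replace")


def render(codes, encoding, font_glyphs, fallback):
    """Return the characters a viewer would paint for the given byte codes.

    encoding    -- hypothetical map from byte code to glyph name
    font_glyphs -- hypothetical map from glyph name to a drawable character
    fallback    -- if True, paint the WinANSI character when the named glyph
                   is missing; if False, paint nothing for that code
    """
    shown = []
    for code in codes:
        name = encoding.get(code)
        if name in font_glyphs:
            shown.append(font_glyphs[name])
        elif fallback:
            shown.append(winansi_char(code))
        # Otherwise this code produces no visible glyph.
    return "".join(shown)


# Substitute Latin font: it has no Hebrew glyphs (names are hypothetical).
substitute_font = {"A": "A", "B": "B"}
# Encoding that names glyphs the substitute font lacks (names are hypothetical).
broken_encoding = {0x41: "hebrew_alef", 0x42: "hebrew_bet"}
codes = b"AB"

strict = render(codes, broken_encoding, substitute_font, fallback=False)
lenient = render(codes, broken_encoding, substitute_font, fallback=True)

print(repr(strict))   # text disappears, as reported for GS and Acrobat
print(repr(lenient))  # wrong but visible characters, as reported for MuPDF
```

Neither outcome shows the intended Hebrew, which is why embedding the font is the only reliable fix.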