Bug 699776

Summary: -sDEVICE=pdfwrite creates gray or black rectangles which obscure text.
Product: Ghostscript Reporter: MarjaE <marja-e>
Component: PDF Writer Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED INVALID    
Severity: normal    
Priority: P4    
Version: 9.25   
Hardware: Macintosh   
OS: MacOS X   
Customer: Word Size: ---
Attachments: One affected file
Same affected file after running through pdfwrite

Description MarjaE 2018-09-15 20:59:33 UTC
I often use Ghostscript (in Automator) to try to pre-process vector pdfs (so they're faster on my Mac, and compatible with my Kindle).

Even just running -sDEVICE=pdfwrite without the rest of the pre-processing often creates gray or black rectangles, which sometimes obscure text.

For a mild example, it adds a gray box to the title of the rules here:

https://www.peginc.com/store/genre-supplement-wizards-warriors/

Adding -dFILTERIMAGE does *not* remove the rectangles, and sometimes removes good text.
Comment 1 MarjaE 2018-09-16 00:21:10 UTC
Adding -dFILTERVECTOR *does* remove the rectangles. But it is just a kludge.
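
For reference, a minimal sketch of how such a filtered conversion might be scripted. The helper name filter_pdf and the GS override variable are hypothetical. -dFILTERVECTOR, -dFILTERIMAGE and -dFILTERTEXT are Ghostscript switches, and the sketch assumes the installed version supports them. Filtering drops all content of the chosen class, not only the unwanted rectangles, which is why it is only a workaround.

```shell
# Assumption: Ghostscript is installed as "gs"; override with GS=/path/to/gs.
GS="${GS:-gs}"

# Hypothetical helper: rewrite a PDF with pdfwrite, dropping one class of content.
# usage: filter_pdf VECTOR|IMAGE|TEXT input.pdf output.pdf
filter_pdf() {
    case "$1" in
        VECTOR|IMAGE|TEXT) ;;
        *) echo "unknown filter: $1" >&2; return 2 ;;
    esac
    # Builds -dFILTERVECTOR, -dFILTERIMAGE or -dFILTERTEXT from the first argument.
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite "-dFILTER$1" \
        -sOutputFile="$3" "$2"
}
```

For example, filter_pdf VECTOR input.pdf output.pdf writes a copy of input.pdf with all vector drawing removed.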
Comment 2 Ken Sharp 2018-09-16 07:35:54 UTC
(In reply to MarjaE from comment #0)
> I often use Ghostscript (in Automator) to try to pre-process vector pdfs (so
> they're faster on my Mac, and compatible with my Kindle).
> 
> Even just running -sDEVICE=pdfwrite without the rest of the pre-processing
> often creates gray or black rectangles, which sometimes obscure text.
> 
> For a mild example, it adds a gray box to the title of the rules here:
> 
> https://www.peginc.com/store/genre-supplement-wizards-warriors/

Please attach example files here. URLs often go stale before anyone has time to look into the problem.
Comment 3 Ken Sharp 2018-09-16 14:39:36 UTC
The file in the URL is a Zip archive, which contains 3 more zip archives. After extracting all of these I end up with 7 PDF files; it's not clear to me which of these are the 'rules' you refer to.

One of them isn't even a PDF file: span.pdf contains HTML.
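
A quick way to spot such files before processing is to check for the PDF header. This is a minimal sketch: the helper name is_pdf is hypothetical, and it assumes the "%PDF-" marker sits at the very start of the file, which is stricter than some viewers are.

```shell
# Hypothetical helper: succeed if the file starts with the PDF header "%PDF-".
is_pdf() {
    # Read only the first five bytes and compare them with the header marker.
    head -c 5 "$1" 2>/dev/null | grep -q '^%PDF-'
}

# Report each file passed as an argument.
for f in "$@"
do
    if is_pdf "$f"
    then
        echo "PDF:     $f"
    else
        echo "not PDF: $f"
    fi
done
```

Saved as a script, it can be run over all the extracted files at once by passing them as arguments.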

Pretty much all of these 'vector' PDF files contain images, several of them contain JPEG2000 images.

In the absence of a full command line, I've run all of them through Ghostscript with a simple command line:

gs -sDEVICE=pdfwrite -o out.pdf [input file]

I do not see any problems with any of the (numerous) pages in any of the PDF files. Of course, in a large number of pages spread across several files, it's easy to miss problems.

You need to supply a file and a command line which will reproduce the problem. You also need to say how you are opening the PDF file. Have you considered the possibility that the problem is with the PDF viewer you are using?
Comment 4 MarjaE 2018-09-16 18:07:03 UTC
Created attachment 15634 [details]
One affected file
Comment 5 MarjaE 2018-09-16 18:28:46 UTC
I run from automator using the following:

Run Shell Script

Shell: /bin/bash

Pass input: as arguments

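for f in "$@"
do
	suffix="-converted.pdf"
	# Strip the directory and the .pdf extension from the input path.
	base=$(basename "$f" .pdf)
	outputfile="$base$suffix"
	# -sstdout=%stderr sends PostScript stdout to stderr.
	/usr/local/bin/gs -sDEVICE=pdfwrite -sstdout=%stderr -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$outputfile" "$f"
done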

***

I get the same results running from terminal using the following:

gs -dNOPAUSE -dQUIET -dBATCH -sDEVICE=pdfwrite -o test.pdf [input file address]

***

I see the same gray boxes in the Finder preview window, in Clearview, and in Preview. In the sample file, I get a gray box around the title Wizards & Warriors on pages 1 and 8.
Comment 6 Ken Sharp 2018-09-16 18:47:49 UTC
(In reply to MarjaE from comment #5)

> I see the same gray boxes in the Finder preview window, in Clearview, and in
> Preview. In the sample file, I get a gray box around the title Wizards &
> Warriors on pages 1 and 8.

I assume you mean pages 1 and 6, since page 8 doesn't have this on it (whereas page 6 does).

I don't see any problem with Adobe Acrobat Pro, Ghostscript, MuPDF, Google Chrome or another PDF rendering engine I have to hand.

I have no idea what Clearview is, but the others are part of MacOS and use the Quartz rendering engine. If there's a bug in that, then they will (clearly) show the same result. Quite likely any MacOS application will, because they will use the system libraries to render the PDF and, if those libraries have a problem, then all the applications using those libraries will display the file incorrectly.

Try opening the file in Adobe Acrobat Reader; my suspicion is that this is a transparency bug in the rendering engine.
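
One way to separate a viewer problem from a pdfwrite problem without another PDF viewer is to let Ghostscript itself render the same page of the original and the converted file to an image, then compare the two images in any image viewer. A minimal sketch, assuming Ghostscript is installed as gs and was built with the png16m device. The helper name render_page, the GS override variable and the 72 dpi resolution are illustrative choices.

```shell
# Assumption: Ghostscript is installed as "gs"; override with GS=/path/to/gs.
GS="${GS:-gs}"

# Hypothetical helper: render a single page of a PDF to a 24-bit PNG.
# usage: render_page input.pdf page-number output.png
render_page() {
    "$GS" -q -dNOPAUSE -dBATCH -sDEVICE=png16m -r72 \
        -dFirstPage="$2" -dLastPage="$2" \
        -sOutputFile="$3" "$1"
}
```

For example, render_page original.pdf 1 original-p1.png followed by render_page converted.pdf 1 converted-p1.png gives two images of page 1. If neither image shows the gray box, the box is being added by the viewer rather than by pdfwrite.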
Comment 7 MarjaE 2018-09-16 20:17:08 UTC
I don't have Adobe Acrobat Reader. I can't download from their website, since I'm photosensitive, and their site is prone to reload and flash every 1-2 seconds.
Comment 8 MarjaE 2018-09-16 20:24:40 UTC
Created attachment 15636 [details]
Same affected file after running through pdfwrite
Comment 9 MarjaE 2018-09-16 20:36:33 UTC
The Kindle DX cannot show the images in the original file, and shows 3 gray boxes on each of pages 1 and 6. It can show the images in the processed version, but still shows 3 gray boxes.
Comment 10 Ken Sharp 2018-09-16 21:05:50 UTC
(In reply to MarjaE from comment #8)
> Created attachment 15636 [details]
> Same affected file after running through pdfwrite

This renders perfectly in every PDF consumer I have. The only Mac I have access to is running 10.8 (Mountain Lion). On this, Preview shows a (very faint) rectangle behind the title, and an equally faint rectangle behind the image of the characters. I don't see this on any other PDF consumer.

I suspect you have found a bug in Quartz and should report it to Apple.
Comment 11 MarjaE 2018-10-17 04:10:33 UTC
I get the same gray box on my Kindle, which doesn't use Quartz.