Bug 701181

Summary: Incorrect output after ghostscript processing of a PDF file
Product: Ghostscript Reporter: bruno.n.pagani
Component: PDF Writer Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED FIXED    
Severity: normal CC: bruno.n.pagani
Priority: P4    
Version: 9.27   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: Initial PDF file.
Processing of the initial file by ghostscript
LaTeX file used to embed
PDF file produced by lualatex
Ghostscript output after processing the lualatex PDF.
much simplified file
File produced by xelatex
File produced by lualatex from xelatex one

Description bruno.n.pagani 2019-06-10 14:23:46 UTC
Created attachment 17651 [details]
Initial PDF file.

Hi,

I asked about an issue with ghostscript processing of a PDF file on stackoverflow (https://stackoverflow.com/q/56502331/3845564) and it was suggested to me that I should report a bug here. I have done some additional tests in case those could be useful to pinpoint the issue.

The test case to reproduce is the following:

1. I start with https://upload.wikimedia.org/wikipedia/commons/3/31/Nucleosynthesis_periodic_table.svg and get a PDF out of it with `rsvg-convert -f pdf Nucleosynthesis_periodic_table.svg > test.pdf`. The resulting `test.pdf` is attached here.

2. Just to check, I process this file with ghostscript using `gs -o test_output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress test.pdf`. No issue at this point, as you can see with the attached `test_output.pdf`.

3. I then embed `test.pdf` inside a LaTeX document (test_latex.tex), which I compile with either `lualatex test_latex.tex` or `pdflatex test_latex.tex` (note that using `xelatex` instead shows no issue; if this makes you think it is a LaTeX engine issue, please let me know). For instance, the produced `test_lualatex.pdf` file is attached.

4. I then process this new file with ghostscript: `gs -o test_output_lualatex.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress test_lualatex.pdf`. If you now compare the rendering of this latest file with the other ones, you can see that the output is quite wrong at this stage.
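
For anyone re-running the comparison, the two ghostscript invocations in steps 2 and 4 differ only in their input and output names. A minimal sketch in Python that builds that command line is given here; the helper names `gs_command` and `run_gs` are hypothetical, and `run_gs` assumes `gs` is on the PATH and the input file already exists.

```python
import shutil
import subprocess

# Flags copied from the commands in steps 2 and 4 of this report.
GS_FLAGS = ["-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress"]


def gs_command(src, dst):
    """Build the argument list for one pdfwrite pass (hypothetical helper)."""
    return ["gs", "-o", dst, *GS_FLAGS, src]


def run_gs(src, dst):
    """Run one pdfwrite pass; fail loudly if gs is missing or exits non-zero."""
    if shutil.which("gs") is None:
        raise FileNotFoundError("gs not found on PATH")
    subprocess.run(gs_command(src, dst), check=True)
```

Calling `run_gs("test.pdf", "test_output.pdf")` corresponds to step 2, and `run_gs("test_lualatex.pdf", "test_output_lualatex.pdf")` to step 4.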
Comment 1 bruno.n.pagani 2019-06-10 14:24:21 UTC
Created attachment 17652 [details]
Processing of the initial file by ghostscript
Comment 2 bruno.n.pagani 2019-06-10 14:24:44 UTC
Created attachment 17653 [details]
LaTeX file used to embed
Comment 3 bruno.n.pagani 2019-06-10 14:25:39 UTC
Created attachment 17654 [details]
PDF file produced by lualatex
Comment 4 bruno.n.pagani 2019-06-10 14:26:58 UTC
Created attachment 17655 [details]
Ghostscript output after processing the lualatex PDF.
Comment 5 Ken Sharp 2019-06-11 09:04:11 UTC
Created attachment 17658 [details]
much simplified file

The bug is due to the use of patterns, massively complicated by the construction of the PDF file.

Reducing the file to something sensible we have a page, whose Content stream does nothing but run a Form XObject.

That Form XObject has a transparency Group (there was a lot of transparency in the file, none of which made any difference to the output), it then draws a rectangle, and eventually fills it with a Pattern. The Pattern content stream does nothing except run a Form XObject.

That XObject starts by filling an area with *another* Pattern. Again that Pattern content stream does nothing but run yet another Form, which actually draws some content (finally!).
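
The nesting described so far can be pictured as a simple chain. The following is only an illustrative model of that description in Python, not a parse of the attached file; the `Node` class and the notes on each level are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    kind: str                      # "Page", "Form" or "Pattern"
    note: str = ""                 # what this level does, per the description
    child: Optional["Node"] = None


# Innermost object first, so each level can point at the one it runs.
content_form = Node("Form", "actually draws some content")
inner_pattern = Node("Pattern", "runs a Form", content_form)
middle_form = Node("Form", "fills an area with a Pattern", inner_pattern)
outer_pattern = Node("Pattern", "runs a Form", middle_form)
group_form = Node("Form", "transparency Group; fills a rectangle", outer_pattern)
page = Node("Page", "runs a Form", group_form)


def chain(node):
    """Return the kinds of objects visited from the page down to the content."""
    kinds = []
    while node is not None:
        kinds.append(node.kind)
        node = node.child
    return kinds
```

Here `chain(page)` lists six levels, five of which are indirection before any marking operator is reached.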

Presumably the saving in output size is the removal of all this pointless subdividing of content after interpretation.


The problem with Patterns is that they take their CTM from the 'enclosing context', not the 'current' context, that is the CTM at the start of the page/form, not the CTM at the time the pattern is drawn. If we remove the pointless Group from the first form, the problem goes away, because the pdfwrite output no longer needs to emit a Group in its own output (which it does with a Form XObject).

So fairly clearly this is a problem calculating the Matrix of the Pattern, as influenced by whether or not it is inside a Form XObject. When it isn't we get it right, when it is we get it wrong.
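
To make the 'enclosing context' rule concrete, here is a small sketch in Python using PDF-style six-element matrices. It is not Ghostscript's code; all matrix values are hypothetical and chosen only so the different results are easy to tell apart.

```python
def concat(m, n):
    """Return m x n for PDF matrices (a, b, c, d, e, f); m is applied first."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


IDENTITY = (1, 0, 0, 1, 0, 0)

# Hypothetical values for illustration only.
base_ctm = IDENTITY                   # CTM at the start of the enclosing page
current_ctm = (2, 0, 0, 2, 10, 20)    # CTM at the time the pattern is used
pattern_matrix = (1, 0, 0, 1, 5, 5)   # the Pattern's Matrix

# Patterns are placed relative to the enclosing context ...
expected = concat(pattern_matrix, base_ctm)
# ... not relative to whatever the CTM happens to be when they are drawn.
mistaken = concat(pattern_matrix, current_ctm)

# If the same content is wrapped in a Form XObject, the form becomes the
# enclosing context: its base CTM is the form Matrix concatenated with the
# CTM in effect when the form is run.
form_matrix = (1, 0, 0, 1, 100, 0)
form_base_ctm = concat(form_matrix, current_ctm)
inside_form = concat(pattern_matrix, form_base_ctm)
```

With these numbers `expected`, `mistaken` and `inside_form` are all different, which is why a producer that adds or removes a wrapping form has to recompute the Pattern Matrix accordingly.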

It's going to be a while before I get round to this one.

The attached file is a much reduced version of the original; removing the Group from object 11 makes it output correctly.
Comment 6 bruno.n.pagani 2019-06-11 09:48:46 UTC
In the mean time I’ve found a workaround for my use case. I’m writing it here for the record, and also because the different PDF structure for the new files might be interesting for you (I know almost nothing about PDF internals, so I can’t tell for sure).

I now compile the LaTeX file using XeLaTeX, and get `test_xelatex.pdf` as output. As said before, processing this file with ghostscript works OK (although it increases the size a bit, from 229496 to 242562 if using `\includegraphics{test.pdf}`, from 188126 to 242662 if including `\includegraphics{test_output.pdf}` instead).

I now use `\includegraphics{test_xelatex.pdf}` and compile with LuaLaTeX again. The newly produced `test_xelatex_lualatex.pdf` can then be correctly processed by ghostscript, which kind of solves my issue.

In any case, this means that I don’t care too much if this bug is set to lowest priority, since I now have a manageable workaround.
Comment 7 bruno.n.pagani 2019-06-11 09:49:39 UTC
Created attachment 17659 [details]
File produced by xelatex
Comment 8 bruno.n.pagani 2019-06-11 09:50:20 UTC
Created attachment 17660 [details]
File produced by lualatex from xelatex one
Comment 9 bruno.n.pagani 2019-06-25 22:50:15 UTC
Some more details: the rendering bug seems to be PDF-reader dependent too. Apparently with Preview on macOS even the XeLaTeX + gs output is broken (all others are too).
Comment 10 Ken Sharp 2019-08-06 11:51:41 UTC
Fixed in commit ff856d0c44ce7d3f4d204f4a405857a6a6672a80

The problem was basically caused by the over-complexity of the original PDF file. The file uses transparency, but in such a way that it has no actual effect, and we can't detect that in this case.

It then draws each of the element backgrounds by executing a Form XObject, which fills with solid colours, then uses a Pattern to draw white rectangles over it. However, that pattern is defined to simply execute another form, which then draws another pattern, and it's that pattern which finally draws the white lines.

The PDF spec defines the Matrix applied to Patterns in an odd way, and the combination of patterns nested inside patterns, inside a transparency Group is what led to the problem.

The original PDF is inefficiently described; the same effect could have been achieved much more simply, and the resulting PDF would have been smaller and executed more quickly.

Nevertheless, it is valid, if mad, and the commit above will cater for this.
Comment 11 bruno.n.pagani 2019-08-09 14:18:17 UTC
Thanks, I can indeed confirm that the issue is fixed for me (waiting for confirmation from a macOS user, but I expect it to be fixed there as well).

Do you think I should report an issue to the tool (rsvg-convert) that generated the original PDF regarding its unneeded complexity?

Also, I noted a small typo in the patch, in `devices/vector/gdevpdfx.h`, the comment block has an instance of “Pattterns” with 3 `t`.
Comment 12 Ken Sharp 2019-08-09 14:26:40 UTC
(In reply to bruno.n.pagani from comment #11)

> Do you think I should report an issue to the tool (rsvg-convert) that
> generated the original PDF regarding its unneeded complexity?

That's really up to you. As I say, the construction is entirely valid, just very far from optimal.

It's not unusual for graphics libraries which have been retro-fitted to write PDF files to be far from optimal in their PDF output. They often write extra forms, because their input groups objects together and a Form XObject is the closest equivalent in PDF. It's also not unusual for them to write every such Form with a transparency Group. As far as I can tell that's 'just in case'. It's easier for the producer to dump the load on the consumer than to optimise the output.

In short, I doubt the producer will change.


> Also, I noted a small typo in the patch, in `devices/vector/gdevpdfx.h`, the
> comment block has an instance of “Pattterns” with 3 `t`.

Thanks but, as it's a comment, I won't make a new commit to change it. There are a lot of typos and grammatical errors in Ghostscript's comments already. As long as it's comprehensible it'll be OK. I do try to fix any I see when making changes close by.