Summary: | Incorrect output after ghostscript processing of a PDF file | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | bruno.n.pagani |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | bruno.n.pagani |
Priority: | P4 | ||
Version: | 9.27 | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
Initial PDF file.
Processing of the initial file by ghostscript
LaTeX file used to embed
PDF file produced by lualatex
Ghostcript output after processing the lualatex PDF.
much simplified file
File produced by xelatex
File produced by lualatex from xelatex one |
Description
bruno.n.pagani
2019-06-10 14:23:46 UTC
Created attachment 17652 [details]
Processing of the initial file by ghostscript
Created attachment 17653 [details]
LaTeX file used to embed
Created attachment 17654 [details]
PDF file produced by lualatex
Created attachment 17655 [details]
Ghostcript output after processing the lualatex PDF.
Created attachment 17658 [details]
much simplified file
The bug is due to the use of patterns, massively complicated by the construction of the PDF file.
Reducing the file to something sensible, we have a page whose content stream does nothing but run a Form XObject.
That Form XObject has a transparency Group (there was a lot of transparency in the file, none of which made any difference to the output). It then draws a rectangle, and eventually fills it with a Pattern. The Pattern content stream does nothing except run a Form XObject.
That XObject starts by filling an area with *another* Pattern. Again, that Pattern content stream does nothing but run yet another Form, which actually draws some content (finally!).
Presumably the saving in output size comes from removing all this pointless subdivision of content after interpretation.
The problem with Patterns is that they take their CTM from the 'enclosing context', not the 'current' context; that is, the CTM at the start of the page/form, not the CTM at the time the pattern is drawn. If we remove the pointless Group from the first form, the problem goes away, because the pdfwrite output no longer needs to emit a Group in its own output (which it does with a Form XObject).
So fairly clearly this is a problem calculating the Matrix of the Pattern, as influenced by whether or not it is inside a Form XObject. When it isn't, we get it right; when it is, we get it wrong.
It's going to be a while before I get round to this one.
The attached file is a much reduced version of the original; removing the Group from object 11 makes it output correctly.
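The point about Patterns taking their CTM from the enclosing context can be sketched as below. This is a minimal illustration, assuming the PDF convention of six-number affine matrices applied to row vectors; the function names and the example numbers are hypothetical and are not taken from Ghostscript's source. It shows that the pattern matrix is combined with the CTM in force at the start of the enclosing content stream (page or form), so the same pattern lands somewhere else once a Form XObject sits between it and the page.

```python
from typing import Tuple

# PDF-style affine matrix [a b c d e f]:
#   x' = a*x + c*y + e,  y' = b*x + d*y + f
Matrix = Tuple[float, float, float, float, float, float]

IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def concat(m: Matrix, n: Matrix) -> Matrix:
    """Return m x n in row-vector convention: apply m first, then n."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    )


def form_base_ctm(form_matrix: Matrix, ctm_at_do: Matrix) -> Matrix:
    """CTM at the start of a form: its /Matrix combined with the CTM at 'Do'."""
    return concat(form_matrix, ctm_at_do)


def pattern_to_device(pattern_matrix: Matrix, enclosing_base_ctm: Matrix) -> Matrix:
    """A pattern uses the base CTM of its enclosing stream, not the current CTM."""
    return concat(pattern_matrix, enclosing_base_ctm)


# Hypothetical example values.
page_base = IDENTITY                              # CTM at the start of the page
translate = (1.0, 0.0, 0.0, 1.0, 100.0, 50.0)     # a 'cm' issued in the page stream
pattern_matrix = (2.0, 0.0, 0.0, 2.0, 0.0, 0.0)   # the pattern's own /Matrix

# Pattern used directly on the page: the later 'cm' does not affect it.
on_page = pattern_to_device(pattern_matrix, page_base)

# Same pattern used inside a form that was invoked after that 'cm'.
in_form = pattern_to_device(pattern_matrix, form_base_ctm(IDENTITY, translate))

print(on_page)  # (2.0, 0.0, 0.0, 2.0, 0.0, 0.0)
print(in_form)  # (2.0, 0.0, 0.0, 2.0, 100.0, 50.0)
```

A writer that wraps content in an extra Form XObject (as pdfwrite does for a Group) therefore has to account for a different base matrix when it emits the pattern.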
In the meantime I’ve found a workaround for my use case. I’m writing it here for the record, and also because the different PDF structure of the new files might be interesting for you (I know almost nothing about PDF internals, so I can’t tell for sure). I now compile the LaTeX file using XeLaTeX, and get `test_xelatex.pdf` as output. As said before, processing this file with ghostscript works OK (although it increases the size a bit, from 229496 to 242562 if using `\includegraphics{test.pdf}`, from 188126 to 242662 if including `\includegraphics{test_output.pdf}` instead). I now use `\includegraphics{test_xelatex.pdf}` and compile with LuaLaTeX again. The newly produced `test_xelatex_lualatex.pdf` can then be correctly processed by ghostscript, which kind of solves my issue. In any case, this means that I don’t care too much if this bug is set to lowest priority, since I now have a manageable workaround.
Created attachment 17659 [details]
File produced by xelatex
Created attachment 17660 [details]
File produced by lualatex from xelatex one
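For reference, the size changes quoted in the workaround comment can be worked out as follows. This is only arithmetic on the numbers given in the thread; the function name and the dictionary labels are illustrative.

```python
def percent_increase(before: int, after: int) -> float:
    """Relative growth from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Sizes before and after ghostscript processing, as quoted in the comment.
sizes = {
    r"\includegraphics{test.pdf}": (229496, 242562),
    r"\includegraphics{test_output.pdf}": (188126, 242662),
}

for label, (before, after) in sizes.items():
    growth = percent_increase(before, after)
    print(f"{label}: {before} -> {after} (+{growth:.1f}%)")
```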
Some more details: the rendering bug seems to depend on the PDF reader too. Apparently with Preview on macOS even the XeLaTeX + gs output is broken (all others are too).
Fixed in commit ff856d0c44ce7d3f4d204f4a405857a6a6672a80
The problem was basically caused by the over-complexity of the original PDF file. The file uses transparency, but in such a way that it has no actual effect, and we can't detect that in this case. It then draws each of the element backgrounds by executing a Form XObject, which fills with solid colours, then uses a Pattern to draw white rectangles over it. However, that pattern is defined to simply execute another form, which then draws another pattern, and it's that pattern which finally draws the white lines.
The PDF spec defines the Matrix applied to Patterns in an odd way, and the combination of patterns nested inside patterns, inside a transparency Group, is what led to the problem.
The original PDF is inefficiently described; the same effect could have been achieved much more simply, and the resulting PDF would have been smaller and executed more quickly. Nevertheless, it is valid, if mad, and the commit above will cater for this.
Thanks, I can indeed confirm that the issue is fixed for me (waiting for confirmation from a macOS user, but I expect it to be fixed there as well).
Do you think I should report an issue to the tool (rsvg-convert) that generated the original PDF regarding its unneeded complexity?
Also, I noted a small typo in the patch: in `devices/vector/gdevpdfx.h`, the comment block has an instance of “Pattterns” with 3 `t`.
(In reply to bruno.n.pagani from comment #11)
> Do you think I should report an issue to the tool (rsvg-convert) that
> generated the original PDF regarding its unneeded complexity?
That's really up to you; as I say, the construction is entirely valid, just very far from optimal.
It's not unusual for graphics libraries which have been retro-fitted to write PDF files to be far from optimal in their PDF output. They often write extra forms, because their input groups objects together and a Form XObject is the closest equivalent in PDF. It's also not unusual for them to write every such Form with a transparency Group. As far as I can tell that's 'just in case'. It's easier for the producer to dump the load on the consumer than to optimise the output. In short, I doubt the producer will change.
> Also, I noted a small typo in the patch, in `devices/vector/gdevpdfx.h`, the
> comment block has an instance of “Pattterns” with 3 `t`.
Thanks, but as it's a comment, I won't make a new commit to change it. There are a lot of typos and grammatical errors in Ghostscript's comments already. As long as it's comprehensible it'll be OK. I do try to fix any I see when making changes close by.