Summary: | pdfwrite produces an invalid pdf file | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | William Bader <williambader> |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | williambader |
Priority: | P4 | ||
Version: | 9.52 | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
the original PDF 4244201-1.pdf
pdftops -level3 output x3.ps
bad x3.pdf from /u/gnu/gs9.52/gs -sDEVICE=pdfwrite -o x3.pdf x3.ps
viewing x3.pdf in gv shows the first screen, then pauses, then the second
minimised test file (one glyph) |
Description
William Bader
2020-07-11 05:49:48 UTC
Created attachment 19436 [details]
pdftops -level3 output x3.ps
Created attachment 19437 [details]
bad x3.pdf from /u/gnu/gs9.52/gs -sDEVICE=pdfwrite -o x3.pdf x3.ps
Created attachment 19438 [details]
viewing x3.pdf in gv shows the first screen, then pauses, then the second
Comparing the ps produced by pdftops -level2 (which gs handles correctly) and -level3:

images
L2 /LZWDecode filter
L3 /FlateDecode filter

fonts
L2
    /pdfMakeFont {
      4 3 roll findfont
      4 2 roll matrix scale makefont
      dup length dict begin
        { 1 index /FID ne { def } { pop pop } ifelse } forall
        /Encoding exch def
        currentdict
      end
      definefont pop
    } def
L3
    /pdfMakeFont16L3 {
      1 index /CIDFont resourcestatus {
        pop pop 1 index /CIDFont findresource /CIDFontType known
      } {
        false
      } ifelse
      {
        0 eq { /Identity-H } { /Identity-V } ifelse
        exch 1 array astore composefont pop
      } {
        pdfMakeFont16
      } ifelse
    } def

The L3 ps produced for 4244201-1.pdf has an image with /FlateDecode in /DeviceCMYK.
I have another test PDF where pdftops -level3 produces an image with /FlateDecode in /DeviceRGB, and that works OK.
I suspect that fonts are not the problem because other people would have noticed by now, and a font problem would probably cause a font error instead of most of the image being overwritten with black.
My guess is that it has something to do with a /FlateDecode in /DeviceCMYK.

I built poppler-0.90.1 with cmake -DENABLE_ZLIB=0 (to eliminate the use of FlateDecode). Even though the ps from poppler pdftops -level3 became 27% larger, the pdf from gs pdfwrite remained the same size (with 74 bytes different) and still displayed incorrectly, so the problem is something other than using FlateDecode on images.

Another difference is that the pdftops -level3 output adds lines like
    false opm
where its prolog sets
    /opm { dup /pdfOPM exch def /setoverprintmode where{pop setoverprintmode}{pop}ifelse } def
but replacing it with
    /opm { pop } def
doesn't fix the problem with gs pdfwrite.

(In reply to William Bader from comment #0)
> I have a PDF that when I convert it to ps with poppler pdftops and then
> convert the resulting ps back to PDF with gs, gs produces an invalid PDF
> that displays incorrectly in gs and that crashes atril.

The PDF is not invalid. It's not correct, but it's perfectly valid.
I can't comment on atril; presumably it has a bug.

(In reply to William Bader from comment #4)
> The L3 ps produced for 4244201-1.pdf has an image with /FlateDecode in
> /DeviceCMYK.
> I have another test PDF where pdftops -level3 produces an image with
> /FlateDecode in /DeviceRGB, and that works OK.

Nothing to do with the problem.

> I suspect that fonts are not the problem because other people would have
> noticed by now, and a font problem would probably cause a font error instead
> of most of the image being overwritten with black.
> My guess is that it has something to do with a /FlateDecode in /DeviceCMYK.

No, it's very clearly the fonts.

If you set level 2 output (and if that means baseline level 2 output) then CIDFonts are not supported; these were added in, I think, version 2016 of the Adobe interpreter (2000 indicates level 2, 3000 indicates level 3). So I imagine that's why it works if you output level 2 PostScript instead of level 3: there will be no CIDFonts.

There are up to 5 Font Matrix entries possible here: the type 0 CID-Keyed instance of the font, the CIDFont which is used by the type 0 font, each of the descendant fonts of the CIDFont and then, because these are CFF CIDFonts, the CFF font and each of the descendant fonts in the CFF FDArray. These matrices may or may not be present, are substituted with a default [0.001 0 0 0.001 0 0] matrix if omitted in most places, and must all be multiplied together in order to achieve the correct size output.

Of course, in general most of these matrices are defined as the default or the identity matrix; it's not common to see these defined any other way. Where they are defined differently, the differences are generally in the FDArray entries. In large part that's what the entries in the FDArray are for.

That's PostScript of course; in PDF there is no type 0 font, and the CIDFont may not have a FontMatrix.
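As an illustration of how a chain of FontMatrix entries combines into one effective matrix, here is a minimal Python sketch. It assumes the PostScript convention of six-number matrices [a b c d tx ty] and multiplies the chain in the order given. The chain values, the `OBLIQUE` shear and the helper names (`concat`, `effective_matrix`, `transform`) are hypothetical and chosen only for the example; this is not the Ghostscript or Poppler code.

```python
# PostScript-style matrices are six numbers [a b c d tx ty], mapping
# (x, y) to (a*x + c*y + tx, b*x + d*y + ty).
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
DEFAULT_FONT_MATRIX = (0.001, 0.0, 0.0, 0.001, 0.0, 0.0)
# Hypothetical shear, standing in for an artificially oblique font.
OBLIQUE = (1.0, 0.0, 0.2, 1.0, 0.0, 0.0)


def concat(m1, m2):
    """Return the product m1 x m2 (apply m1 first, then m2)."""
    a1, b1, c1, d1, tx1, ty1 = m1
    a2, b2, c2, d2, tx2, ty2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        tx1 * a2 + ty1 * c2 + tx2,
        tx1 * b2 + ty1 * d2 + ty2,
    )


def effective_matrix(chain):
    """Multiply every matrix in the chain, left to right."""
    result = IDENTITY
    for m in chain:
        result = concat(result, m)
    return result


def transform(m, x, y):
    """Apply matrix m to the point (x, y)."""
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)


# Hypothetical chain: most entries are the identity or the default,
# and one entry carries a non-standard matrix.
chain = [IDENTITY, DEFAULT_FONT_MATRIX, IDENTITY, OBLIQUE]
effective = effective_matrix(chain)
point = transform(effective, 1000, 1000)
print(effective)
print(point)
```

With these assumed values a glyph-space point at (1000, 1000) ends up scaled to roughly one unit and sheared to the right, so getting any single entry in the chain wrong changes the final glyph size or shape.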
For reasons best known to itself, the Poppler PostScript output moves the FontMatrix from the CFF font to the CIDFont, and replaces the FontMatrix of the CFF font with the identity matrix. So this is where the code is unusual: we'd normally expect to see the modified array in the FDArray, not the CIDFont.

When writing a PDF file the pdfwrite device cannot write a CIDFont that way, so it's forced to move the FontMatrix back where it originally was. Unfortunately there was one case where we did not write out the FontMatrix, and should have. This resulted in a missing default matrix; it appears Acrobat replaces that with the standard default matrix (which is why the output from pdfwrite displays correctly in Acrobat). It's not at all clear that this is correct, and there are comments in our code noting that the documentation is itself unclear on this point with PDF files.

Commit 3786f7cb0c4ccf3442beafdf186dbc6835da8ae3 fixes this without altering any of the existing test files which use non-standard FontMatrix entries to achieve effects such as artificially oblique fonts.

As I've noted before, PostScript and PDF are not the same and I strongly advise *NOT* converting PDF files to PostScript and back to PDF. If you have a PDF as input, and want a PDF as output, there's usually no reason to create PostScript in the middle.

Created attachment 19456 [details]
minimised test file (one glyph)
Minimised test file as added to test repository
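To make the two points in the comment above concrete, here is a small Python sketch under stated assumptions. It uses a hypothetical two-level chain (CIDFont, CFF font) and a hypothetical `product` helper with a `fallback` parameter for omitted entries. It shows that moving a matrix from one level to the other leaves the overall product unchanged, and that an omitted entry gives different results depending on what the reader substitutes for it. It does not model what Ghostscript, Acrobat or any other reader actually does.

```python
# PostScript-style matrices are six numbers [a b c d tx ty].
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
DEFAULT_FONT_MATRIX = (0.001, 0.0, 0.0, 0.001, 0.0, 0.0)


def concat(m1, m2):
    """Return the product m1 x m2 (apply m1 first, then m2)."""
    a1, b1, c1, d1, tx1, ty1 = m1
    a2, b2, c2, d2, tx2, ty2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        tx1 * a2 + ty1 * c2 + tx2,
        tx1 * b2 + ty1 * d2 + ty2,
    )


def product(chain, fallback):
    """Multiply the chain left to right; None (omitted) entries use `fallback`."""
    result = IDENTITY
    for m in chain:
        result = concat(result, fallback if m is None else m)
    return result


# Hypothetical chain of (CIDFont matrix, CFF font matrix).
on_cff_font = [IDENTITY, DEFAULT_FONT_MATRIX]  # scaling kept on the CFF font
on_cid_font = [DEFAULT_FONT_MATRIX, IDENTITY]  # scaling moved to the CIDFont
same = product(on_cff_font, IDENTITY) == product(on_cid_font, IDENTITY)
print("relocation preserves the product:", same)

# An omitted entry depends entirely on what the reader substitutes.
omitted = [IDENTITY, None]
as_default = product(omitted, DEFAULT_FONT_MATRIX)
as_identity = product(omitted, IDENTITY)
ratio = as_identity[0] / as_default[0]
print("scale ratio between the two interpretations:", ratio)
```

Under these assumptions the two interpretations of the omitted entry differ in scale by a factor of about 1000, which is the kind of disagreement that disappears once the matrix is always written out explicitly.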
Thanks!
I applied the patch to gdevpsf2.c and can confirm that it fixes the problem for me.
At least for now, I have a PS-based workflow. It was only a coincidence in this example that a PDF input came from an external source, and I tried converting the output to PDF for email.
>For reasons best known to itself, the Poppler PostScript output moves the FontMatrix from the CFF font to the CIDFont, and replaces the FontMatrix of the CFF font with the identity matrix.
Is that worth changing in poppler?
grepping for FontMatrix in poppler, it has a number of places where it writes "/FontMatrix [1 0 0 1 0 0] def\n".
(In reply to William Bader from comment #8)
> Is that worth changing in poppler?

It's not incorrect. It's unusual but it's perfectly valid. Given how complicated the inheritance of Font Matrices is with CIDFonts and CFF CIDFonts in PostScript I would be very wary of attempting to change it.

> grepping for FontMatrix in poppler, it has a number of places where it
> writes "/FontMatrix [1 0 0 1 0 0] def\n".

Which is also valid. As long as the matrix algebra all works out there's nothing intrinsically wrong with writing the identity matrix at any point.

Changing any of those would mean following the whole code path and, as I tried to say in my comment, this is a very complicated area. Not helped by having PostScript and PDF differ :-(

I'd be very wary of trying to change anything, especially since there's nothing actually wrong with the output as it stands.

Thanks for the reply. I won't touch poppler. |