Please find attached a PDF that crashes the 8.54 release of GS. The file is multipage, and the crash occurs on page 2. I have not confirmed whether it also happens on the HEAD version. The command line was:

    Gswin32c -r600 -sDEVICE=tiffpack -dBATCH -dNOPAUSE -sOutputFile=PLANHALF.tif PLANHALF.pdf

Could you please advise? Perhaps it has something to do with the embedded pictures?
Created attachment 2460 [details] PLANHALF.pdf
I've confirmed this bug occurs with 8.54 and HEAD under Linux:

    Processing pages 1 through 14.
    Page 1
    >>showpage, press <return> to continue<<
    Page 2
    Segmentation fault

When I tried it with the version of Ghostscript that comes with Ubuntu 6.06, it produced an error but did not crash:

    ESP Ghostscript 815.02 (2006-04-19)
    Copyright (C) 2004 artofcode LLC, Benicia, CA. All rights reserved.
    This software comes with NO WARRANTY: see the file PUBLIC for details.
    Processing pages 1 through 14.
    Page 1
    >>showpage, press <return> to continue<<
    Page 2
    **** ERROR: Unable to process JPXDecode data. Page will be missing data.
    **** ERROR: Unable to process JPXDecode data. Page will be missing data.
    Loading NimbusRomNo9L-Regu font from /var/lib/defoma/gs.d/dirs/fonts/n021003l.pfb... 3219736 1643425 1740280 445517 3 done.
    >>showpage, press <return> to continue<<
It seems there are two issues here. First, the JPX streams are corrupt (but apparently parseable by the Adobe and XPDF decoders), since they fail with both the Jasper and Luratech decoders. Second, we do not handle this gracefully, resulting in a segfault. Handling the error should be straightforward, but figuring out how to parse the invalid image data will take longer.
Further analysis: The Luratech decoder reports "missing component mapping" and returns an error. Ghostscript reports "insufficient data for an image" and then dies with '/typecheck in --run--'. We should continue rendering without the image. Jasper actually segfaults in the decode, so fixing the segfault means handling the stream. The JPX stream also specifies an 8 bpc indexed CIELAB colorspace. There's an overriding colorspace given in the stream dictionary as well; I've not checked whether the two match. We've not implemented proper support for overriding the image's colorspace interpretation.
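To illustrate the "continue rendering without the image" behaviour, here is a minimal Python sketch. It is not Ghostscript code: `ImageDecodeError`, `render_images` and `toy_decode` are hypothetical names, and the stand-in decoder only imitates a failure like the one Luratech reports.

```python
from typing import Callable, List, Tuple


class ImageDecodeError(Exception):
    """Raised by the (hypothetical) decoder when a stream cannot be decoded."""


def render_images(
    streams: List[Tuple[str, bytes]],
    decode: Callable[[bytes], bytes],
) -> Tuple[List[Tuple[str, bytes]], List[str]]:
    """Decode each image; on failure, record a warning and keep going."""
    rendered: List[Tuple[str, bytes]] = []
    warnings: List[str] = []
    for name, data in streams:
        try:
            rendered.append((name, decode(data)))
        except ImageDecodeError as exc:
            # Skip only the broken image; the rest of the page still renders.
            warnings.append(f"{name}: {exc}. Page will be missing data.")
    return rendered, warnings


def toy_decode(data: bytes) -> bytes:
    """Stand-in decoder: rejects streams that lack a 'cmap' marker."""
    if b"cmap" not in data:
        raise ImageDecodeError("missing component mapping")
    return data.upper()


page = [("Im1", b"cmap ok"), ("Im2", b"broken")]
rendered, warnings = render_images(page, toy_decode)
```

The point of the design is that a decode failure is confined to one image and surfaces as a warning, rather than propagating as a typecheck error or a crash.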
Some further analysis: The file is a bit odd. It contains an 8-bit, single-component image and a palette to map this to a 3-channel sRGB output image. However, the channel map only uses the red channel of the palette, and copies the single component directly into the green and blue channels. Neither the Jasper nor the Luratech decoder implements this properly. XPDF works because it ignores the palette and cmap entirely, passing the decoded component data directly to the PDF interpreter.
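A small Python sketch of how such a channel map could be applied, for illustration only. It assumes the JP2 convention that each cmap entry names a source component, a mapping type (0 = use the sample directly, 1 = look it up in the palette) and a palette column. The function name `apply_cmap` and the sample values are made up, not taken from the attached file.

```python
from typing import List, Sequence, Tuple

# One cmap entry per output channel:
# (component index, mapping type, palette column).
CmapEntry = Tuple[int, int, int]


def apply_cmap(
    components: Sequence[Sequence[int]],
    palette: Sequence[Sequence[int]],
    cmap: Sequence[CmapEntry],
) -> List[List[int]]:
    """Build the output channels described by a component mapping."""
    channels: List[List[int]] = []
    for comp_index, mapping_type, palette_column in cmap:
        source = components[comp_index]
        if mapping_type == 0:
            # Direct use: copy the decoded samples unchanged.
            channels.append(list(source))
        elif mapping_type == 1:
            # Palette mapping: each sample is an index into the palette.
            channels.append([palette[s][palette_column] for s in source])
        else:
            raise ValueError(f"unknown mapping type {mapping_type}")
    return channels


# One 8-bit component holding palette indices (illustrative values).
samples = [[0, 1, 2, 1]]
palette = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
# Red comes from palette column 0; green and blue are the raw component.
cmap = [(0, 1, 0), (0, 0, 0), (0, 0, 0)]
red, green, blue = apply_cmap(samples, palette, cmap)
```

With this mapping only the red channel carries palette values; green and blue end up holding the raw indices, which matches the odd layout described above.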
I believe Jasper is now doing what the cmap box in the JPX stream asks. The relevant patches are:

r7044 http://ghostscript.com/pipermail/gs-cvs/2006-September/006805.html
r7060 http://ghostscript.com/pipermail/gs-cvs/2006-September/006821.html
r7072 http://ghostscript.com/pipermail/gs-cvs/2006-September/006833.html

However, the image does not look correct because the /ColorSpace key in the image dictionary is supposed to override this. In the case of the image on page 6, the image was encoded with this assumption, so decoding the raw stream on its own will not even yield correct results. Rather, this crazy aspect of the PDF spec requires that we extract the decoded data before the embedded palette has been applied and let the PDF interpreter apply its own palette separately. I've not verified this, but I expect doing so to fix the problem. The PDF interpreter must be modified to pass the required out-of-stream information to the filter. The suggestion is to create a custom FilterParams key to signal the relevant information.
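A rough Python sketch of the proposed signalling, under stated assumptions. The key name `ColorSpaceOverride` and the functions `build_filter_params` and `jpx_output` are hypothetical; the actual key chosen for the patch is not given here.

```python
from typing import Dict, List, Optional, Sequence


def build_filter_params(image_dict: Dict[str, object]) -> Dict[str, bool]:
    """Interpreter side: tell the filter whether the PDF overrides the colorspace."""
    params: Dict[str, bool] = {}
    if "ColorSpace" in image_dict:
        # Hypothetical custom key; only its presence matters in this sketch.
        params["ColorSpaceOverride"] = True
    return params


def jpx_output(
    indices: Sequence[int],
    embedded_palette: Optional[Sequence[Sequence[int]]],
    params: Dict[str, bool],
) -> List[List[int]]:
    """Filter side: return per-pixel samples for the interpreter."""
    if params.get("ColorSpaceOverride") or embedded_palette is None:
        # Hand back the data before the embedded palette is applied;
        # the PDF interpreter applies its own colorspace afterwards.
        return [[i] for i in indices]
    # No override: the stream's own palette defines the colors.
    return [list(embedded_palette[i]) for i in indices]


embedded = [(255, 0, 0), (0, 255, 0)]
with_override = jpx_output(
    [0, 1], embedded,
    build_filter_params({"Filter": "JPXDecode", "ColorSpace": "DeviceGray"}),
)
without_override = jpx_output(
    [0, 1], embedded, build_filter_params({"Filter": "JPXDecode"})
)
```

In the override case the filter returns single-component index data, leaving all color interpretation to the interpreter.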
Created attachment 2643 [details] minimal patch fixing the issue
A minimal patch against HEAD that corrects the problem. This was committed as r7456, r7458 and r7459.
Created attachment 2644 [details] patch against 8.54
Patch against the 8.54 release (there's a minor conflict with the int.mak changes).
I did a regression run on 8.54 with and without this patch and got a slight rendering offset on D-12-2025-9478-9.pdf with pdfwrite, ppmraw, 300 dpi, noband. I have no idea why this difference would occur, but it's not an obvious regression. Closing the bug.
Created attachment 2657 [details] additional patch to unbreak other files
The previous patch was overzealous in creating a parmdict, causing problems on some other PDF files. This additional patch only does this for /Filter /JPXDecode streams.
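The restriction can be sketched in Python as follows. This is an illustration, not the patch itself: `uses_jpx` and `make_parmdict` are hypothetical names, and handling /Filter as either a single name or an array of names is an assumption based on the PDF spec rather than on the patch.

```python
from typing import Dict, Optional


def uses_jpx(stream_dict: Dict[str, object]) -> bool:
    """True if the stream's /Filter entry names JPXDecode."""
    filt = stream_dict.get("Filter")
    if filt is None:
        return False
    # /Filter may be a single name or an array of names.
    names = [filt] if isinstance(filt, str) else list(filt)
    return "JPXDecode" in names


def make_parmdict(stream_dict: Dict[str, object]) -> Optional[Dict[str, object]]:
    """Create a parameter dict only for JPXDecode streams."""
    if not uses_jpx(stream_dict):
        # Leave every other stream exactly as it was handled before.
        return None
    return {}


jpx_parms = make_parmdict({"Filter": "JPXDecode"})
dct_parms = make_parmdict({"Filter": ["ASCII85Decode", "DCTDecode"]})
no_filter_parms = make_parmdict({})
```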