Bug 688869

Summary: Ghostscript crashes reading PDF file - likely JPX issue
Product: Ghostscript
Reporter: Marcos H. Woehrmann <marcos.woehrmann>
Component: Graphics Library
Assignee: Ralph Giles <ralph.giles>
Status: NOTIFIED FIXED
Severity: major
Priority: P1
Version: 8.54
Hardware: PC
OS: Windows XP
Customer: 531
Word Size: ---
Attachments:
  PLANHALF.pdf
  minimal patch fixing the issue
  patch against 8.54
  additional patch to unbreak other files

Description Marcos H. Woehrmann 2006-09-05 11:15:51 UTC
Please find attached a PDF that crashes the 8.54 release of GS. The file
is multipage, and the crash occurs on page 2. I have not confirmed whether
it happens on the HEAD version.

Gswin32c -r600 -sDEVICE=tiffpack -dBATCH -dNOPAUSE
-sOutputFile=PLANHALF.tif PLANHALF.pdf

Could you please advise? Perhaps something to do with the embedded
pictures?
Comment 1 Marcos H. Woehrmann 2006-09-05 11:16:34 UTC
Created attachment 2460
PLANHALF.pdf
Comment 2 Marcos H. Woehrmann 2006-09-05 11:27:39 UTC
I've confirmed this bug occurs with 8.54 and HEAD under Linux:

Processing pages 1 through 14.
Page 1
>>showpage, press <return> to continue<<

Page 2
Segmentation fault

When I tried it with the version of Ghostscript that comes with Ubuntu 6.06, it
produced an error but did not crash:

ESP Ghostscript 815.02 (2006-04-19)
Copyright (C) 2004 artofcode LLC, Benicia, CA.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 14.
Page 1
>>showpage, press <return> to continue<<

Page 2
   **** ERROR: Unable to process JPXDecode data. Page will be missing data.
   **** ERROR: Unable to process JPXDecode data. Page will be missing data.
Loading NimbusRomNo9L-Regu font from /var/lib/defoma/gs.d/dirs/fonts/n021003l.pfb... 3219736 1643425 1740280 445517 3 done.
>>showpage, press <return> to continue<<

Comment 3 Ralph Giles 2006-09-12 15:32:37 UTC
It seems there are two issues here. First, the JPX streams are corrupt (though
apparently parseable by the Adobe and XPDF decoders), since they fail with both
the Jasper and Luratech decoders. Second, we do not handle this gracefully,
resulting in a segfault.

Handling the error should be straightforward, but figuring out how to parse the
invalid image data will take longer.
Comment 4 Ralph Giles 2006-09-12 17:19:26 UTC
Further analysis:

The Luratech decoder reports "missing component mapping" and returns an error.
Ghostscript reports "insufficient data for an image" and then dies with
'/typecheck in --run--'. We should continue rendering without the image.

Jasper actually segfaults in the decode, so fixing the segfault there means
actually handling the stream.

The JPX stream also specifies an 8 bpc indexed CIELAB colorspace. There's an
overriding colorspace given in the stream dictionary as well; I've not checked
to see if these match. We've not implemented proper support for overriding the
image's colorspace interpretation.
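
For illustration, the "continue rendering without the image" behaviour suggested
above could look roughly like the following. This is only a Python sketch, not
Ghostscript code: JPXDecodeError, render_page and the decode callback are
hypothetical names, and the warning text is modelled on the ESP Ghostscript
message quoted in comment 2.

```python
from typing import Callable, List, Optional


class JPXDecodeError(Exception):
    """Hypothetical error raised when a JPX stream cannot be decoded."""


def render_page(images: List[bytes],
                decode: Callable[[bytes], bytes],
                warnings: List[str]) -> List[Optional[bytes]]:
    """Decode every image on a page, skipping the ones that fail.

    A failed image is recorded as None and a warning is logged, so the
    rest of the page still renders instead of aborting the whole job.
    """
    rendered: List[Optional[bytes]] = []
    for data in images:
        try:
            rendered.append(decode(data))
        except JPXDecodeError as exc:
            warnings.append(
                f"Unable to process JPXDecode data ({exc}). "
                "Page will be missing data."
            )
            rendered.append(None)
    return rendered
```

The point of the sketch is only that a decoder error is turned into a per-image
warning rather than propagating up and killing the page.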
Comment 5 Ralph Giles 2006-09-21 17:31:06 UTC
Some further analysis:

The file is a bit odd. It contains an 8-bit, single-component image and a
palette to map this to a 3-channel sRGB output image. However, the channel map
uses only the red channel of the palette and copies the single component
directly into the green and blue channels. Neither the Jasper nor the Luratech
decoder implements this properly.

XPDF works because it ignores the palette and cmap entirely, passing the decoded
component data directly to the PDF interpreter.
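
To make the channel mapping described above concrete, here is a small Python
sketch of applying a cmap-style table to decoded components. It is an
illustration, not decoder code: apply_cmap and the tuple layout are
hypothetical, and the meaning of the mapping type (0 = use the component
directly, 1 = look it up in a palette column) is my reading of the JP2 component
mapping box, so treat it as an assumption.

```python
from typing import List, Sequence, Tuple

# One entry per output channel:
# (component index, mapping type, palette column).
# Assumed mapping types: 0 = direct use, 1 = palette lookup.
CmapEntry = Tuple[int, int, int]


def apply_cmap(components: Sequence[Sequence[int]],
               palette: Sequence[Sequence[int]],
               cmap: Sequence[CmapEntry]) -> List[List[int]]:
    """Build the output channels from decoded components and a palette."""
    channels: List[List[int]] = []
    for comp_idx, mapping_type, palette_col in cmap:
        source = components[comp_idx]
        if mapping_type == 0:
            # Direct use: copy the component samples unchanged.
            channels.append(list(source))
        elif mapping_type == 1:
            # Palette mapping: each sample indexes a palette row.
            channels.append([palette[v][palette_col] for v in source])
        else:
            raise ValueError(f"unknown mapping type {mapping_type}")
    return channels


# The odd layout in this file: red comes from the palette, while green
# and blue are straight copies of the single component.
samples = [[0, 1, 2]]
palette = [(10, 20, 30), (40, 50, 60), (70, 80, 90)]
cmap = [(0, 1, 0), (0, 0, 0), (0, 0, 0)]
red, green, blue = apply_cmap(samples, palette, cmap)
```

With this input, red holds palette values while green and blue hold the raw
indices, which is why a decoder that assumes "all channels via the palette"
or "no palette at all" gets the image wrong.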
Comment 6 Ralph Giles 2006-09-27 14:54:15 UTC
I believe Jasper is now doing what the cmap box in the JPX stream asks. The
relevant patches are:

r7044 http://ghostscript.com/pipermail/gs-cvs/2006-September/006805.html
r7060 http://ghostscript.com/pipermail/gs-cvs/2006-September/006821.html
r7072 http://ghostscript.com/pipermail/gs-cvs/2006-September/006833.html

However, the image does not look correct because the /ColorSpace key in the
image dictionary is supposed to override this. In the case of the image on
page 6, the image was encoded with this assumption, in such a way that decoding
the raw stream on its own will not even yield correct results.

Rather, this crazy aspect of the PDF spec requires that we extract the decoded
data before the embedded palette has been applied and let the PDF interpreter
apply its own palette separately. I've not verified this, but expect doing so to
fix the problem. The PDF interpreter must be modified to pass the required
out-of-stream information to the filter. The suggestion is to create a custom
FilterParams key to signal the relevant information.
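
A rough Python sketch of the idea, to show why the raw indices have to survive
the filter. Nothing here is the actual interface: decode_jpx, paint_image and
the "SkipEmbeddedPalette" key are hypothetical stand-ins for whatever custom
FilterParams key ends up being used.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]


def decode_jpx(indices: Sequence[int],
               embedded_palette: Sequence[RGB],
               params: Dict[str, bool]) -> list:
    """Toy stand-in for the JPX filter.

    With the hypothetical SkipEmbeddedPalette flag set, the filter hands
    back the raw palette indices; otherwise it applies the palette that
    is embedded in the stream.
    """
    if params.get("SkipEmbeddedPalette"):
        return list(indices)
    return [embedded_palette[i] for i in indices]


def paint_image(indices: Sequence[int],
                embedded_palette: Sequence[RGB],
                pdf_palette: Sequence[RGB]) -> List[RGB]:
    """Interpreter side: ask for raw indices, then apply the palette
    taken from the image dictionary's overriding colorspace."""
    raw = decode_jpx(indices, embedded_palette,
                     {"SkipEmbeddedPalette": True})
    return [pdf_palette[i] for i in raw]
```

Once the embedded palette has been applied, the indices are gone and the
interpreter can no longer apply its own palette, which is why the signal has to
reach the filter before decoding.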
Comment 7 Ralph Giles 2006-12-06 16:21:48 UTC
Created attachment 2643
minimal patch fixing the issue

A minimal patch against HEAD that corrects the problem. This was committed as
r7456, r7458 and r7459.
Comment 8 Ralph Giles 2006-12-06 16:42:35 UTC
Created attachment 2644
patch against 8.54

Patch against the 8.54 release (there's a minor conflict with the int.mak
changes).
Comment 9 Ralph Giles 2006-12-07 17:09:38 UTC
I did a regression run on 8.54 with and without this patch and got a slight
rendering offset on D-12-2025-9478-9.pdf with pdfwrite, ppmraw, 300 dpi, noband.

I have no idea why this difference would occur, but it's not an obvious regression.

Closing the bug.
Comment 10 Ralph Giles 2006-12-11 16:51:51 UTC
Created attachment 2657
additional patch to unbreak other files

The previous patch was overzealous in creating a parmdict, causing problems on
some other PDF files. This additional patch does this only for /Filter
/JPXDecode streams.
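
In Python terms, the narrowed check amounts to something like the sketch below.
It is not the PostScript from the patch: jpx_filter_params is a hypothetical
helper, the stream dictionary is modelled as a plain dict, and passing the
/ColorSpace entry through is an assumption about what the filter needs.

```python
from typing import Any, Dict, Optional


def jpx_filter_params(stream_dict: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a parameter dict only for streams that use JPXDecode.

    /Filter may be a single name or an array of names, so both forms
    are checked. Any other stream gets None, meaning "leave it alone".
    """
    filt = stream_dict.get("Filter")
    names = filt if isinstance(filt, list) else [filt]
    if "JPXDecode" not in names:
        return None
    params: Dict[str, Any] = {}
    if "ColorSpace" in stream_dict:
        # Assumption: the overriding colorspace is what gets passed on.
        params["ColorSpace"] = stream_dict["ColorSpace"]
    return params
```

Streams with other filters, or with no filter at all, fall through untouched,
which is the behaviour the earlier patch was missing.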