The customer reports:

Please find the attached .PDF files that could not be converted to .TIF using Artifex Ghostscript 9.04 (2011-08-05). We are converting with the following command line:

gswin32c.exe -sDEVICE=tiffg4 -dPDFFitPage -dSAFER -r300 -o 20764896.tif 20764896.pdf

I'm not able to reproduce this issue: when I run the command, all 15 pages process correctly (on Linux, Mac OS X, and Windows). I've verified with the customer that they are using Windows XP SP3 (the same version I am) and an unmodified gs 9.04 installation. I've asked the customer to add the -dPDFDEBUG option to the command line and will attach the output when I have it.
This seems to be a problem involving the Luratech JBIG2 decoder (although the root cause may be related to the stream length warning). For the time being, adding "JBIG2_LIB=jbig2dec" (without quotes) to the (n)make command line options reverts to using jbig2dec, which appears to work as expected.
Assigning to Alex to confirm that we are calling the Luratech decoder correctly; if we are, please reassign to Henry.
Created attachment 7971
obj_613_data.jb2

Using the attached file as input with the Luratech jb2_demo_dbg.exe, I get:

Info : Unknown or embedded file format organisation
Info : Segment number : 42
Info : Segment type : 48 (Page information)
Info : Referred to segments : 0
Info : Page association : 1
Info : Segment data position : 11 (42)
Info : Segment data length : 19 bytes
Info : Segment number : 43
Info : Segment type : 0 (Symbol dictionary)
Info : Referred to segments : 1
Info : Page association : 1
Warning : Unable to find requested segment!
Warning : Unable to find referred-to segment (1)!
Warning : Attempting to continue decoding!
Warning : Retain bit should be 1 for referred to segment (1)!
Warning : Attempting to continue decoding!
Info : Segment data position : 42 (43)
Info : Segment data length : 25 bytes
Info : Segment number : 44
Info : Segment type : 0 (Symbol dictionary)
Info : Referred to segments : 0
Info : Page association : 1
Info : Segment data position : 78 (44)
Info : Segment data length : 71315 bytes
Info : Segment number : 45
Info : Segment type : 6 (Immediate text region)
Info : Referred to segments : 2
Info : Page association : 1
Warning : Retain bit should be 1 for referred to segment (43)!
Warning : Attempting to continue decoding!
Warning : Retain bit should be 1 for referred to segment (44)!
Warning : Attempting to continue decoding!
Info : Segment data position : 71406 (45)
Info : Segment data length : 8584 bytes
Error : Error getting requested symbol from symbol dictionary!
Error : Unable to access symbol in text region!
Error : Unable to determine details of symbol instance in text region!
Decompression failed
I forgot to mention the command line I used to test the Luratech demo:

jb2_demo_dbg.exe d -i obj_613_data.jb2 -o obj_613.tif
Turning on -Zw in a debug build shows where the Ghostscript output diverges from the jb2_demo_dbg output. After the warnings, Ghostscript keeps running and reads invalid data, whereas the demo stops with "Decompression successful". GS output is:

Page 13
**** Warning: stream Length incorrect.
[w]Luratech JBIG2 info Unknown or embedded file format organisation
[w]Luratech JBIG2 info Segment number : 1
[w]Luratech JBIG2 info Segment type : 0 (Symbol dictionary)
[w]Luratech JBIG2 info Referred to segments : 0
[w]Luratech JBIG2 info Page association : 0
[w]Luratech JBIG2 info Segment data position : 11 (1)
[w]Luratech JBIG2 info Segment data length : 98271 bytes
[w]Luratech JBIG2 info Segment number : 42
[w]Luratech JBIG2 info Segment type : 48 (Page information)
[w]Luratech JBIG2 info Referred to segments : 0
[w]Luratech JBIG2 info Page association : 1
[w]Luratech JBIG2 info Segment data position : 98293 (42)
[w]Luratech JBIG2 info Segment data length : 19 bytes
[w]Luratech JBIG2 info Segment number : 43
[w]Luratech JBIG2 info Segment type : 0 (Symbol dictionary)
[w]Luratech JBIG2 info Referred to segments : 1
[w]Luratech JBIG2 info Page association : 1
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (1)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 info Segment data position : 98324 (43)
[w]Luratech JBIG2 info Segment data length : 25 bytes
[w]Luratech JBIG2 info Segment number : 44
[w]Luratech JBIG2 info Segment type : 0 (Symbol dictionary)
[w]Luratech JBIG2 info Referred to segments : 0
[w]Luratech JBIG2 info Page association : 1
[w]Luratech JBIG2 info Segment data position : 98360 (44)
[w]Luratech JBIG2 info Segment data length : 71315 bytes
[w]Luratech JBIG2 info Segment number : 45
[w]Luratech JBIG2 info Segment type : 6 (Immediate text region)
[w]Luratech JBIG2 info Referred to segments : 2
[w]Luratech JBIG2 info Page association : 1
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (43)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (44)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 info Segment data position : 169688 (45)
[w]Luratech JBIG2 info Segment data length : 8584 bytes
--->>> after here is when gs output differs <<<---
[w]Luratech JBIG2 info Segment number : 824193056
[w]Luratech JBIG2 WARNING Skipping segment : 824193056
[w]Luratech JBIG2 WARNING Unknown segment type : 47
[w]Luratech JBIG2 info Segment type : 47 (Unknown)
[w]Luratech JBIG2 info Referred to segments : 3
[w]Luratech JBIG2 info Page association : 840970272
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (1779055676)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (1779055676)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (793535854)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (793535854)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (1735682080)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (1735682080)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 info Segment data position : 178298 (824193056)
[w]Luratech JBIG2 info Segment data length : 1379810826 bytes
The standalone jb2_demo_dbg gets an error from JB2_Read_Data_ULong when szPos is 178272, which is the length of the obj_1_613.jb2 file. Ghostscript, possibly because of the invalid stream length, does not stop reading there (after segment 45); it goes on to process garbage and crashes. There is no 'logical EOF' in the JBIG2 stream we have, and since there is no 'endstream', Ghostscript doesn't install a SubfileDecode filter with a count, so nothing terminates the read.

I don't see a way around this without coming up with a count so we can use a SubfileDecode filter. The only ways I can see with this file are to look for either an 'endobj' (which isn't present in this broken PDF either) or the start of the next object (by looking at all of the object positions in the xref and finding the next closest one). The latter approach _should_ work in this case. Note that the 'Length' is actually correct; it's just that we check for an endstream, and when we don't see one at the expected position, we ignore the Length. Maybe Alex can implement one or both of these methods so that filters can't read past the end of the object.
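The "next closest object" bound suggested above could be computed roughly as follows. This is a sketch only: it assumes the byte offsets of in-use objects have already been collected from the xref into an array, and the name `stream_limit_from_xref` is hypothetical, not a Ghostscript function. Objects stored inside object streams have no file offset of their own and would simply not appear in the array.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL helper, not Ghostscript code.
 * obj_offsets: file offsets of in-use objects from the xref, in any order.
 * stream_start: file offset of the first byte of stream data.
 * Returns the number of bytes between stream_start and the closest
 * following object, or up to end of file if no object follows. The result
 * could serve as the count for a SubFileDecode filter. */
static size_t stream_limit_from_xref(const size_t *obj_offsets, size_t n_objs,
                                     size_t stream_start, size_t file_size)
{
    size_t next = file_size;  /* default bound: end of file */
    size_t i;

    for (i = 0; i < n_objs; i++) {
        /* Keep the smallest offset strictly after the stream start. */
        if (obj_offsets[i] > stream_start && obj_offsets[i] < next)
            next = obj_offsets[i];
    }
    return next > stream_start ? next - stream_start : 0;
}
```

The bound is deliberately loose: it includes any trailing 'endstream'/'endobj' text when present, but it can never reach into the header of the following object, which is the case that matters here.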
Fix out-of-buffer access in Luratech jb2 interface.

The Luratech jb2 library can request data outside of the input buffer if it is fed a corrupted data stream. The old code tried to detect this but failed because of a missed signed-to-unsigned promotion.

See: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=f790680acba5c1574728d5ff40124f9e27762d2a
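The linked commit is the actual fix. Purely as a hypothetical illustration of the bug class it names (the functions below are invented and are not the code that commit changes): when a signed `long` is compared with an `unsigned long`, C converts the signed operand to unsigned, so a negative "bytes remaining" turns into a huge value and the bounds check passes.

```c
#include <assert.h>

/* HYPOTHETICAL bounds checks, for illustration only.
 * buf_len: bytes available; pos: requested read offset; count: bytes wanted. */

/* Buggy: if pos is past the end, 'remaining' is negative. Comparing it
 * with an unsigned long converts it to unsigned long (the cast below is
 * what the compiler would do implicitly), so -2 becomes a huge number
 * and the out-of-buffer read is allowed. */
static int read_allowed_buggy(unsigned long buf_len, unsigned long pos,
                              unsigned long count)
{
    long remaining = (long)buf_len - (long)pos;
    return !((unsigned long)remaining < count);
}

/* Fixed: stay in unsigned arithmetic and never subtract below zero. */
static int read_allowed_fixed(unsigned long buf_len, unsigned long pos,
                              unsigned long count)
{
    return pos <= buf_len && count <= buf_len - pos;
}
```

With buf_len = 100, pos = 102 and count = 4, the buggy check allows the read while the fixed check rejects it; for in-range requests both agree.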
While this (sort of) avoids the segfault, we are _still_ getting garbage data into the decoder. The -Zw output still shows:

[w]Luratech JBIG2 info Segment number : 824193056
[w]Luratech JBIG2 WARNING Skipping segment : 824193056
[w]Luratech JBIG2 WARNING Unknown segment type : 47
[w]Luratech JBIG2 info Segment type : 47 (Unknown)
[w]Luratech JBIG2 info Referred to segments : 3
[w]Luratech JBIG2 info Page association : 840970272
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (1779055676)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (1779055676)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (793535854)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (793535854)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Unable to find requested segment!
[w]Luratech JBIG2 WARNING Unable to find referred-to segment (1735682080)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 WARNING Retain bit should be 1 for referred to segment (1735682080)!
[w]Luratech JBIG2 WARNING Attempting to continue decoding!
[w]Luratech JBIG2 info Segment data position : 178298 (824193056)

The data being read after segment 45 actually comes from the next object.
Memory dump:

0x01D038B6  31 20 30 20 6f 62 6a 0a 3c 3c 2f 4c 65 6e 67 74  1 0 obj.<</Lengt
0x01D038C6  68 20 32 20 30 20 52 3e 3e 0a 73 74 72 65 61 6d  h 2 0 R>>.stream
0x01D038D6  0a 00 00 00 01 00 00 00 00 01 7f df 00 00 03 ff  ...........ß...ÿ

Since the data following a corrupted JBIG2 stream could be anything, I'm sure we can move objects around and still make this type of file fail, although I might have to try harder to come up with a segfault. I maintain that we need to prevent reading into the header of another object (per comment 9).

Re-opening for discussion.
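Every garbage value in the -Zw output can be reproduced from the first 26 bytes of this dump, `1 0 obj\n<</Length 2 0 R>>\n`, read as a JBIG2 segment header: "1 0 " is segment number 0x31203020 = 824193056, and the flags byte 'o' (0x6F) gives type 0x6F & 0x3F = 47 with a 4-byte page association. The sketch below is a simplified header parser following the JBIG2 (ITU-T T.88) segment header layout. It handles only the short form of the referred-to segment count (at most four) and is not the Luratech or jbig2dec code; `seg_header` and `parse_segment_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read an n-byte big-endian unsigned value (n <= 4). */
static uint32_t read_be(const unsigned char *p, size_t n)
{
    uint32_t v = 0;
    size_t i;
    assert(n <= 4);
    for (i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}

typedef struct {
    uint32_t number;       /* segment number */
    unsigned type;         /* low 6 bits of the flags byte */
    unsigned n_referred;   /* referred-to segment count (short form only) */
    uint32_t referred[4];  /* referred-to segment numbers */
    uint32_t page;         /* page association */
    uint32_t data_length;  /* segment data length */
    size_t header_size;    /* bytes consumed by the header */
} seg_header;

/* Simplified sketch: returns 0 on success, -1 if the buffer is too short
 * or the long form of the referred-to count is used. */
static int parse_segment_header(const unsigned char *p, size_t len,
                                seg_header *h)
{
    size_t pos, ref_size, page_size, i;
    unsigned flags;

    if (len < 6)
        return -1;
    h->number = read_be(p, 4);
    flags = p[4];
    h->type = flags & 0x3f;
    page_size = (flags & 0x40) ? 4 : 1;  /* bit 6: 4-byte page association */
    h->n_referred = p[5] >> 5;           /* top 3 bits; low 5 are retain flags */
    if (h->n_referred > 4)
        return -1;                       /* 7 = long form; 5 and 6 are invalid */
    pos = 6;

    /* Referred-to numbers are 1, 2 or 4 bytes wide, depending on how large
       this segment's own number is. */
    ref_size = h->number <= 256 ? 1 : h->number <= 65536 ? 2 : 4;
    if (len < pos + h->n_referred * ref_size + page_size + 4)
        return -1;
    for (i = 0; i < h->n_referred; i++, pos += ref_size)
        h->referred[i] = read_be(p + pos, ref_size);

    h->page = read_be(p + pos, page_size);
    pos += page_size;
    h->data_length = read_be(p + pos, 4);
    pos += 4;
    h->header_size = pos;
    return 0;
}
```

On those 26 bytes the parser yields segment number 824193056, type 47, three referred-to segments (1779055676 from "j\n<<", 793535854 from "/Len", 1735682080 from "gth "), page association 840970272 ("2 0 ") and data length 1379810826 ("R>>\n"), matching the log line for line. The header is exactly 26 bytes, so one starting where segment 45 ends (169688 + 8584 = 178272) puts the data at 178298, which is also the position the log reports.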
The question is what to do about an incorrect stream length. PDF has four ways to define the stream length:

1. The /Length attribute. Quite often this attribute is incorrect.
2. The end-of-data marker present in most encoded data streams. Corrupted or uncompressed streams don't have one.
3. The 'endstream' keyword. This keyword can also occur inside the data.
4. External limits, such as image and function sizes. These don't work for compressed streams or variable-size objects.

So none of the end-of-stream markers are reliable. Currently gs relies on the end markers in the stream, complemented by 'endstream' detection for uncompressed streams.

It is possible to validate the /Length attribute against the 'endstream' keyword and use the validated length to limit access to the stream data. This would prevent filters from accessing wild data, but it would increase the run-time cost for every file. Limiting access won't improve security, because an attacker can place invalid data into a stream with a valid length. Since the crash caused by faulty data validation code in the Luratech JBIG2 library is now fixed, I believe that's all we need to do.
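For reference, the /Length validation discussed above amounts to checking that the bytes at stream_start + Length, after skipping the end-of-line marker (which the PDF specification excludes from /Length), spell 'endstream'. A minimal sketch, assuming the whole file is in memory; `length_matches_endstream` is a hypothetical name, not a Ghostscript function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL check, not Ghostscript code: does /Length point at the
 * 'endstream' keyword? buf/buf_len hold the file; stream_start is the
 * offset of the first data byte (just after "stream" and its EOL). */
static int length_matches_endstream(const unsigned char *buf, size_t buf_len,
                                    size_t stream_start, size_t length)
{
    static const char kw[] = "endstream";
    const size_t kw_len = sizeof kw - 1;
    size_t pos;

    /* Reject lengths that run past the buffer (no unsigned overflow). */
    if (stream_start > buf_len || length > buf_len - stream_start)
        return 0;
    pos = stream_start + length;

    /* Skip the CR/LF end-of-line marker that precedes 'endstream'. */
    while (pos < buf_len && (buf[pos] == '\r' || buf[pos] == '\n'))
        pos++;

    return buf_len - pos >= kw_len && memcmp(buf + pos, kw, kw_len) == 0;
}
```

Only when this returns true would the length be trusted as a hard limit; otherwise the reader falls back to the current behaviour, which is exactly the trade-off between per-file cost and protection described above.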