Bug 690300 - Error reading PDF file: /syntaxerror in ID
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: Macintosh MacOS X
Importance: P2 normal
Assignee: Alex Cherepanov
Reported: 2009-02-24 08:16 UTC by Marcos H. Woehrmann
Modified: 2009-04-30 06:03 UTC
Customer: 353
Description Marcos H. Woehrmann 2009-02-24 08:16:35 UTC
The customer reports, and I've verified, that the attached file produces a "/syntaxerror in ID" error when 
read by Ghostscript 8.63 and head (r9505); the other versions I tried fail in the same way.

The command line I'm using for testing:

  bin/gs -sDEVICE=tiff24nc -o test.tif ./pdfout.pdf

Apple Preview and Adobe Acrobat 9.0 open the file without complaint.
Comment 1 Marcos H. Woehrmann 2009-02-24 08:20:11 UTC
Created attachment 4806
pdfout.pdf

This is page 1 of the original 40-page customer-supplied file. My attempts to
reduce the complexity of the document further by removing elements resulted in
a file that is readable by Ghostscript.
Comment 2 Alex Cherepanov 2009-02-24 09:09:54 UTC
/syntaxerror is raised by the ID procedure because it cannot cope with the extra
data at the end of the image (or because the filter consumes too little).
The former should be easy to fix.
Comment 3 Alex Cherepanov 2009-02-28 09:37:40 UTC
In fact, the CCITTFaxDecode filter consumes the 'E' from 'EI' and the PDF
interpreter cannot recover. Acrobat Distiller doesn't consume the 'E'. So the bug
is likely located in the filter but can be worked around in the PDF interpreter.

This is my PS file that shows the problem.
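%!
% Open the file that contains the inline image.
/f (E:\\bug\\690300\\i.pdf)(r) file def
% Skip everything up to and including the inline image's "ID " operator.
f 0 (<</K -1 /Columns 1664 /Rows 1140>> ID ) /SubFileDecode filter flushfile
% Decode the image data that follows, using the same parameters.
f <</K -1 /Columns 1664 /Rows 1140>> /CCITTFaxDecode filter /ff exch def
612 792 scale
<< /ImageType 1
   /DataSource ff
   /ImageMatrix [1664 0 0 -1140 0 1140]
   /Decode [0 1]
   /Width 1664
   /Height 1140
   /BitsPerComponent 1
>> image % showpage
% Print the bytes left in f after the image; they should start with "EI".
f =string readstring pop ==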
Comment 4 Alex Cherepanov 2009-03-24 22:04:23 UTC
When a CCITTFaxEncode'd stream is not properly terminated and is used as an
embedded image, the filter may consume the 'E' from 'EI'. Change the PDF
interpreter to accept 'I' as a synonym for 'EI'.

The following patch is committed as rev. 9593.
http://ghostscript.com/pipermail/gs-cvs/2009-March/009167.html
Regression testing shows no differences.
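
The workaround described above can be illustrated with a minimal sketch. It is written in Python rather than the PostScript of the actual patch, and it is not the code committed in rev. 9593: the function name end_of_inline_image and the use of ValueError in place of /syntaxerror are hypothetical, chosen only to show the idea of accepting a bare 'I' where 'EI' is expected.

```python
# Hypothetical model of the check made once the image data filter has
# stopped reading an inline image. Not Ghostscript code.
_WS = b" \t\r\n\x0c\x00"  # PDF whitespace characters


def end_of_inline_image(buf: bytes, pos: int) -> int:
    """Return the offset just past the end-of-image operator.

    Accepts the regular 'EI' and, as a recovery, a bare 'I' left behind
    when the decode filter has already consumed the 'E'. Raises ValueError
    (standing in for /syntaxerror) on any other token.
    """
    # Skip whitespace between the image data and the operator.
    while pos < len(buf) and buf[pos] in _WS:
        pos += 1
    # Read one token, up to the next whitespace character or end of data.
    end = pos
    while end < len(buf) and buf[end] not in _WS:
        end += 1
    token = buf[pos:end]
    if token in (b"EI", b"I"):
        return end
    raise ValueError(f"syntaxerror in ID: unexpected token {token!r}")
```

With this model, a buffer that continues with "EI" and one that continues with only "I" are both treated as a properly ended inline image, while any other token is still reported as an error.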