Bug 690300

Summary:	Error reading PDF file: /syntaxerror in ID
Product:	Ghostscript	Reporter:	Marcos H. Woehrmann <marcos.woehrmann>
Component:	PDF Interpreter	Assignee:	Alex Cherepanov <alex>
Status:	NOTIFIED FIXED
Severity:	normal
Priority:	P2
Version:	master
Hardware:	Macintosh
OS:	MacOS X
Customer:	353	Word Size:	---

Description Marcos H. Woehrmann 2009-02-24 08:16:35 UTC

The customer reports, and I've verified, that the attached file produces a "/syntaxerror in ID" error
when read by Ghostscript 8.63 and head (r9505); other versions I tried fail similarly.

The command line I'm using for testing:

  bin/gs -sDEVICE=tiff24nc -o test.tif ./pdfout.pdf

Apple Preview and Adobe Acrobat 9.0 open the file without complaint.

Comment 1 Marcos H. Woehrmann 2009-02-24 08:20:11 UTC

Created attachment 4806
pdfout.pdf

This is page 1 of the original 40-page customer-supplied file. My attempts to
reduce the complexity of the document further by removing elements resulted in
a file that Ghostscript reads without error.

Comment 2 Alex Cherepanov 2009-02-24 09:09:54 UTC

/syntaxerror is raised by the ID procedure because it cannot cope with the extra
data at the end of the image (or the filter consumes too little).
The former should be easy to fix.

Comment 3 Alex Cherepanov 2009-02-28 09:37:40 UTC

In fact, the CCITTFaxDecode filter consumes the 'E' from 'EI', and the PDF
interpreter cannot recover. Acrobat Distiller doesn't consume the 'E'. So the bug
is likely located in the filter, but it can be worked around in the PDF interpreter.

This is my PostScript file that reproduces the problem:
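%!
% Open the test file (the path is specific to my machine).
/f (E:\\bug\\690300\\i.pdf)(r) file def
% Skip everything up to and including the inline image header "... ID ".
f 0 (<</K -1 /Columns 1664 /Rows 1140>> ID ) /SubFileDecode filter flushfile
% Decode the inline image data straight from f with the same parameters.
f <</K -1 /Columns 1664 /Rows 1140>> /CCITTFaxDecode filter /ff exch def
612 792 scale
<< /ImageType 1
   /DataSource ff
   /ImageMatrix [1664 0 0 -1140 0 1140]
   /Decode [0 1]
   /Width 1664
   /Height 1140
   /BitsPerComponent 1
>> image % showpage
% Read the bytes that follow the image data into =string and print them;
% output starting with 'I' rather than 'EI' means the filter ate the 'E'.
f =string readstring pop ==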

Comment 4 Alex Cherepanov 2009-03-24 22:04:23 UTC

When a CCITTFaxEncode'd stream is not properly terminated and is used as an inline
image, the filter may consume the 'E' from 'EI'. Change the PDF interpreter to
accept 'I' as a synonym for 'EI'.

The following patch was committed as rev. 9593:
http://ghostscript.com/pipermail/gs-cvs/2009-March/009167.html
Regression testing shows no differences.
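The workaround from comment 4 can be modeled outside Ghostscript. The following is a minimal Python sketch, not the code committed in rev. 9593 (the real PDF interpreter is written in PostScript). The helper names `next_token` and `end_of_inline_image` are hypothetical, and tokenization is simplified to whitespace-separated runs of bytes, which is an assumption; a real PDF lexer also handles delimiters. The helper inspects the bytes left after the decode filter stops and accepts either 'EI' or, as the fallback, a bare 'I'.

```python
from typing import Tuple

# Hypothetical model of the comment 4 workaround; not Ghostscript source.
# Simplification (assumption): a token is a run of non-whitespace bytes.
PDF_WHITESPACE = b" \t\r\n\f\x00"


def next_token(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Skip PDF whitespace at buf[pos:], then return (token, position after it)."""
    while pos < len(buf) and buf[pos] in PDF_WHITESPACE:
        pos += 1
    start = pos
    while pos < len(buf) and buf[pos] not in PDF_WHITESPACE:
        pos += 1
    return buf[start:pos], pos


def end_of_inline_image(buf: bytes, pos: int) -> int:
    """Return the position just past the token that ends an inline image.

    'EI' is the normal terminator. A bare 'I' is also accepted, covering the
    case where the CCITTFaxDecode filter has already consumed the 'E'.
    Anything else raises Python's SyntaxError, standing in for /syntaxerror.
    """
    token, end = next_token(buf, pos)
    if token in (b"EI", b"I"):
        return end
    raise SyntaxError(f"expected EI after inline image data, got {token!r}")


if __name__ == "__main__":
    print(end_of_inline_image(b"\nEI\nQ", 0))  # normal case: prints 3
    print(end_of_inline_image(b"I\nQ", 0))     # 'E' already eaten: prints 1
```

The sample byte strings are made up for illustration. Accepting only a bare 'I' keeps the tolerance narrow: any other leftover token still fails, as it did before the change.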