Bug 690422

Summary: Acrobat 9.0 PDF Portfolio files cannot be read
Product: Ghostscript
Reporter: Marcos H. Woehrmann <marcos.woehrmann>
Component: PDF Interpreter
Assignee: Alex Cherepanov <alex>
Status: NOTIFIED FIXED
Severity: normal
CC: birozoltan, zoltan
Priority: P1
Version: master
Hardware: Macintosh
OS: MacOS X
Customer: 531
Word Size: ---
Attachments: partial implementation
             patch
             embedXML.pdf

Description Marcos H. Woehrmann 2009-04-16 17:37:37 UTC
The attached PDF Portfolio file, produced by Acrobat Pro 9.0.0, cannot be read by Ghostscript head
(r9645). Instead of the expected output, Ghostscript produces a single page that says "For the best
experience, open this PDF portfolio in Acrobat 9 or Adobe Reader 9, or later."

The command line I'm using for testing:

  bin/gs -sDEVICE=tiff24nc -o test.tif ./publix_1311._structural.pdf

Note that Apple Preview produces the same output as Ghostscript. Acrobat 8 displays the same message,
but then lets you page through the embedded documents to view the actual PDF pages.
Comment 1 Marcos H. Woehrmann 2009-04-16 17:51:54 UTC
Created attachment 4937
publix_1311._structural.pdf
Comment 2 Alex Cherepanov 2009-04-16 19:10:35 UTC
This is a PDF file collection without a default document.
The expected result is a list of included files that the user can click on
and view separately. I wonder what Ghostscript should do in this case.
Comment 3 Marcos H. Woehrmann 2009-04-16 20:25:10 UTC
My thinking is that Ghostscript should concatenate the documents in the order they are listed and
generate a multi-page output file.
Comment 4 Ray Johnston 2009-04-17 06:48:54 UTC
I think we should check the version of Ghostscript and if it is not the latest,
tell people that "This document is best processed with Ghostscript X.XX, please
upgrade to that version for the best result." ;-)

Seriously, I agree with Marcos, we should just process the files in order.
Comment 5 Alex Cherepanov 2009-06-29 09:06:02 UTC
Created attachment 5171
partial implementation

This is a work-in-progress patch. It modifies the .tempfile operator to avoid
leaving a large number of temporary files in the /tmp directory when
Ghostscript terminates abnormally.

We already have complaints that pdfwrite leaves many temporary files in /tmp.
Extracting (and decompressing) the individual components of a Portfolio
document will create even more temporary files, and larger ones. The patch
addresses this issue.
Comment 6 Alex Cherepanov 2009-06-29 11:17:25 UTC
Created attachment 5172
patch

This is a simple implementation that extracts all documents from the PDF
portfolio to temporary files and displays them in the order they are listed
in the portfolio.

This patch is useful as it is, but I plan to add command line options
to examine the contents of a PDF Portfolio and select the files for processing
in the near future.

The patch is not backward-compatible; the default document declaration is ignored.
Comment 7 Alex Cherepanov 2009-06-29 17:09:51 UTC
The patch from comment #6 has been committed as rev. 9826.
Regression testing shows no differences; however, the test suite contains
no PDF collections.
Comment 8 Ray Johnston 2009-07-06 19:36:20 UTC
This is adequately resolved by the fix (rev 9826).

If we want to track the enhancements as an issue in the bug tracker, we (Alex)
will open a new issue.
Comment 9 Zoltán 2009-09-25 06:03:01 UTC
Created attachment 5399
embedXML.pdf

Raises a syntaxerror with GS 8.70; no errors when processed by GS 8.64.
Comment 10 Zoltán 2009-09-25 06:05:10 UTC
Something is wrong here, guys.
Embedded files are not necessarily PDFs. A typical example is the attached
embedXML.pdf, which is derived (hacked) from a PDF generated by Quite Imposing
Plus.
In this case the embedded file is XML, which obviously doesn't contain the
%PDF- header string, hence a syntaxerror is signaled in pdfopenfile.

Due to this problem, many PDFs generated by the Quite Imposing Plus plugin
cannot be processed by Ghostscript. Acrobat Reader, by contrast, opens
attachments in the appropriate applications rather than treating them
as PDFs.

I recommend the following as an immediate workaround:
Let pdfopenfile (and, in turn, pdfopen and runpdfbegin) take one more boolean
argument on the stack to specify whether the file to be processed is an
attachment. In the case of attachments, non-PDF files should be left alone
rather than treated as corrupt PDFs.
Comment 11 Alex Cherepanov 2009-10-06 04:48:08 UTC
Skip non-PDF files during enumeration of embedded file streams in PDF
portfolio. Thanks to Zoltán for the sample file.

The following patch has been committed as rev. 10143.
http://ghostscript.com/pipermail/gs-cvs/2009-October/009867.html
Regression testing shows no differences.
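The approach discussed in comments #6 and #11 (extract every embedded file, keep the order in which the portfolio lists them, and skip anything that is not a PDF) can be illustrated with a short sketch. This is a minimal Python illustration, not the PostScript change committed in rev. 10143. The function names are hypothetical, and the rule that the "%PDF-" marker must appear within the first 1024 bytes is an assumption modeled on common PDF reader tolerance, not something taken from the patch.

```python
from typing import Iterable, List, Tuple

# Assumption: a stream counts as a PDF if "%PDF-" occurs in its first 1024 bytes.
HEADER_WINDOW = 1024
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the PDF header marker appears near the start of data."""
    return PDF_MAGIC in data[:HEADER_WINDOW]


def select_pdf_attachments(embedded: Iterable[Tuple[str, bytes]]) -> List[str]:
    """Hypothetical helper: keep portfolio order, skipping non-PDF embedded files."""
    return [name for name, data in embedded if looks_like_pdf(data)]


if __name__ == "__main__":
    # Stand-ins for embedded file streams, listed in portfolio order.
    portfolio = [
        ("cover.pdf", b"%PDF-1.7\n..."),
        ("job.xml", b"<?xml version='1.0'?><job/>"),
        ("body.pdf", b"%PDF-1.4\n..."),
    ]
    print(select_pdf_attachments(portfolio))  # ['cover.pdf', 'body.pdf']
```

Filtering before opening, rather than letting the open fail, mirrors the idea in comment #10: an XML attachment is simply passed over instead of being reported as a corrupt PDF.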
Comment 12 Marcos H. Woehrmann 2011-09-18 21:45:55 UTC
Changing customer bugs that have been resolved more than a year ago to closed.