Bug 690422 - Acrobat 9.0 PDF Portfolio files cannot be read
Status: NOTIFIED FIXED
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: Macintosh MacOS X
Importance: P1 normal
Assigned To: Alex Cherepanov
QA Contact: Bug traffic
Reported: 2009-04-16 17:37 PDT by Marcos H. Woehrmann
Modified: 2011-09-18 21:45 PDT (History)
Customer: 531
Word Size: ---


Attachments
partial implementation (9.06 KB, patch)
2009-06-29 09:06 PDT, Alex Cherepanov
patch (3.92 KB, patch)
2009-06-29 11:17 PDT, Alex Cherepanov
embedXML.pdf (11.13 KB, application/pdf)
2009-09-25 06:03 PDT, Zoltán

Description Marcos H. Woehrmann 2009-04-16 17:37:37 PDT
The attached PDF Portfolio file, produced by Acrobat Pro 9.0.0, cannot be read by Ghostscript head 
(r9645). Instead of the expected output, Ghostscript produces a single page that says "For the best 
experience, open this PDF portfolio in Acrobat 9 or Adobe Reader 9, or later."

The command line I'm using for testing:

  bin/gs -sDEVICE=tiff24nc -o test.tif ./publix_1311._structural.pdf

Note that Apple Preview produces the same output as Ghostscript. Acrobat 8 shows the same message 
but then lets you page through the portfolio to view the actual PDF pages.
Comment 1 Marcos H. Woehrmann 2009-04-16 17:51:54 PDT
Created attachment 4937
publix_1311._structural.pdf
Comment 2 Alex Cherepanov 2009-04-16 19:10:35 PDT
This is a PDF file collection without a default document.
The expected result is a list of included files that the user can click on
and view separately. I wonder what Ghostscript should do in this case.
Comment 3 Marcos H. Woehrmann 2009-04-16 20:25:10 PDT
My thinking is that Ghostscript should concatenate the documents in the order they are listed and 
generate a multi-page output file.
Comment 4 Ray Johnston 2009-04-17 06:48:54 PDT
I think we should check the version of Ghostscript and if it is not the latest,
tell people that "This document is best processed with Ghostscript X.XX, please
upgrade to that version for the best result." ;-)

Seriously, I agree with Marcos: we should just process the files in order.
Comment 5 Alex Cherepanov 2009-06-29 09:06:02 PDT
Created attachment 5171
partial implementation

This is a work-in-progress patch. It modifies the .tempfile operator to avoid
leaving a large number of temporary files in the /tmp directory when
Ghostscript terminates abnormally.

We already have complaints that pdfwrite leaves many temporary files in /tmp.
Extraction (and decompression) of the individual components of a Portfolio
document will create more, and larger, temporary files. The patch addresses
this issue.
Comment 6 Alex Cherepanov 2009-06-29 11:17:25 PDT
Created attachment 5172
patch

This is a simple implementation that extracts all documents from PDF portfolio
to temporary files and displays them in the order they are listed
in the portfolio.

This patch is useful as it is, but in the near future I plan to add command-line
options to examine the content of a PDF Portfolio and select the files for
processing.

The patch is not backward-compatible; the default document declaration is ignored.
Comment 7 Alex Cherepanov 2009-06-29 17:09:51 PDT
The patch from comment #6 has been committed as rev. 9826.
Regression testing shows no differences; the test suite has no PDF
collections.
Comment 8 Ray Johnston 2009-07-06 19:36:20 PDT
This is adequately resolved by the fix (rev 9826).

If we want to track the enhancements as an issue in the bug tracker, we (Alex)
will open a new issue.
Comment 9 Zoltán 2009-09-25 06:03:01 PDT
Created attachment 5399
embedXML.pdf

Raises a syntaxerror with GS 8.70; no errors when processed by GS 8.64.
Comment 10 Zoltán 2009-09-25 06:05:10 PDT
Something is wrong here, guys.
Embedded files are not necessarily PDFs. A typical example is the attached 
embedXML.pdf, which is derived (hacked) from a PDF generated by Quite Imposing 
Plus. 
In this case the embedded file is XML, which obviously doesn't contain the %PDF- 
string, hence a syntaxerror is signaled in pdfopenfile.

Due to this problem, lots of PDFs generated by the Quite Imposing Plus plugin 
cannot be processed by Ghostscript, unlike Acrobat Reader, which lets 
attachments open in the appropriate applications rather than treating them 
as PDFs.

I recommend the following as an immediate workaround:
Let pdfopenfile (and in turn pdfopen and runpdfbegin) take one more boolean 
argument on the stack to specify whether the file to be processed is an 
attachment. In the case of attachments, non-PDF files should be left alone rather 
than treated as corrupt PDFs.
Comment 11 Alex Cherepanov 2009-10-06 04:48:08 PDT
Skip non-PDF files during enumeration of embedded file streams in PDF
portfolio. Thanks to Zoltán for the sample file.

The following patch has been committed as rev. 10143:
http://ghostscript.com/pipermail/gs-cvs/2009-October/009867.html
Regression testing shows no differences.
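
For illustration only, here is a rough Python sketch of the idea of skipping non-PDF
embedded files. It is not the committed patch, which lives in Ghostscript's
PostScript-based PDF interpreter. The names looks_like_pdf and
select_pdf_attachments, the 1024-byte search window, and the sample data are all
assumptions made for this example.

```python
import io

PDF_MARKER = b"%PDF-"

def looks_like_pdf(stream, window=1024):
    """Return True if the %PDF- marker appears in the first `window` bytes.

    The window size is an assumption for this sketch, not a value taken
    from the Ghostscript sources.
    """
    head = stream.read(window)
    return PDF_MARKER in head

def select_pdf_attachments(attachments):
    """Keep (name, data) pairs that look like PDFs, preserving listing order."""
    return [
        (name, data)
        for name, data in attachments
        if looks_like_pdf(io.BytesIO(data))
    ]

# Hypothetical portfolio contents: two PDFs and one XML attachment.
attachments = [
    ("a.pdf", b"%PDF-1.4\n..."),
    ("meta.xml", b"<?xml version='1.0'?><root/>"),
    ("b.pdf", b"\n%PDF-1.7\n..."),
]

# The XML attachment is skipped instead of being reported as a corrupt PDF.
print([name for name, _ in select_pdf_attachments(attachments)])
```

With a check like this, an XML attachment such as the one in embedXML.pdf would
simply be passed over, and the remaining documents would still be processed in
the order they are listed.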
Comment 12 Marcos H. Woehrmann 2011-09-18 21:45:55 PDT
Changing customer bugs that were resolved more than a year ago to closed.