Summary: | Acrobat 9.0 PDF Portfolio files cannot be read | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Marcos H. Woehrmann <marcos.woehrmann> |
Component: | PDF Interpreter | Assignee: | Alex Cherepanov <alex> |
Status: | NOTIFIED FIXED | ||
Severity: | normal | CC: | birozoltan, zoltan |
Priority: | P1 | ||
Version: | master | ||
Hardware: | Macintosh | ||
OS: | MacOS X | ||
Customer: | 531 | Word Size: | --- |
Attachments: |
partial implementation
patch
embedXML.pdf |
Description
Marcos H. Woehrmann
2009-04-16 17:37:37 UTC
Created attachment 4937 [details]
publix_1311._structural.pdf
This is a PDF file collection without a default document. The expected result is a list of included files that the user can click on and view separately. I wonder what Ghostscript should do in this case.
My thinking is that Ghostscript should concatenate the documents in the order they are listed and generate a multi-page output file.
I think we should check the version of Ghostscript and, if it is not the latest, tell people that "This document is best processed with Ghostscript X.XX, please upgrade to that version for the best result." ;-) Seriously, I agree with Marcos, we should just process the files in order.
Created attachment 5171 [details]
partial implementation
This is a work-in-progress patch. It modifies the .tempfile operator to avoid
leaving a large number of temporary files in the /tmp directory when
Ghostscript terminates abnormally.
We already have complaints that pdfwrite leaves many temporary files in /tmp.
Extraction (and decompression) of the individual components of a Portfolio
document will create more temporary files of larger size. The patch addresses
this issue.
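For illustration only, here is one common way to keep temporary files from outliving a crashed process. This is not taken from the attached patch, whose mechanism is not described in this report; it is a Python sketch that assumes a POSIX system, where a file can be unlinked while it is still open. The helper name `open_anonymous_tempfile` and the `gs_sketch_` prefix are hypothetical.

```python
import os
import tempfile


def open_anonymous_tempfile(directory=None):
    """Create a temp file and unlink it right away (POSIX only).

    The open handle stays usable, but no directory entry remains, so the
    operating system reclaims the space even if the process dies abnormally.
    Returns (file_object, former_path).
    """
    fd, path = tempfile.mkstemp(prefix="gs_sketch_", dir=directory)
    try:
        os.unlink(path)  # remove the name; the data lives until fd is closed
    except OSError:
        os.close(fd)
        raise
    return os.fdopen(fd, "w+b"), path


# Usage: write scratch data, read it back, then close the handle.
scratch, former_path = open_anonymous_tempfile()
scratch.write(b"decompressed stream data")
scratch.seek(0)
data = scratch.read()
scratch.close()
```

The trade-off of this approach is that the file can no longer be reopened by name, so it only suits data that is consumed through the same handle.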
Created attachment 5172 [details]
patch
This is a simple implementation that extracts all documents from a PDF Portfolio
to temporary files and displays them in the order they are listed
in the portfolio.
This patch is useful as it is, but I plan to add command line options
to examine the content of a PDF Portfolio and select the files for processing
in the near future.
The patch is not backward-compatible; the default document declaration is ignored.
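The behaviour described above (extract each embedded document to a temporary file, then process the files in listing order) can be sketched roughly as follows. This is a Python illustration, not the PostScript code in the patch; `process_portfolio`, the `(name, bytes)` input list, and the `process` callback are all hypothetical stand-ins.

```python
import os
import tempfile


def process_portfolio(embedded_files, process):
    """Process embedded documents one by one, in portfolio order.

    embedded_files: list of (name, payload_bytes), already in listing order
                    (hypothetical input; a real reader would take this from
                    the portfolio's embedded file streams).
    process:        callable that receives the path of the extracted file.
    """
    results = []
    for name, payload in embedded_files:
        fd, path = tempfile.mkstemp(suffix=".pdf")
        try:
            with os.fdopen(fd, "wb") as out:
                out.write(payload)
            results.append((name, process(path)))
        finally:
            os.remove(path)  # do not leave the extracted copy behind
    return results


# Usage: the "processing" step here just reports the extracted file size.
files = [("a.pdf", b"%PDF-1.4 first"), ("b.pdf", b"%PDF-1.4 second")]
results = process_portfolio(files, os.path.getsize)
```

Removing each file in a `finally` clause keeps the number of leftover temporary files bounded even when processing one component fails.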
The patch from comment #6 has been committed as rev. 9826. Regression testing shows no differences; the test suite has no PDF collections.
This is adequately resolved by the fix (rev. 9826). If we want to track the enhancements as an issue in the bug tracker, we (Alex) will open a new issue.
Created attachment 5399 [details]
embedXML.pdf
Raises syntaxerror using GS 8.70; no errors when processed by GS 8.64.
Something is wrong here, guys. Embedded files are not necessarily PDFs; a typical example is the attached embedXML.pdf, which is derived (hacked) from a PDF generated by Quite Imposing Plus. In this case the embedded file is XML, which obviously doesn't have the (%PDF-) string, hence a syntaxerror is signaled in pdfopenfile. Due to this problem, lots of PDFs generated by the Quite Imposing Plus plugin cannot be processed by Ghostscript, unlike Acrobat Reader, which lets attachments open using the appropriate applications rather than treating them as PDFs.
I recommend the following as an immediate workaround: let pdfopenfile (and pdfopen, runpdfbegin in turn) take one more boolean argument on the stack to specify whether the file to be processed is an attachment or not. In the case of attachments, non-PDF files should be left alone rather than treated as corrupt PDFs.
Skip non-PDF files during enumeration of embedded file streams in PDF portfolio. Thanks to Zoltán for the sample file. The following patch has been committed as rev. 10143. http://ghostscript.com/pipermail/gs-cvs/2009-October/009867.html
Regression testing shows no differences.
Changing customer bugs that have been resolved more than a year ago to closed. |
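As an illustration of the fix discussed above (skipping embedded streams that are not PDFs instead of raising a syntaxerror), a minimal Python sketch follows. It is not the committed PostScript code. The function names are hypothetical, and the 1024-byte search window is an assumption modelled on the leniency many PDF readers apply when looking for the `%PDF-` header; the rule Ghostscript actually uses is not stated in this report.

```python
PDF_MARKER = b"%PDF-"
SEARCH_WINDOW = 1024  # assumption: only the start of the stream is inspected


def looks_like_pdf(payload):
    """Return True if the %PDF- marker appears near the start of the data."""
    return PDF_MARKER in payload[:SEARCH_WINDOW]


def select_pdf_streams(embedded_files):
    """Keep only the embedded files that look like PDFs, preserving order."""
    return [(name, payload) for name, payload in embedded_files
            if looks_like_pdf(payload)]


# Usage: the XML attachment is skipped rather than treated as a corrupt PDF.
attachments = [
    ("cover.pdf", b"%PDF-1.4\n..."),
    ("jobticket.xml", b"<?xml version=\"1.0\"?><job/>"),
    ("body.pdf", b"\n%PDF-1.6\n..."),
]
selected = [name for name, _ in select_pdf_streams(attachments)]
```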