Summary: | /undefined in PK | | |
---|---|---|---|
Product: | Ghostscript | Reporter: | Jack Moffitt <jack> |
Component: | PDF Interpreter | Assignee: | Ray Johnston <ray.johnston> |
Status: | NOTIFIED DUPLICATE | | |
Severity: | enhancement | CC: | gsview |
Priority: | P3 | Keywords: | bountiable |
Version: | master | | |
Hardware: | All | | |
OS: | All | | |
Customer: | 562 | Word Size: | --- |
Description
Jack Moffitt
2004-07-28 13:43:37 UTC
Created attachment 820
9416.pdf
Created attachment 826
9622TradChineseExternal.pdf

Both files contain extra (non-PDF) data at the start of the file. This circumvents the file type detection logic, so the files are processed by the PostScript interpreter.
This can be done by adding a -LPDF language switch option, but it would be preferable to do it automatically: open every file with .peekstring and look for %PDF at the start of the file or following an EOL somewhere in the first 1024 bytes. I confirmed that our file buffering can handle this much read-ahead.

I cannot verify this (the attachment is private), but that "PK" makes me wonder: it's not a ZIP file, is it? If it is, neither reading ahead nor a command-line option will help, because the contents may be compressed. Even if the file is "stored" (à la "Java Archive"), there won't necessarily be an EOL before %PDF.

The fix is submitted for code review: http://bugs.ghostscript.com/show_bug.cgi?id=687601

Improve automatic PDF recognition in the run operator. Following Acrobat Implementation Notes H.3.4.1, search the first 1024 bytes for the %PDF- string. Identify the file as PDF if %PDF- is found but the file doesn't start with %!. Recognize %!PS-Adobe-N.n PDF-M.m but don't search for it yet. Make .peekstring return "() false" instead of /rangecheck for empty files.

The bug report is misleading: the /undefined in PK error is caused by processing a ZIP archive as PostScript, but the real problem is the content of the archive.

This looks the same as bug 687125.

This is similar, if not identical, to 687125, which states more clearly that Acrobat Reader tolerates a prefix (and suffix) of 1024 bytes on a PDF file (although this goes beyond the PDF file specification). Alex has proposed a fix, and the older duplicate bug will be identified as bountiable, even though, as far as I am concerned, these are invalid PDF files. Note that relying on a ZIP tool to recognize a PDF file as non-compressible, so that the original PDF exists intact in a PKZIP archive, is *REALLY* marginal. That's the main reason for closing this bug and moving the issue to 687125 (besides that bug having precedence). Thanks to Russell Lang (gsview@ghostgum.com.au) for spotting this duplication.
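The recognition rule from the code-review summary (search the first 1024 bytes for %PDF-, but leave files that start with %! to the PostScript interpreter) can be illustrated with a small Python sketch. This is not Ghostscript's implementation, which lives in the PostScript run operator; the names `looks_like_pdf`, `classify_file` and `PDF_SEARCH_WINDOW` are hypothetical, chosen for this example only.

```python
# Hypothetical helpers: a Python sketch of the recognition rule, not Ghostscript code.
PDF_SEARCH_WINDOW = 1024  # number of leading bytes searched for the PDF header


def looks_like_pdf(head: bytes) -> bool:
    """Decide whether a file whose first bytes are `head` should be run as PDF."""
    if not head:
        # Empty file: nothing to recognise (cf. .peekstring returning "() false").
        return False
    if head.startswith(b"%!"):
        # Files starting with %! stay with the PostScript interpreter; the fix
        # does not yet search for the "%!PS-Adobe-N.n PDF-M.m" form.
        return False
    # Tolerate leading garbage: %PDF- may appear anywhere in the window.
    return b"%PDF-" in head[:PDF_SEARCH_WINDOW]


def classify_file(path: str) -> str:
    """Read only the search window, as a peek-style check would, and classify."""
    with open(path, "rb") as f:
        head = f.read(PDF_SEARCH_WINDOW)
    return "PDF" if looks_like_pdf(head) else "PostScript"
```

For example, `looks_like_pdf(b"\x00" * 128 + b"%PDF-1.3\n")` is true, while `looks_like_pdf(b"%!PS-Adobe-3.0\n")` and `looks_like_pdf(b"PK\x03\x04")` are false. Note that this follows the committed rule (match anywhere in the window) rather than the earlier suggestion of matching only at the start of the file or after an EOL.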
*** This bug has been marked as a duplicate of 687125 ***

In fact, examining the 9416.pdf attachment with WinZip shows that it contains TWO PDF files (not one). Acrobat Reader 6 will *NOT* open the attached file. If WinZip (or another tool) is used to extract the contents of the PKZip archive, the extracted file turns out to have 128 bytes of garbage at the front (Macintosh-derived??). I did confirm that Ghostscript can process the file without a problem if the first 128 bytes of the extracted file are skipped.

Changing customer bugs that were resolved more than a year ago to closed.
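The workaround confirmed in this thread (extract the members of the PKZip archive, then skip the first 128 bytes) can be sketched with Python's standard `zipfile` module. A ZIP file begins with the local file header signature `PK\x03\x04`, which presumably explains why the PostScript scanner reports /undefined on the name PK. The 128-byte prefix length comes from this bug; treating it as a MacBinary-style header is an assumption, and the helper names are hypothetical.

```python
import io
import zipfile
from typing import Dict

# ZIP local file header signature; a PostScript scanner would presumably read
# the leading "PK" as an executable name and raise /undefined.
ZIP_SIGNATURE = b"PK\x03\x04"
# Length of the leading garbage reported in this bug (assumed MacBinary-like).
PREFIX_LEN = 128


def is_zip(head: bytes) -> bool:
    """True if the data starts with a ZIP local file header."""
    return head.startswith(ZIP_SIGNATURE)


def strip_prefix(member: bytes, prefix_len: int = PREFIX_LEN) -> bytes:
    """Drop a fixed-length prefix, but only when %PDF- starts right after it."""
    if not member.startswith(b"%PDF-") and member[prefix_len:].startswith(b"%PDF-"):
        return member[prefix_len:]
    return member


def extract_pdfs(archive: bytes) -> Dict[str, bytes]:
    """Map archive member names to cleaned PDF bytes, skipping non-PDF members."""
    pdfs = {}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for name in zf.namelist():
            data = strip_prefix(zf.read(name))
            if data.startswith(b"%PDF-"):
                pdfs[name] = data
    return pdfs
```

Each cleaned member could then be written to its own file and processed normally. Only a prefix of exactly `PREFIX_LEN` bytes is removed, and only when %PDF- follows it, so intact PDFs and unrelated members pass through unchanged.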