Bug 687601 - /undefined in PK
Summary: /undefined in PK
Status: NOTIFIED DUPLICATE of bug 687125
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: All All
Importance: P3 enhancement
Assignee: Ray Johnston
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2004-07-28 13:43 UTC by Jack Moffitt
Modified: 2011-09-18 21:46 UTC

See Also:
Customer: 562
Word Size: ---


Description Jack Moffitt 2004-07-28 13:43:37 UTC
Attached file fails with Error: /undefined in PK with CVS HEAD.
Comment 1 Jack Moffitt 2004-07-28 13:44:46 UTC
Created attachment 820
9416.pdf
Comment 2 Dan Coby 2004-07-28 15:59:32 UTC
Both files contain extra (non-PDF) data at the start of the file. This is
circumventing the file type determination logic.  As a result the files are 
being processed by the PS interpreter.
Comment 3 Jack Moffitt 2004-08-04 08:27:03 UTC
Created attachment 826
9622TradChineseExternal.pdf
Comment 4 Ray Johnston 2004-08-04 10:22:31 UTC
This can be done by adding a -LPDF language switch option,
but it would be preferable to do it automatically by opening
all files with .peekstring to look for %PDF at the start of
the file or following an EOL somewhere in the first 1024 bytes.

I confirmed that our file buffering can handle this degree
of read ahead.
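A minimal sketch of the automatic check proposed here, written in Python rather than PostScript purely for illustration. The name `looks_like_pdf` and treating "EOL" as either CR or LF are assumptions; the function only models the rule (%PDF at offset 0, or immediately after an EOL within the first 1024 bytes) and is not Ghostscript's actual `.peekstring`-based code.

```python
PEEK_LIMIT = 1024  # read-ahead window proposed in the comment


def looks_like_pdf(head: bytes) -> bool:
    """Hypothetical check: %PDF at offset 0, or right after an EOL
    (CR or LF), within the first PEEK_LIMIT bytes."""
    window = head[:PEEK_LIMIT]
    return (
        window.startswith(b"%PDF")
        or b"\n%PDF" in window
        or b"\r%PDF" in window
    )


print(looks_like_pdf(b"%PDF-1.4\n"))                    # True
print(looks_like_pdf(b"Mac junk\r\n%PDF-1.3\n"))        # True
print(looks_like_pdf(b"PK\x03\x04...doc.pdf%PDF-1.4"))  # False: no EOL before %PDF
```

The last call shows the weakness of the EOL requirement: a header embedded directly after other binary data is missed.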
Comment 5 SaGS 2004-08-16 00:55:54 UTC
Cannot verify (the attachment being private), but that "PK" makes me
wonder: it's not a ZIP file, is it? Because if it is, reading ahead or a 
command line option won't help, because of possible compression. Even if the 
file is "stored" (à la "Java Archive"), there won't necessarily be an EOL 
before %PDF.
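The `PK` in the error is consistent with a ZIP archive: every ZIP local file header begins with the four bytes `PK\x03\x04`. Below is a hypothetical first-bytes classifier (`sniff` is an illustrative name, not a Ghostscript routine) that could flag such input before it reaches the PostScript interpreter. It only identifies the container; whether a PDF can be found inside still depends on compression, as noted above.

```python
ZIP_LOCAL_HEADER = b"PK\x03\x04"  # signature of a ZIP local file header


def sniff(head: bytes) -> str:
    """Hypothetical first-bytes classifier, for illustration only."""
    if head.startswith(ZIP_LOCAL_HEADER):
        return "zip"  # an archive, not a PDF by itself
    if head.startswith(b"%PDF"):
        return "pdf"
    if head.startswith(b"%!"):
        return "postscript"
    return "unknown"  # a PS-first interpreter would try to execute these bytes


print(sniff(b"PK\x03\x04\x14\x00"))  # zip
```

A stricter test is Python's `zipfile.is_zipfile`, which looks for the archive's end-of-central-directory record rather than trusting the first bytes.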
Comment 6 Alex Cherepanov 2004-08-28 22:05:54 UTC
The fix is submitted to code review:
http://bugs.ghostscript.com/show_bug.cgi?id=687601

Improve automatic PDF recognition in run operator. Following Acrobat
Implementation Notes H.3.4.1, search the first 1024 bytes for the %PDF- string.
Identify file as PDF if %PDF- is found but the file doesn't start with
%! . Recognize %!PS-Adobe-N.n PDF-M.m but don't search for it yet.
Make .peekstring return "() false" instead of /rangecheck for empty
files.
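The rule in this commit message can be modeled as follows. This is a sketch under stated assumptions, not the submitted patch: `classify` is a hypothetical name, and the regular expression is one reading of the `%!PS-Adobe-N.n PDF-M.m` form (N, n, M, m taken as decimal digits), matched only at offset 0 because the fix recognizes that header but does not search for it.

```python
import re

SEARCH_LIMIT = 1024  # "first 1024 bytes", per Implementation Notes H.3.4.1
# "%!PS-Adobe-N.n PDF-M.m": recognized only at offset 0, not searched for
PS_ADOBE_PDF = re.compile(rb"%!PS-Adobe-\d+\.\d+ PDF-\d+\.\d+")


def classify(head: bytes) -> str:
    """Hypothetical model of the proposed rule; returns "pdf" or "ps"."""
    window = head[:SEARCH_LIMIT]
    if not window:
        # Empty input: the fix makes .peekstring return "() false" instead
        # of raising /rangecheck; this sketch simply treats it as not PDF.
        return "ps"
    if PS_ADOBE_PDF.match(window):  # .match anchors at offset 0
        return "pdf"
    if window.startswith(b"%!"):
        return "ps"  # a real PostScript header wins over a stray %PDF-
    return "pdf" if b"%PDF-" in window else "ps"


print(classify(bytes(128) + b"%PDF-1.3\n"))  # pdf
```

A stray `%PDF-` inside a genuine PostScript file (for example in a comment) does not change the decision, because the `%!` check runs first.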

The bug report is misleading: /undefined in PK is caused by processing
a ZIP archive as PS, but the real problem is about the content of the archive.
Comment 7 Russell Lang 2004-09-01 17:40:20 UTC
This looks the same as bug 687125.
Comment 8 Ray Johnston 2004-09-01 23:12:27 UTC
This is similar, if not identical, to 687125, which more clearly states
that a prefix (and suffix) of up to 1024 bytes on a PDF file is
tolerated by Acrobat Reader (although it transcends the PDF file
specification).

Alex has proposed a fix, and the older duplicate bug will be
identified as bountiable despite the issue that as far as I am
concerned, these are invalid PDF files.

Note that relying on a ZIP tool to recognize a PDF file as
non-compressible so that the original PDF exists intact in
a PKZIP archive is *REALLY* marginal. That's the main reason
for closing this bug and moving the issue to 687125 (besides
it having precedence).
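Why the ZIP route is marginal can be shown with Python's `zipfile` module: with `ZIP_STORED` the member's bytes appear verbatim shortly after the local file header, while with `ZIP_DEFLATED` the `%PDF-` header is encoded away and a byte scan no longer finds it. The member name `doc.pdf` and the toy PDF body are made up for illustration.

```python
import io
import zipfile


def archive(member: bytes, compression: int) -> bytes:
    """Wrap `member` as doc.pdf (hypothetical name) in an in-memory ZIP."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr("doc.pdf", member)
    return buf.getvalue()


pdf = b"%PDF-1.4\n" + b"1 0 obj << >> endobj\n" * 50 + b"%%EOF\n"
stored = archive(pdf, zipfile.ZIP_STORED)
deflated = archive(pdf, zipfile.ZIP_DEFLATED)

# Stored: %PDF- sits intact just after the header; the byte before it is
# the last character of the file name, not an EOL.
offset = stored.find(b"%PDF-")
print(offset, stored[offset - 1:offset])
# Deflated: the header bytes are Huffman-coded, so the scan comes up empty.
print(deflated.find(b"%PDF-"))
```

Both archives still extract to the same PDF; only the stored one leaves it visible to a read-ahead scan.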

Thanks to Russell Lang (gsview@ghostgum.com.au) for spotting
this duplication.

*** This bug has been marked as a duplicate of 687125 ***
Comment 9 Ray Johnston 2004-09-01 23:26:11 UTC
In fact, examination of the 9416.pdf attachment using Winzip
shows the contents to be TWO PDF files (not one). Acrobat
Reader 6 will *NOT* open the attached file. If Winzip (or
other tool) is used to extract from the PKZip archive, then
the file is found to have 128 bytes of garbage at the front
(Macintosh derived ??).

I did confirm that Ghostscript can process the file without
a problem if the first 128 bytes of the extracted file are
skipped.
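Here the first 128 bytes were skipped by hand. A hypothetical helper (`strip_prefix`, not part of Ghostscript) can instead locate `%PDF-` within the first 1024 bytes and slice off everything before it; this is a sketch that assumes the leading garbage does not itself contain `%PDF-`.

```python
def strip_prefix(data: bytes, limit: int = 1024) -> bytes:
    """Hypothetical helper: drop bytes before the first %PDF- found in
    the first `limit` bytes; return `data` unchanged otherwise."""
    offset = data.find(b"%PDF-", 0, limit)
    if offset <= 0:
        return data  # already starts with %PDF-, or no header in the window
    return data[offset:]


garbage = bytes(128)  # stand-in for the 128-byte prefix described above
print(strip_prefix(garbage + b"%PDF-1.3\n"))  # b'%PDF-1.3\n'
```

The trimmed bytes can be written to a new file and passed to Ghostscript as an ordinary PDF.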

Comment 10 Marcos H. Woehrmann 2011-09-18 21:46:22 UTC
Changing customer bugs that have been resolved more than a year ago to closed.