Sample file: I cannot attach the sample file here (not being mine), but it can be obtained from Adobe's site. (1) Get "Creating a Custom Dynamic Stamp" (lc_custom_stamp_tip.pdf) from http://partners.adobe.com/public/developer/tips/index.html . (2) This file contains another PDF as a file attachement (content_feedback_form.pdf, 155248 bytes, producer: Adobe Designer 7.0). This is the PDF that shows the problem mentioned below. The sample file mentioned above contains a trailer dictionary that is NOT followed by the token "startxref". On one hand, it is true that only the last "startxref" and "%%EOF" are usefull. On the other hand, the PDF Reference describes the "File trailer" as a block consisting of token "trailer" + trailer dictionary + token "startxref" + number (pointing to tha start of the "classic" xref table) + "%%EOF". So, I consider the sample PDF as being wrong. However, Ghostscript's behaviour in this case can be improved. Current HEAD ends reading the xref+trailer only when encountering "startxref", so in this case it considers the xref as invalid and tries to rebuild it. After applying the proposed patch, reading the xref+trailer stops at the closing ">>". There is still a check for "startxref" and a warning when it is missing, but the rebuild logic is not triggered (and not needed). The change is important because the rebuild logic currently does NOT handle compressed object streams. The sample file is a hybrid-reference PDF. The defective trailer exists only because /XRefStm has to be in an update section, not because the file was really updated. If the rebuild logic is triggered, it misses objects accessible only to PDF1.5+ viewers.
Created attachment 1651 [details] Suggested patch. The attached patch fixes only the problem mentione din this report. For the sample file to be processed 100% correctly, the following patches need to be applied too: #688150 "ReusableStreamDecode available but failing at languagelevel 2". #688152 "'Undefined in get' and extra trailer keys with pdfwrite.ps and PDF1.5+" #688282 "Xref-streams ignored in hybrid-ref PDFs"
Because of a patch commited to tolerate "startxref" misspelled as "startref" (without the "x"), GNU patch partly rejects the diff in comment #1. Fix: the "/startref ..." line introduced by the mentioned commit may go away, because now scanning ends at the closing ">>", not at "startxref". Patched GS gives a warning for any misspelling or absence of the token "startxref". An updated version is file Bug687796-r7114IJ-to-r7114IJH.diff.txt in attachment #2548 [details] (from bug #687796 comment #25, "pdfinflt.ps creates an incomplete PDF"). That particular diff needs some other patches from there to be applied before it. Not that the fixes are interdependent, but the modified lines are physically too close for GNU patch to find the context it needs and apply the diffs.
Created attachment 3500 [details] Additional test case. The PDF 1.7 Reference, section F.2.3 "First-Page Cross-Reference Table and Trailer (Part 3)", 2nd paragraph on page 1031 states: "The first-page trailer may optionally end with startxref, an integer, and %%EOF, just as in an ordinary trailer. This information is ignored." (note the word "optionally"), so in such a case a PDF interpreter cannot rely on anything particular (like "startxref"/ "startref"/ "xref") to follow the trailer dictionary. Now consider a linearized PDF being incrementally updated, and you get that in the general case a "previous trailer" may lack the "startxref". It's possible Acrobat does a similar thing when creating hybrid-ref PDFs. Attached PDF is such a file that "legally" lacks a "startxref". (Although originally generated by Distiller, it was binary-edited to overwrite the "startxref + number + %%EOF" with spaces.) Current GS (SVN rev 8312) fails to display this file because it scans beyond the end of the trailer dictionary. Reader displays it fine. - The 1st time the trailer dict is red, the error triggers an unnecessary rebuild. This is described by the current report and addressed by the suggested patch. - During the rebuild, it fails again with an "Error: /undefined in /BXlevel" or an "Error: /undefined in --run--", depending on how you run GS (*). Please see bug #687796 comment #25 point (I). (*) Don't know why, from some time most (all?) errors that appear while executing GS from the command line (platform: WIN32) are signaled as "... in --run--" or "... in --execute--". When run by GSView, the errors are more informative. Is this OK, or a bug?
Created attachment 3501 [details] Updated patch (against -r8312). - The warning about missing "startxref" removed, since this keyword is not always required. This also avoids getting the "This file had errors that were repaired or ignored. ..." message for this case. - Diff regenerated to be applicable to current TRUNK. It undoes the change in rev 8312 and the addition of "/startref ..." from rev 6979 too, as these are not usefull anymore.
Created attachment 3505 [details] Modified patch Thanks for the patch, SaGS. I think the patch can be committed but I'd like to do 2 minor changes. 1. Use 'def' instead of 'store'. There's no other dictionaries on the stack but 'store' suggests the opposite. 2. Move the initialization of dictlevelcount to pdfopenfile where many other variables get initialized. Do you agree ? Regression testing shows no differences.
If you prefer these modifications, and no problem appeared with the modified code, it's OK with me. I'll just explain here why I coded like this: For (1): In general I prefer "store" for MODIFYING values, being more robust - it does not depend [too much] on the dict stack contents, and will immediately catch the error if the correct dict is missing from the dict stack. For (2): I think it's better to INITIALISE (not necessarily CREATE) "dictlevelcount" just before scanning for the trailer. What if there's an error in the trailer and "dictlevecount" remains > 0? The rebuild is triggered, but for the rebuild it will start with the wrong value. However, this won't matter too much because it's expected to fail anyway (nobody fixes the trailer in the meantime!).
The patch from the comment #4 with "store" changed to "def" is committed as a rev. 8327. Although "store" may be more robust, it greatly extends the scope that the programmer has to consider reading the code. This is similar to use of globals in C instead of static variables. The rest of Ghostscript code uses "store" only when it's needed.