Bug 691503 - Regression: ghostscript produces bad pdf file starting with r8445
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: PC All
Importance: P2 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-26 21:20 UTC by Marcos H. Woehrmann
Modified: 2010-08-03 08:04 UTC
0 users

See Also:
Customer:
Word Size: ---


Attachments
patch part 1 (3.26 KB, application/octet-stream)
2010-07-29 11:59 UTC, Ken Sharp
patch part 2 (1.61 KB, application/octet-stream)
2010-07-29 12:00 UTC, Ken Sharp
proposed PDF interpreter patch (584 bytes, application/octet-stream)
2010-07-29 13:56 UTC, Ken Sharp

Description Marcos H. Woehrmann 2010-07-26 21:20:55 UTC
Starting with r8445, Ghostscript's pdfwrite device produces a PDF file that cannot be read by Ghostscript head (r11541).

The command lines I'm using:

  bin/gs -sDEVICE=pdfwrite -o test.pdf -dLastPage=1 ./Bug691442.pdf
  bin/gs -sDEVICE=ppmraw -o test.ppm ./test.pdf
Comment 2 Marcos H. Woehrmann 2010-07-27 15:51:36 UTC
I've verified that this fails on peeves:

marcos@peeves:[16]% artifex/head/bin/gs -sDEVICE=pdfwrite -o test.pdf -dLastPage=1 ./Bug691442.pdf
GPL Ghostscript 9.00 (2010-07-31)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
GPL Ghostscript 9.00: ERROR: A pdfmark destination page 2 points beyond the last page 1.

marcos@peeves:[17]% artifex/head/bin/gs -sDEVICE=ppmraw -o test.ppm ./test.pdf
GPL Ghostscript 9.00 (2010-07-31)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
   **** Warning:  File has an invalid xref entry:  7.  Rebuilding xref table.
   **** Warning:  There are objects with matching object and generation
   **** numbers.  The accuracy of the resulting image is unknown.
Processing pages 1 through 1.
Page 1
   **** Unrecoverable error in xref!
Error: /rangecheck in resolveR
Operand stack:
   --dict:8/17(L)--   --dict:10/18(L)--   637   --nostringval--   --nostringval--   637   12   0
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1894   1   3   %oparray_pop   1893   1   3   %oparray_pop   1877   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   637   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1151/1684(ro)(G)--   --dict:1/20(G)--   --dict:82/200(L)--   --dict:82/200(L)--   --dict:108/127(ro)(G)--   --dict:290/300(ro)(G)--   --dict:24/25(L)--   --dict:6/8(L)--   --dict:22/40(L)--   --dict:1/1(ro)(G)--   --dict:1/1(ro)(G)--   --dict:1/1(ro)(G)--   --dict:5/16(L)--
Current allocation mode is local
Last OS error: 2
GPL Ghostscript 9.00: Unrecoverable error, exit code 1

marcos@peeves:[18]%
Comment 3 Ken Sharp 2010-07-29 11:59:00 UTC
Created attachment 6574
patch part 1
Comment 4 Ken Sharp 2010-07-29 12:00:48 UTC
Created attachment 6575
patch part 2

The problem is not entirely fixable in pdfwrite. We receive an /OUT pdfmark generated by the PDF interpreter, which references both page 1 and page 2. pdfwrite duly adds these to the outline tree and writes /Dest entries to the PDF file.

In order to do this it creates a reference to the second page, which reserves an object number for it. However, we never get a second page, which means we end up with a reserved entry pointing at an invalid offset in the PDF file. This seems to be what the Ghostscript PDF interpreter is complaining about. 

The right way to deal with this is not to emit outline entries for pages that don't exist, but pdfwrite has no way of knowing that page 2 won't be turning up. So the real fix needs to be in the PDF interpreter, which should validate the destinations against the number of pages in the PDF file and the first/last pages requested, to ensure that destinations only point at real pages.

I have made a change in the way that pdfwrite works. It now stores pages initially with an offset of 0 in the xref (which is never legal because of the PDF header). When writing the xref table, we now check whether any entries have an offset of 0 and, if they do, we write the xref in sections, eliding the object(s) whose offset is 0.

The resulting PDF file can be opened in Ghostscript again. However, it's not ideal, because we are still writing an Outline which uses a Dest pointing to a page and object which do not exist.
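
The sectioned xref described above can be illustrated with a minimal Python sketch. This is not the pdfwrite code (which is C) and is not taken from the attached patches; the function names `xref_runs` and `format_xref` and the `offsets` mapping are hypothetical, and the sketch assumes every object has generation 0 and that an offset of 0 marks an object number that was reserved but never written.

```python
def xref_runs(offsets):
    """Group xref entries into contiguous runs, skipping offset-0 objects.

    offsets maps object number -> byte offset in the file (hypothetical
    input shape). Returns a list of (first_object_number, [entry, ...]).
    """
    # Object 0 is always the head of the free list.
    runs = [(0, ["0000000000 65535 f "])]
    for num in sorted(offsets):
        off = offsets[num]
        if off == 0:
            # Reserved but never written: leave it out of the table.
            continue
        entry = f"{off:010d} 00000 n "
        first, entries = runs[-1]
        if first + len(entries) == num:
            # Directly follows the previous object: extend the current run.
            entries.append(entry)
        else:
            # A gap was skipped: start a new subsection.
            runs.append((num, [entry]))
    return runs


def format_xref(offsets):
    """Render the runs as the text of a PDF cross-reference table."""
    lines = ["xref"]
    for first, entries in xref_runs(offsets):
        lines.append(f"{first} {len(entries)}")
        lines.extend(entries)
    return "\n".join(lines) + "\n"
```

With `offsets = {1: 15, 2: 80, 3: 0, 4: 200}`, object 3 is elided, so the table is written as two subsections: one starting at object 0 with three entries, and one starting at object 4 with a single entry.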

The attached patches implement this functionality. I'll probably fold this into the pdfwrite code assuming that a cluster test shows no problems. I'll then look at fixing the PDF interpreter so it doesn't emit broken pdfmarks either.
Comment 5 Ken Sharp 2010-07-29 13:56:05 UTC
Created attachment 6576
proposed PDF interpreter patch

The attached patch is a proposed change to the PDF interpreter that resolves this problem more cleanly.

After finding the page number of the /Dest for the outline, check whether FirstPage or LastPage is defined, and check that the page number lies between these values. If it doesn't, return a rangecheck error. The error is picked up in the caller, which emits a warning that a link was omitted.

This prevents creating outline entries pointing to pages that won't exist after PDF creation.
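
The check described above can be sketched as follows. The real change is to the PostScript-based PDF interpreter, so this Python version is only an illustration; `RangeCheck`, `check_dest_page`, `filter_outlines` and the `(title, page)` item shape are hypothetical names chosen for the example, and the warning text is not the interpreter's actual message.

```python
class RangeCheck(Exception):
    """Stand-in for the PostScript rangecheck error."""


def check_dest_page(page, first_page=None, last_page=None):
    """Raise RangeCheck if page falls outside the requested page range.

    first_page / last_page are None when the corresponding option
    (FirstPage / LastPage) was not given.
    """
    if first_page is not None and page < first_page:
        raise RangeCheck(page)
    if last_page is not None and page > last_page:
        raise RangeCheck(page)
    return page


def filter_outlines(items, first_page=None, last_page=None):
    """Keep outline items whose destination page will actually be emitted.

    items is a list of (title, page) pairs. Returns (kept, warnings).
    """
    kept, warnings = [], []
    for title, page in items:
        try:
            check_dest_page(page, first_page, last_page)
        except RangeCheck:
            # The caller catches the error and omits the link with a warning.
            warnings.append(
                f"outline '{title}' omitted: page {page} is outside the requested range"
            )
            continue
        kept.append((title, page))
    return kept, warnings
```

For the case in this report (`-dLastPage=1` with an outline that references page 2), calling `filter_outlines` with `last_page=1` keeps only the page 1 entry and records one warning for the page 2 entry.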

Assigning to Alex to review the PDF interpreter patch; I will commit the pdfwrite patches.
Comment 6 Alex Cherepanov 2010-08-03 02:20:13 UTC
Fine for me, please commit.
Comment 7 Ken Sharp 2010-08-03 08:04:42 UTC
Fixed in pdfwrite with revision 11550, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-July/011512.html

and in the PDF interpreter with revision 11595, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-August/011569.html

Either of these is sufficient to resolve the problem. The fix to the PDF
interpreter is the better solution; the fix to pdfwrite has more general
application and is included mainly for future use.