Bug 696540

Summary: ps2pdf fails on a file that can be opened by some other viewers
Product: Ghostscript    Reporter: Tomasz Kuchta <t.kuchta>
Component: PDF Interpreter    Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED FIXED    
Severity: major    
Priority: P4    
Version: 9.16   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: 561834.pdf

Description Tomasz Kuchta 2016-01-25 06:48:20 UTC
ps2pdf fails on a document that can be opened by other viewers, e.g. evince.

The file is "561834.pdf", from Govdocs1 data set (http://digitalcorpora.org/corpora/govdocs)

The file can be found in the following archive: http://digitalcorpora.org/corp/files/govdocs1/zipfiles/561.zip

The program output is 
   **** Error: invalid token after startxref.
   **** Warning:  An error occurred while reading an XREF table.
   **** The file has been damaged.  This may have been caused
   **** by a problem while converting or transfering the file.
   **** Ghostscript will attempt to recover the data.
Error: /invalidaccess in --run--
Operand stack:
   post_eof_count   4096   --nostringval--   65034
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1967   1   3   %oparray_pop   1966   1   3   %oparray_pop   1950   1   3   %oparray_pop   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1191/1684(ro)(G)--   --dict:1/20(G)--   --dict:83/200(L)--   --dict:83/200(L)--   --dict:117/127(ro)(G)--   --dict:280/300(ro)(G)--   --dict:21/32(L)--
Current allocation mode is local
GPL Ghostscript 9.16: Unrecoverable error, exit code 1

This was tested on Ubuntu 15.10.

I would be grateful if you could confirm the problem.

Kind regards,
Tomasz
Comment 1 Marcos H. Woehrmann 2016-01-25 06:56:25 UTC
Created attachment 12262
561834.pdf
Comment 2 Marcos H. Woehrmann 2016-01-25 07:00:34 UTC
I've confirmed that Ghostscript produces an error reading the PDF file and that the other PDF viewers I tried (Acrobat Pro DC, Apple Preview 8.1, and muPDF) open the file without error (muPDF produces a warning).
Comment 3 Tomasz Kuchta 2016-01-25 07:02:30 UTC
Thanks a lot, Marcos, for checking that.
Comment 4 Ken Sharp 2016-01-25 09:27:39 UTC
OK, firstly, please try to test with the current code where possible; the current release is 9.18.

Please *attach* files rather than posting a URL; it happens quite frequently that URLs go stale before anyone has a chance to examine the problem.

Finally, you have not supplied the command line you are using; we'll need that in order to reproduce any problem.
Comment 5 Tomasz Kuchta 2016-01-25 12:10:21 UTC
(In reply to Ken Sharp from comment #4)

I'm sorry: the command line has no parameters other than the input file:
ps2pdf 561834.pdf

Unfortunately, I was only able to check the version available on Ubuntu 15.10.
I will update the report if I also try the newest release. Thank you.

Comment 6 Tomasz Kuchta 2016-01-31 16:06:13 UTC
Hello, I've checked the git version of GS and the file still fails. For some reason the git build shows version 9.10 (though gs --version gives 9.19).
Comment 7 Ken Sharp 2016-02-08 06:04:25 UTC
Fixed in commit 119e73617fb0f1b20e6d3257d26df0159c4ca81a


The file is, as usual, damaged. It ends with 'startxref 1102' followed by random binary data, when it should end with a sensible offset and a %%EOF marker.

The fact that the startxref is present confused our error-correcting code, which ended up closing the underlying file and causing ioerrors.
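
For readers unfamiliar with the file tail: a reader normally finds the last 'startxref' keyword, reads the decimal byte offset after it, and jumps there to load the xref table. The following is a minimal sketch, not Ghostscript's actual code, of how a reader could detect the damaged case described above (digits glued to binary junk) and hand off to a repair pass instead. The function name `parse_startxref` and the `tail_size` parameter are hypothetical, and the %%EOF check is deliberately strict.

```python
import re
from typing import Optional

# Matches a plain decimal byte offset, the only valid token after "startxref".
_OFFSET = re.compile(rb"[0-9]+")


def parse_startxref(data: bytes, tail_size: int = 1024) -> Optional[int]:
    """Return the xref offset named by the last 'startxref' in the file tail,
    or None if the tail is damaged and the caller should attempt a repair."""
    # startxref belongs at the very end of the file, so only search the tail.
    tail_start = max(0, len(data) - tail_size)
    pos = data.rfind(b"startxref", tail_start)
    if pos < 0:
        return None  # no startxref keyword at all

    # bytes.split() with no argument splits on ASCII whitespace only, so
    # binary junk glued onto the number stays part of the same token.
    tokens = data[pos + len(b"startxref"):].split()
    if not tokens or not _OFFSET.fullmatch(tokens[0]):
        return None  # e.g. b"1102" immediately followed by binary data

    offset = int(tokens[0])
    if offset >= len(data):
        return None  # offset points past the end of the file

    # Strict check: a well-formed file has %%EOF right after the offset.
    if len(tokens) < 2 or not tokens[1].startswith(b"%%EOF"):
        return None
    return offset
```

A reader using this would trust the returned offset when it is not None and otherwise fall back to scanning the whole file to rebuild the xref, which is the recovery path Ghostscript takes for this file.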

Fixing that revealed a different problem, to do with calculating the offset of the 'real' trailer dictionary when it lies early in the PDF file. This is common with Linearised files (Adobe calls this 'optimised for fast web view') and very uncommon otherwise.
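
The comment does not describe the offset bug in detail, so the following is only a generic sketch of the pitfall, assuming a repair pass that scans the file in fixed-size blocks looking for the 'trailer' keyword. The point it illustrates is that each hit must be reported relative to the start of the file, not the current block, so a trailer near the beginning (as in a Linearised file) gets the right position. The function `find_keyword_offsets` and its `block_size` parameter are hypothetical.

```python
import io
from typing import BinaryIO, List


def find_keyword_offsets(f: BinaryIO, keyword: bytes = b"trailer",
                         block_size: int = 4096) -> List[int]:
    """Return the absolute file offset of every occurrence of `keyword`,
    reading the stream in fixed-size blocks."""
    offsets: List[int] = []
    keep = len(keyword) - 1  # bytes carried over so a match can span blocks
    base = 0                 # absolute file offset of buf[0]
    buf = b""
    while True:
        chunk = f.read(block_size)
        if not chunk:
            break
        buf += chunk
        i = buf.find(keyword)
        while i >= 0:
            # Report positions relative to the start of the file, not the block.
            offsets.append(base + i)
            i = buf.find(keyword, i + 1)
        # Carry the last `keep` bytes over; they are too short to hold a
        # complete match, so nothing found here is reported twice.
        carry = min(keep, len(buf))
        base += len(buf) - carry
        buf = buf[len(buf) - carry:]
    return offsets


if __name__ == "__main__":
    sample = b"%PDF-1.4\ntrailer<<>>\n" + b"x" * 50 + b"trailer<<>>\n"
    print(find_keyword_offsets(io.BytesIO(sample), block_size=8))  # prints [9, 71]
```

Carrying `len(keyword) - 1` bytes between blocks is the smallest overlap that still finds a keyword split across a block boundary without ever reporting the same match twice.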

This commit fixes both problems, and Ghostscript is able to repair the file.

NB: when you open the file with Adobe Acrobat and then exit, it offers to 'save the changes'. Assuming you haven't made any changes, this is a good indication that Acrobat has silently fixed a problem in the file.