Bug 692252 - GS fails to read some PDF files
Summary: GS fails to read some PDF files
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: 9.02
Hardware: PC Linux
Importance: P4 normal
Assignee: Alex Cherepanov
Reported: 2011-06-06 01:28 UTC by David
Modified: 2014-02-17 04:45 UTC


Attachments
The PDF file that causes the error (44.61 KB, application/pdf)
2011-06-06 01:28 UTC, David
patch (514 bytes, patch)
2011-06-06 03:41 UTC, Alex Cherepanov
bad operands (270.88 KB, application/pdf)
2011-06-06 16:18 UTC, Henry Stiles
patch (3.00 KB, patch)
2011-06-11 14:38 UTC, Alex Cherepanov

Description David 2011-06-06 01:28:31 UTC
Created attachment 7564
The PDF file that causes the error

I get this error when I run GhostScript on some PDF files.

kerplatz@kerplatz-laptop:~/Desktop/test/d20110411/56153$ gs 56153.stg4dqm.root_anl_leeds.pdf 
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loading NimbusSanL-Bold font from %rom%Resource/Font/NimbusSanL-Bold... 2547264 1216775 4055652 2758613 3 done.
   **** Unknown operator: 'inf'
Error: /typecheck in --run--
Operand stack:
   --dict:8/17(L)--   266.285   --nostringval--
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1894   1   3   %oparray_pop   1893   1   3   %oparray_pop   1877   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--
Dictionary stack:
   --dict:1150/1684(ro)(G)--   --dict:1/20(G)--   --dict:82/200(L)--   --dict:82/200(L)--   --dict:108/127(ro)(G)--   --dict:295/300(ro)(G)--   --dict:23/30(L)--   --dict:6/8(L)--   --dict:22/40(L)--   --dict:7/15(L)--
Current allocation mode is local
Last OS error: 11
GPL Ghostscript 9.02: Unrecoverable error, exit code 1

I am including a PDF file that has this error. I have the latest Linux OS updates and GhostScript is also the latest. I am using ImageMagick to create JPG files from several PDFs, but as you can see I also get the same error when I run GS at the command line. By removing IM from the equation, I am betting it is not the problem.

I need this solved and am willing to help in solving the problem, but I have no idea on where to start. Thanks.
Comment 1 Alex Cherepanov 2011-06-06 02:06:05 UTC
The PDF file is invalid. It has the following string:

  inf 266.289 Td

Apparently, a floating point number was infinite and a PDF generator
wrote 'inf' instead of a number.
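
A minimal sketch, in Python, of how such a token can arise and how a generator could guard against it. It assumes the generator formats coordinates with a printf-style conversion and no finiteness check; the generator's real code is not shown in this report, and format_pdf_number is a hypothetical helper.

```python
import math


def format_pdf_number(value: float) -> str:
    """Format a float as a PDF numeric token, rejecting inf and nan."""
    if not math.isfinite(value):
        raise ValueError(f"cannot write non-finite number to PDF: {value!r}")
    # Fixed-point notation only: PDF reals have no exponent syntax.
    text = f"{value:.3f}".rstrip("0").rstrip(".")
    return "0" if text in ("", "-0") else text


# Unchecked printf-style formatting happily emits the invalid token.
naive = "%g %g Td" % (float("inf"), 266.289)
print(naive)  # inf 266.289 Td

# The guarded helper produces valid tokens for finite input.
print(format_pdf_number(266.289), format_pdf_number(10.0))
```

Whether a generator should raise, clamp, or skip the drawing operation for a non-finite value is a design decision for that generator; the sketch only shows where the check would go.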
Comment 2 Alex Cherepanov 2011-06-06 03:41:13 UTC
Created attachment 7565
patch

The fix on the Ghostscript side is quite easy: just define inf as 0 and issue a warning.
Comment 3 Alex Cherepanov 2011-06-06 03:50:01 UTC
The patch has been committed as a rev. a0720527bcabb2732c3c06dfe3cae1f9c9ea9318
Comment 4 Ray Johnston 2011-06-06 05:30:12 UTC
I'm not sure that I am in favor of accepting such egregious violations of
the PDF spec and loading ghostscript with cruft to work around them, potentially
slowing things down.

If this were an important customer or a common emitter of PDF that just
happened to "make a mistake" that's one thing, but in this case the /Info
shows lack of respect for PDF conventions:

Title:
Keywords: ROOT
Creator: ROOT Version 5.24/00
CreationDate: D:20110518193212

I'd recommend that the patch be reverted to force the creator to fix
their problem, and this be closed as WONTFIX
Comment 5 Henry Stiles 2011-06-06 16:01:00 UTC
Every other PDF implementation I've looked at displays the file.  I'm sure the fix as implemented is incorrect.  'inf' should be treated as an unknown operator, as gs originally reported; Td should then report an incorrect number of arguments, and processing should continue.  What does mupdf do?
Comment 6 Henry Stiles 2011-06-06 16:18:16 UTC
Created attachment 7568
bad operands

Demonstrates fix doesn't cover failure to parse operator and continue - works on foxit and preview.  I guess it is debatable if we want the parser to continue - but I don't think we want the ad hoc fix committed.
Comment 7 James Cloos 2011-06-06 16:49:50 UTC
Root version 5.24/00 is a bit old; 5.28/00 is current with 5.30/00 in rc.

I’m sure they would like a bug report about this at: http://root.cern.ch/bugs

Mupdf displays it (including the text) with the notice:

  warning: unknown keyword: 'inf'

I.e., it seems to drop only the single q..Q section.

gs, OTOH, says that it flushed to EOJ.
Comment 8 Ray Johnston 2011-06-06 18:54:43 UTC
This "might" be able to be fixed more generally by having the parser (when
in PDF mode, which is already "special" as determined by PDFScanRules) 
substitute a 'null' for every 'undefined'. Then, of course, places that
expect numbers would have to substitute a 0 when a null is found.

I still think that WONTFIX or INVALID are reasonable resolutions (and
revert the single case patch for 'inf').
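
A rough Python sketch of that idea, not Ghostscript code: the scanner maps any token it cannot interpret to a null (None here), and operators that expect numbers replace a null with 0. The names parse_operand, number_or_zero and td_operands are hypothetical, and the number pattern is a simplified reading of PDF numeric syntax.

```python
import re

# Simplified PDF number syntax: optional sign, digits, optional fraction.
# Python's float() cannot be used as the validity check, because
# float("inf") succeeds, which is the very case being rejected here.
PDF_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_operand(token):
    """Return a float for a valid PDF number, or None (null) otherwise."""
    return float(token) if PDF_NUMBER.fullmatch(token) else None


def number_or_zero(operand):
    """What a number-expecting operator would do with a null operand."""
    return 0.0 if operand is None else operand


def td_operands(tx_token, ty_token):
    """Operands that Td would end up using under this scheme."""
    return (number_or_zero(parse_operand(tx_token)),
            number_or_zero(parse_operand(ty_token)))


print(td_operands("inf", "266.285"))
```

The cost noted in the comment is visible in the sketch: every consumer of a number needs the number_or_zero step.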
Comment 9 Hin-Tak Leung 2011-06-07 04:09:26 UTC
(In reply to comment #2)
> Created an attachment (id=7565)
> patch
> 
> The fix is on Ghostscript side is quite easy - just define inf as 0 and issue a
> warning.

I am not going to get into whether this should be dealt with in gs or in root, but shouldn't inf be substituted with <a_very_large_number> instead of zero, if it should be substituted at all?
Comment 10 David 2011-06-07 06:31:43 UTC
Guys,

So I have a couple of questions and I hope the work around for this problem works.

1. What does comment 1 mean? What number was an infinite floating point? I did not create the PDF files, but I sure can find out what was used to create the PDF file.

2. All PDF viewers display the file without any errors. Why would GS care if the PDF viewer does not?

3. Should I fix whatever is causing the infinite floating point value, or can I just use the GS patch mentioned in this thread? These PDFs are only used internally in the Dept I work for, so I just want a quick solution. The cause can be fixed later.

Please tell me if I need to supply you guys with more information. Thanks.
Comment 11 James Cloos 2011-06-07 16:29:04 UTC
There is a large stream in that pdf (in object 49) which prints the text in the coloured circles, one digit at a time.

It uses this idiom to print each digit:

 q 0 0 566.929 532.57 re W n BT /F10 8.58983 Tf 280.601 263.422 Td (0) Tj ET Q

One of those q..Q sections looks like this:

 q 0 0 566.929 532.57 re W n BT /F6 24.3379 Tf inf 266.285 Td () Tj ET Q

The string “inf” in that line was most likely generated by printf(3) from an infinite floating point value.  But the PDF language does not support inf or nan for infinite or not-a-number floats; hence the error.

If a PDF viewer skips forward after the error only to the next Q in the stream, then this particular PDF will render fine.  If it skips to the end of the stream, though, then 

Incidentally, the two q..Q sections immediately before that buggy one also print empty strings (the parentheses are the string delimiters); no other empty strings exist in that stream.
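
The two recovery policies described above can be compared with a small Python sketch. It is a toy model, not any viewer's real scanner: tokens are split on whitespace (so strings containing spaces would break it), only the operators used in this idiom are known, and the third q..Q section in the sample is made up for illustration.

```python
import re

NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")
OPERATORS = {"q", "Q", "re", "W", "n", "BT", "ET", "Tf", "Td", "Tj"}


def is_valid(token):
    """True for numbers, names, simple strings and the known operators."""
    return (token in OPERATORS
            or NUMBER.fullmatch(token) is not None
            or token.startswith("/")
            or (token.startswith("(") and token.endswith(")")))


def shown_strings(stream, recovery):
    """Strings shown by Tj when recovery is 'next_Q' or 'end_of_stream'."""
    shown, skipping, prev = [], False, None
    for token in stream.split():
        if skipping:
            if token == "Q":      # resume after the end of the bad section
                skipping = False
            continue
        if not is_valid(token):
            if recovery == "end_of_stream":
                break             # give up on everything that follows
            skipping = True
            continue
        if token == "Tj" and prev is not None:
            shown.append(prev[1:-1])
        prev = token
    return shown


SAMPLE = (
    "q 0 0 566.929 532.57 re W n BT /F10 8.58983 Tf 280.601 263.422 Td (0) Tj ET Q "
    "q 0 0 566.929 532.57 re W n BT /F6 24.3379 Tf inf 266.285 Td () Tj ET Q "
    # Hypothetical third section, added only to show what gets lost.
    "q 0 0 566.929 532.57 re W n BT /F10 8.58983 Tf 300 263.422 Td (7) Tj ET Q"
)

print(shown_strings(SAMPLE, "next_Q"))
print(shown_strings(SAMPLE, "end_of_stream"))
```

Under the first policy only the broken section is lost; under the second, every digit after it in the stream is lost as well.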
Comment 12 David 2011-06-07 19:30:22 UTC
Hi,

How are you guys seeing this data? When I use a text editor or a Hex editor, I don't see what you see. I assume it must be encoded in the PDF. I really would like to view this data, and if you could tell me how to do it, I would appreciate it.

Also, PDF generator is not at fault. I know this because what you are seeing in the PDF is the output of about 500 light sensors that can have a value of 0 to infinity, which is a valid output of the light sensor.

Hope that helps solve the problem.
Comment 13 Henry Stiles 2011-06-07 19:57:22 UTC
> If a PDF viewer skips forward after the error only to the next Q in the stream,
> then this particular PDF will render is fine.  If it skips to the end of the
> stream, though, then 
> 

Yes, but it is fairly apparent that the Adobe scanner is not doing this. To see this, add a line or some other graphic before the closing Q in the bad section, and it will print (you'll have to adjust the stream length).  If we come as close as possible to the Adobe scanner in behavior, we may avoid an entire class of future reports.
Comment 14 Henry Stiles 2011-06-07 20:11:23 UTC
(In reply to comment #12)
> Hi,
> 
> How are you guys seeing this data? When I use a text editor or a Hex editor, I
> don't see what you see. I assume it must be encoded in the PDF. I really would
> like to view this data, and if you could tell me how to do it, I would
> appreciate it.
> 

We uncompressed the file; you can google how to do that.

> Also, PDF generator is not at fault. I know this because what you are seeing in
> the PDF is the output of about 500 light sensors that can have a value of 0 to
> infinity, which is a valid output of the light sensor.
>


The PDF is wrong; the string "inf" is not a legal token for a number, and that is not in question.  Adobe Acrobat displays its usual broken pdf message: "error exists ... report it to the creator" upon opening the file.  The question is how to parse PDF in the presence of the parse error.  We definitely want to flag the error, but we are going back and forth about what to do after the error.
Comment 15 James Cloos 2011-06-07 20:48:11 UTC
My current favourite app for uncompressing pdfs is qpdf from <http://qpdf.sourceforge.net/>.  (Run with its --qdf flag.)

I then used the standard unix tools grep(1), tr(1) and less(1) to find the inf.
(The stream is a single *very* long line, so I used » tr ' ' '\n' « to convert the spaces to newlines so that each token would be on its own line, making it easier to read in less(1).  A text editor would work just as well for that.)

pdftk <http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/> and podofo <http://podofo.sourceforge.net> can also uncompress a pdf file.

Mupdf has pdfshow(1) which can extract and uncompress a specified object.

After uncompressing, given that the error was »Unknown operator: 'inf'«,
it is just a matter of searching for the string inf.  Looking at the other q..Q sections near the one with inf makes it clear – even if you do not know what the Td operator does – that it should have been a number.
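
For anyone who prefers a script to grep(1) and tr(1), the same search can be sketched in Python. It assumes the stream has already been uncompressed and read into a string; find_token is a hypothetical helper, and splitting on whitespace is the same simplification as the tr approach.

```python
def find_token(stream_text, needle, context=2):
    """Return (index, surrounding tokens) for each exact match of needle."""
    tokens = stream_text.split()
    hits = []
    for index, token in enumerate(tokens):
        if token == needle:
            start = max(0, index - context)
            hits.append((index, " ".join(tokens[start:index + context + 1])))
    return hits


# The buggy q..Q section quoted earlier in this report.
SECTION = "q 0 0 566.929 532.57 re W n BT /F6 24.3379 Tf inf 266.285 Td () Tj ET Q"

for index, nearby in find_token(SECTION, "inf"):
    print(index, nearby)
```

Printing a few neighbouring tokens makes it easy to see that the match sits where the first Td operand should be.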
Comment 16 Alex Cherepanov 2011-06-11 14:38:44 UTC
Created attachment 7589
patch

Run PDF operator streams in a stopped context. Stop processing of the
stream on error, but continue to process rest of the file. Remove a
hack that defined 'inf' as 0.
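
As an analogy only (the patch itself is PostScript, and this is not its code), a stopped context behaves much like an exception boundary around each stream. The Python sketch below uses made-up names (StreamError, run_stream, run_page) and a toy rule for what counts as an unknown operator.

```python
class StreamError(Exception):
    """Raised when a content stream contains something we cannot execute."""


KNOWN_OPERATORS = {"q", "Q", "BT", "ET", "Tf", "Td", "Tj"}


def run_stream(stream, executed):
    """Execute tokens in order; an unknown alphabetic token is an error."""
    for token in stream.split():
        if token.isalpha() and token not in KNOWN_OPERATORS:
            raise StreamError(f"unknown operator: {token!r}")
        executed.append(token)


def run_page(streams):
    """Run each stream inside its own error boundary, like 'stopped'."""
    executed, warnings = [], []
    for index, stream in enumerate(streams):
        try:
            run_stream(stream, executed)
        except StreamError as err:
            # The rest of this stream is abandoned; later streams still run.
            warnings.append(f"stream {index}: {err}; rest of stream skipped")
    return executed, warnings


executed, warnings = run_page(["BT inf 266.285 Td ET", "BT 10 20 Td ET"])
print(executed)
print(warnings)
```

With this policy an error costs the remainder of one stream, which is why a file whose content sits in a single large stream loses more than it did with the 'inf' definition.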
Comment 17 Alex Cherepanov 2011-06-11 14:41:02 UTC
The patch has been committed as a rev. 4c6809dfa1c539d757c30f572922e05cd1436698
Comment 18 David 2011-06-14 19:55:31 UTC
Sorry to bother you guys again, but I can't seem to get a copy of GS with the patch mentioned in the previous comment. Would you please tell me how to get the updated software? I am using Ubuntu 10.4. Or do I have to apply the patch manually, and if so, how is that done? Thanks for all your help.
Comment 19 Alex Cherepanov 2011-06-15 02:32:43 UTC
You can get the current development version from our git repository
http://git.ghostscript.com/?p=ghostpdl.git;a=summary
and build the project in the gs directory.

The patch attached to comment #2 works better for your file.
My colleagues had objections based on the development principles of Ghostscript
project, but the patch is smaller and easier to apply manually. You can apply it
to your installed copy.

If your distribution compiles resources into the executable,
you need to provide to gs a full path to the modified files as
-I/SOME/PATH/Resource/Init
Comment 20 David 2011-06-19 01:49:47 UTC
Alex,
It is still unclear to me what I have to do to apply the patch. Are there instructions on how to apply it? I would read and follow the instructions if I knew where they were. Please understand that I know what I am doing; it is just that you did not make the procedure for applying the patch clear. Thanks.
Comment 21 Alex Cherepanov 2011-06-21 14:01:47 UTC
1. Open Resource/Init/pdf_draw.ps in any text editor
2. Add the following line after the /Tform definition
 /inf { 0 } def 
3. Provide the path to your Init directory with modified files
 by -I option.