Bug 690455 - "C stack overflow" when extracting image
Summary: "C stack overflow" when extracting image
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Graphics Library
Version: 8.64
Hardware: PC Windows XP
Importance: P4 normal
Assignee: Henry Stiles
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-04-29 03:18 UTC by Barrie Cooper
Modified: 2010-09-10 16:38 UTC
0 users

See Also:
Customer:
Word Size: ---


Attachments
Offending PDF File - 00010103.pdf (1.23 MB, application/pdf)
2009-04-29 03:20 UTC, Barrie Cooper

Description Barrie Cooper 2009-04-29 03:18:54 UTC
I have successfully used GS to extract first page images for the majority of my 
9000+ PDF files.  A small percentage fail with the following error:

--------------------------------------------------------------------------------
gswin32c.exe -dBATCH -dMaxBitmap=300000000 -dNOPAUSE -dSAFER -sDEVICE=jpeg -dTextAlphaBits=4 -dGraphicsAlphaBits=4 -dFirstPage=1 -dLastPage=1 -sOutputFile=00010103.jpg 00010103.pdf
GPL Ghostscript 8.64 (2009-02-03)
Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
*** C stack overflow. Quiting...
--------------------------------------------------------------------------------

I was hoping to attach "00010103.pdf" to this bug, but I'm not sure whether that
is possible.
Comment 1 Barrie Cooper 2009-04-29 03:20:09 UTC
Created attachment 4980 [details]
Offending PDF File - 00010103.pdf
Comment 2 Ken Sharp 2009-04-29 06:03:45 UTC
I thought this might have been a colour problem, because I had a lot of problems
of this type when I reworked that area. However, it actually appears to be a
JBIG2 decode problem. Each page of the PDF file is a JBIG2 image, with text
(presumably from OCR) laid on top in a text rendering mode that draws nothing,
so that an image document appears to have searchable text.

There seems to be some kind of recursion going on in the stream handling, which
goes out of control and leads to the C stack overflow. I'm afraid I'm not familiar
enough with this to say more. It's definitely a bug, though.

My first thought was the strange DecodeParms:

/DecodeParms<</__pdfnet_jbig2 true>>

but this doesn't seem to be the issue; I tried removing them, with no effect. Most
likely it's some characteristic of the JBIG2 encoding which jbig2dec doesn't like.
FWIW the offending image is Im0, which is the first marking object on page 1.

A breakpoint on s_jbig2decode_process works pretty well.

Using the Luratech decoder instead of jbig2dec works as expected, so it does look
pretty much like this is a jbig2dec problem. Assigning to Ralph as the owner.
Comment 3 Alex Cherepanov 2009-04-29 07:07:27 UTC
The stack overflow bug is quite easy to fix.
The function jbig2_build_huffman_table() allocates 256K on the stack.
Ghostscript allocates 128K for the stack.

Changing jbig2_build_huffman_table() as follows resolves the stack overflow.
Production-quality code should, of course, use the Ghostscript heap instead of
the C heap, and free the block.

Jbig2HuffmanTable *
jbig2_build_huffman_table (Jbig2Ctx *ctx, const Jbig2HuffmanParams *params)
{
  int *LENCOUNT = malloc(1 << LOG_TABLE_SIZE_MAX);
  ...
} 

There is another issue with the file: some of the characters are placed
in the wrong positions.
Comment 4 Alex Cherepanov 2009-04-29 07:19:04 UTC
I forgot to multiply by sizeof(int):

  int *LENCOUNT = malloc((1 << LOG_TABLE_SIZE_MAX)*sizeof(int));

but this doesn't help with the misplaced characters.
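
To make the allocation pattern from comments 3 and 4 concrete, here is a minimal,
self-contained sketch: a length histogram allocated on the heap rather than the
stack, sized in elements times sizeof(int), and freed on every exit path. This is
not the jbig2dec code. The function count_prefix_lengths() and its parameters are
hypothetical, the value 16 for LOG_TABLE_SIZE_MAX is an assumption inferred from
the 256K figure in comment 3 (65536 entries of a 4-byte int), and it uses the
plain C heap via calloc() rather than the Ghostscript allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed value: (1 << 16) ints of 4 bytes each gives the 256K from comment 3. */
#define LOG_TABLE_SIZE_MAX 16

/* Hypothetical helper, not part of jbig2dec: returns the largest length seen
 * in preflen[0..n-1], or -1 on allocation failure or out-of-range input. */
static int count_prefix_lengths(const int *preflen, size_t n)
{
    size_t entries = (size_t)1 << LOG_TABLE_SIZE_MAX;
    int *lencount;
    int max_len = 0;
    size_t i;

    assert(preflen != NULL || n == 0);

    /* calloc takes the element size as its own argument, so the sizeof(int)
     * factor cannot be forgotten, and the histogram starts out zeroed. */
    lencount = calloc(entries, sizeof(int));
    if (lencount == NULL)
        return -1;

    for (i = 0; i < n; i++) {
        if (preflen[i] < 0 || (size_t)preflen[i] >= entries) {
            free(lencount);          /* do not leak on the error path */
            return -1;
        }
        lencount[preflen[i]]++;
        if (preflen[i] > max_len)
            max_len = preflen[i];
    }

    free(lencount);                  /* release the block before returning */
    return max_len;
}
```

Calling count_prefix_lengths() on the lengths 1, 2, 3, 3, 5 returns 5, and any
negative length makes it return -1 without leaking the histogram.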
Comment 5 Ralph Giles 2009-04-30 10:25:23 UTC
Alex's analysis is correct. In fact, the histogram only needs 256 elements.

This is fixed upstream. See
http://git.ghostscript.com/?p=jbig2dec;a=commitdiff;h=63e0436a711c59f7fae6cfd721b90428ae19a7b3
for the dynamic allocation fix, and
http://git.ghostscript.com/?p=jbig2dec;a=commitdiff;h=f1d00697525dd2d7a5f63f96e01ad0d99e673b13
for the size correction.

We still don't decode the file correctly, but this at least corrects the stack
overflow.
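
As a rough illustration of the size correction, the sketch below uses a
256-element histogram, which occupies about 1 KB with a 4-byte int and so fits
comfortably within a 128K stack. It is not the upstream change. The names
LENCOUNT_ENTRIES and build_length_histogram() are hypothetical, and the explicit
range check is an assumption added here, since the thread does not say why 256
elements are sufficient.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Element count taken from comment 5; the macro name is hypothetical. */
#define LENCOUNT_ENTRIES 256

/* Hypothetical helper: fills lencount[] with how often each length occurs in
 * lengths[0..n-1]. Returns 0 on success, -1 if a length is out of range. */
static int build_length_histogram(const int *lengths, size_t n,
                                  int lencount[LENCOUNT_ENTRIES])
{
    size_t i;

    assert(lencount != NULL);
    memset(lencount, 0, LENCOUNT_ENTRIES * sizeof(int));

    for (i = 0; i < n; i++) {
        if (lengths[i] < 0 || lengths[i] >= LENCOUNT_ENTRIES)
            return -1;               /* reject rather than write out of bounds */
        lencount[lengths[i]]++;
    }
    return 0;
}
```

A caller can declare `int hist[LENCOUNT_ENTRIES];` as a local variable and pass
it in, so no heap allocation or matching free is needed at this size.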
Comment 6 Hin-Tak Leung 2010-05-04 23:54:14 UTC
This appears to be a JBIG2 issue, hence assigning to Masaki.
Comment 7 Henry Stiles 2010-09-10 16:38:06 UTC
I am not seeing an issue with the current code (svn http://svn.ghostscript.com/ghostscript/trunk/gs).