Bug 689917

Summary: Segmentation fault processing incomplete CCITTFax stream
Product: Ghostscript Reporter: Tim Waugh <twaugh>
Component: PDF InterpreterAssignee: Alex Cherepanov <alex>
Status: RESOLVED FIXED QA Contact: Bug traffic <tech>
Severity: normal    
Priority: P4 CC: christinedelight.top85, smcc
Version: 8.62   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: gs-scfd.patch
689917.pdf -- local copy of the sample file.
A patch for indeterminism in 23-32.ps

Description Tim Waugh 2008-06-23 06:54:40 UTC
The PDF document found here:
crashes ghostscript-8.62, and HEAD, like this:

GPL Ghostscript 8.62 (2008-02-29)
Copyright (C) 2008 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
   **** Warning: stream operator not terminated by valid EOL.
   **** Warning: stream operator not terminated by valid EOL.

   **** Warning: File has insufficient data for an image.
Segmentation fault (core dumped)

I have tracked this down to a buffer underrun in src/scfd.c.  A single bit of
the obj_header_t structure for the ss->lbuf object gets inverted.
Comment 1 Tim Waugh 2008-06-23 06:55:28 UTC
Created attachment 4143 [details]

This patch works around the problem and shows where the buffer underrun occurs.
 I have not discovered the real cause of the bug.
Comment 2 Ray Johnston 2008-07-10 10:11:22 UTC
Assigning to Ralph as the current owner of filter issues.
Comment 3 Alex Cherepanov 2008-07-28 17:42:53 UTC
Created attachment 4245 [details]
689917.pdf  -- local copy of the sample file.
Comment 4 Alex Cherepanov 2008-07-28 17:44:22 UTC
*** Bug 688075 has been marked as a duplicate of this bug. ***
Comment 5 Ralph Giles 2008-07-28 21:45:27 UTC
I've applied the patch as a work around for the 8.63 release. Thanks Tim.

I suspect the problem is invalid data; libpopper shows the image but there are
some corrupt lines about halfway down the page. With the patch we just show a
blank page. Attempting to resync and continue decoding would be better.
Comment 6 Ralph Giles 2008-07-28 23:50:17 UTC
Alex, the change the cluster regression reports in 23-32.PS is on the second
page. The limitcheck error from the test:

 /f { (123) } 0 () /SubFileDecode filter def
 f token

Prints a different pointer from the stack, --@0x497bf0-- instead of
--@0x497bc0--. This affects the checksum as well.

I don't see how this is related. What is the pointer value referring to?
Comment 7 Ralph Giles 2008-07-29 01:02:28 UTC
The hex value represents an operator. Ken suggests this might be static data,
since he's seen similar variations in his testing based on how the code is compiled.

I'd still like to understand it though, both because it's an interpreter change
in freeze and because it seems to be a common instability in the regression suite.
Comment 8 Alex Cherepanov 2008-07-29 05:57:24 UTC
When an operator has no name gs prints its address.
This address changes sometimes when C code is modified.
We can fix this bug by giving a name to the operator.
This bug is known from the beginning of CET testing.
Comment 9 Alex Cherepanov 2008-07-29 11:01:50 UTC
Created attachment 4249 [details]
A patch for indeterminism in 23-32.ps

Give a name to a continuation operator that was printed by 23-32.ps test
file as an address, which changed between builds. Reduce the number of false
positives in regression testing.
Comment 10 Alex Cherepanov 2009-01-18 12:52:29 UTC
The patch from comment #9 (for intdeterminism in 23-32.ps) was committed 
as rev. 8927 on 2008-08-03.
Comment 11 Tim Waugh 2009-04-02 07:34:54 UTC
Created attachment 4884 [details]

I wonder if this is the real fix.  At this point, rows_left is -1, yet we still
carry on.

The reason we were writing outside the buffer was that wpos was -1 when
invert_data() was invoked.

I can't help thinking the root cause is a missing or incomplete validity check
somewhere, and perhaps this is it.  Any ideas?
Comment 12 Tim Waugh 2009-04-02 10:51:58 UTC
Created attachment 4886 [details]

Here is a different patch to make the invert_data macro more robust.  The idea
is that when qbit==0 we don't want to change any bits in the current byte; in
fact q might be pointing to the byte immediately before the buffer.

This patch advances q in that situation, and resets qbit to 8.

Comparing with skip_data this seems to be what is needed to avoid accessing
memory outside the buffer.

This gives a mostly white page with the sample input file, but the diagnostic
messages seem to indicate other problems with that file anyway so perhaps that
is expected.
Comment 13 Hin-Tak Leung 2010-05-02 03:12:49 UTC
Grabbing a Ralph's bugs.
Comment 14 Hin-Tak Leung 2010-06-29 21:54:26 UTC
The file no longer cause segfault ; gs svn head shows half a page texts+graphics, acroread linux 9 shows none, and xpdf and evince shows some visual artifects around the point where gs stops, and some warning/error messages, but show a full page of text+graphics.
Comment 15 Hin-Tak Leung 2010-08-02 00:32:20 UTC
Re-assigning bugs which still have work to do.
Comment 16 Alex Cherepanov 2013-01-17 06:36:50 UTC
Ghostscript no longer crashes on the sample file.