Bug 692684 - SEGV in 33_all.PS
Summary: SEGV in 33_all.PS
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PS Interpreter (show other bugs)
Version: master
Hardware: PC Linux
: P4 normal
Assignee: Alex Cherepanov
URL:
Keywords:
: 692707 (view as bug list)
Depends on:
Blocks:
 
Reported: 2011-11-15 04:08 UTC by Alex Cherepanov
Modified: 2012-07-17 05:53 UTC (History)
3 users (show)

See Also:
Customer:
Word Size: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Cherepanov 2011-11-15 04:08:58 UTC
Cluster runs often report SEGV in 33_all.PS

I've reproduced the problem locally in the release build. (-O2)
~/ghostpdl/gs/debugbin/gs -sOutputFile=/dev/null  -sDEVICE=pkmraw   -r300 -sDEFAULTPAPERSIZE=letter -dNOPAUSE -dBATCH -dClusterJob -dJOBSERVER %rom%Resource/Init/gs_cet.ps - <  ~/bug/a/33_all.PS

The following patch fixes the problem for me but seems to have little effect
on the cluster.

--- a/gs/psi/imain.c
+++ b/gs/psi/imain.c
@@ -242,7 +242,7 @@ gs_main_interpret(gs_main_instance *minst, ref * pref, int user_errors,
 }
 
 int
-gs_main_init2(gs_main_instance * minst)
+gs_main_init2(gs_main_instance * volatile minst)
 {
     i_ctx_t *i_ctx_p;
     int code = gs_main_init1(minst);
Comment 1 Alex Cherepanov 2011-11-15 17:08:46 UTC
Marcos, please figure out when did this SEGV start.
Comment 2 Marcos H. Woehrmann 2011-11-27 03:08:42 UTC
I can't find any instance of this conversion ever having failed on the cluster nor can I get the command given in the Description to fail on any of the cluster machines that I tried.

While searching through cluster results archive I did find a possibly related problem, on the macpro converting 33_all.PS to a PDF or PostScript file seg faults, I've entered a new bug for this issue: Bug 692707.
Comment 3 Alex Cherepanov 2011-11-27 20:38:35 UTC
Divide gs_main_init2() into two functions to work around a bug in
gcc 4.5.1 with -O2 option. The bug caused gcc to drop 2nd
assignment to i_ctx_p and resulted in SEGV error later on.

See:
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=a39e4831ba0d74d742b365f3b3b1af192731303c
Comment 4 Marcos H. Woehrmann 2011-12-08 22:38:04 UTC
Alex, please give me access to whatever machine this fails on for testing.  I've tried compiling gs with gcc 4.5.2 and it does not fail.
Comment 5 Alex Cherepanov 2012-07-12 03:23:36 UTC
Move the GC flag from the stack, where it can go out of scope to
lib context structure (gs_lib_ctx_t), which is allocated quite
early in stable memory and never goes out of scope.

Revert the commit a39e4831ba0d74d742b365f3b3b1af192731303c for the
bug 692684 because it didn't really fix anything. That patch just
changed the stack layout and masked the effect of writing into an
out-of-scope location.

The sample file 34_all.PS no longer causes SEGV but continues to 
have an resolution-dependant rendering of one character.

A proper fix for *_all.PS files has been committed as:

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=39b0a515d5101cf48ae35f1ebb14e4558cb78e88
Comment 6 Ray Johnston 2012-07-12 15:40:45 UTC
The patch has been reverted because it caused a 2x slowdown in the
regression runs
Comment 7 Chris Liddell (chrisl) 2012-07-13 13:52:51 UTC
As per Henry's request for review/consideration, here's my proposed fix for the stack corruption problem:

http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=c85ab948


That fixes the stack variable corruption that I could see, and any performance impact is below the noise level on cluster's data gathering.
Comment 8 Alex Cherepanov 2012-07-13 14:52:36 UTC
The patch looks fine to me.
Perhaps, the whole gc_status structure can be backed up to keep
the tick count.
Comment 9 Chris Liddell (chrisl) 2012-07-13 15:02:00 UTC
I don't *think* we want the entire gc_status structure because it contains parameters that *theoretically* could vary between instances of the memory space (although I don't think they do just now).

We *might* want to also restore the signal_value entry, but I felt that we'd want to leave that so (I think!) we pick up the gc "pattern" for the restored space exactly where it left off when it went "out of scope" in the save.
Comment 10 Ray Johnston 2012-07-13 21:47:00 UTC
I don't follow some of the comments about 'ticks_left'. This variable is
within the 'interp' function context and is used (abused) for two purposes:
the interpreter loop uses it to decide when to 'slice' to a different context
(display postscript extension). It is also used by the gs_memory_ref_t
allocators to signal the need for a GC which is detected when ticks_left
is <= -100, after which ticks_left is set for a full slice of passes through
the interp loop.

To me, the only thing we may want to do upon a 'restore' is "unsignal"
the value, but the only information we have is what value to put at psignal
when we want to request a GC -- not what value when we want to cancel the
request. Note that it's unlikely that a GC would be requested during a
restore operation (it would have to cause an allocator to exceed the
VMthreshold), and the only harm is we would be running a GC after a restore
with _some_ of the allocations gone anyway as a result of the restore.

As far as Chris's patch, I agree that if we did a 'save', then exited the
interp function (such as with e_NeedInput or e_RemapColor), then re-entered
the interp and restore to a different mem, the psignal will be stale and
'ticks_left' may be in a different place. After that requesting a GC will
indeed clobber the stack. I think the patch as-is is OK, but...

Moving the ticks_left variable off the stack and into somewhere like 'minst'
would give us a stable address and (AFAICT) obviate the need for Chris's
patch. For speed in the interp loop, we could use "int *pticks_left" that
gets set upon entry so that we don't have to get 'minst->ticks_left'. The
only reason for the 'overlap' of function of ticks_left with GC signalling
is too avoid checking BOTH ticks_left (for slice) and a GC signal, otherwise
we could probably have ticks_left on the stack and the GC signal in minst
(a pseudo global for the interpreter instance).

The signal would still work the same way -- setting *psignal = signal_value

So, Chris's patch is oK with me, but I'd appreciate any comments (here or
in email) about my follow up suggestion (like, did I miss something?) and
whether we do something like I suggested or not will be left up to the
consensus.
Comment 11 Chris Liddell (chrisl) 2012-07-13 22:03:44 UTC
Personally, I would like to apply the least invasive patch that actually fixes the problem, given that we're only about two weeks from a scheduled release.

I'd much rather make the current code work as intended, than move stuff around, and find out after the released that there is some subtlety that we've missed.

There's clearly more required than just moving the pointer to be a pseudo global in the context, since that's essentially what Alex's patch did, and it had the unexpected impact on performance.

As I discussed with Robin, it looks like we (at least) also need a way to identify garbage collected spaces from non-gc spaces (currently psignal != NULL indicates a gc'ed space).
Comment 12 Chris Liddell (chrisl) 2012-07-14 07:36:00 UTC
Seg fault fixed in:

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=808f39dd

I created an enhancement to consider Ray's and other proposals for tidying this up:
http://bugs.ghostscript.com/show_bug.cgi?id=693179
Comment 13 Alex Cherepanov 2012-07-17 05:53:49 UTC
*** Bug 692707 has been marked as a duplicate of this bug. ***