Cluster runs often report SEGV in 33_all.PS I've reproduced the problem locally in the release build. (-O2) ~/ghostpdl/gs/debugbin/gs -sOutputFile=/dev/null -sDEVICE=pkmraw -r300 -sDEFAULTPAPERSIZE=letter -dNOPAUSE -dBATCH -dClusterJob -dJOBSERVER %rom%Resource/Init/gs_cet.ps - < ~/bug/a/33_all.PS The following patch fixes the problem for me but seems to have little effect on the cluster. --- a/gs/psi/imain.c +++ b/gs/psi/imain.c @@ -242,7 +242,7 @@ gs_main_interpret(gs_main_instance *minst, ref * pref, int user_errors, } int -gs_main_init2(gs_main_instance * minst) +gs_main_init2(gs_main_instance * volatile minst) { i_ctx_t *i_ctx_p; int code = gs_main_init1(minst);
Marcos, please figure out when did this SEGV start.
I can't find any instance of this conversion ever having failed on the cluster nor can I get the command given in the Description to fail on any of the cluster machines that I tried. While searching through cluster results archive I did find a possibly related problem, on the macpro converting 33_all.PS to a PDF or PostScript file seg faults, I've entered a new bug for this issue: Bug 692707.
Divide gs_main_init2() into two functions to work around a bug in gcc 4.5.1 with -O2 option. The bug caused gcc to drop 2nd assignment to i_ctx_p and resulted in SEGV error later on. See: http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=a39e4831ba0d74d742b365f3b3b1af192731303c
Alex, please give me access to whatever machine this fails on for testing. I've tried compiling gs with gcc 4.5.2 and it does not fail.
Move the GC flag from the stack, where it can go out of scope to lib context structure (gs_lib_ctx_t), which is allocated quite early in stable memory and never goes out of scope. Revert the commit a39e4831ba0d74d742b365f3b3b1af192731303c for the bug 692684 because it didn't really fix anything. That patch just changed the stack layout and masked the effect of writing into an out-of-scope location. The sample file 34_all.PS no longer causes SEGV but continues to have an resolution-dependant rendering of one character. A proper fix for *_all.PS files has been committed as: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=39b0a515d5101cf48ae35f1ebb14e4558cb78e88
The patch has been reverted because it caused a 2x slowdown in the regression runs
As per Henry's request for review/consideration, here's my proposed fix for the stack corruption problem: http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=c85ab948 That fixes the stack variable corruption that I could see, and any performance impact is below the noise level on cluster's data gathering.
The patch looks fine to me. Perhaps, the whole gc_status structure can be backed up to keep the tick count.
I don't *think* we want the entire gc_status structure because it contains parameters that *theoretically* could vary between instances of the memory space (although I don't think they do just now). We *might* want to also restore the signal_value entry, but I felt that we'd want to leave that so (I think!) we pick up the gc "pattern" for the restored space exactly where it left off when it went "out of scope" in the save.
I don't follow some of the comments about 'ticks_left'. This variable is within the 'interp' function context and is used (abused) for two purposes: the interpreter loop uses it to decide when to 'slice' to a different context (display postscript extension). It is also used by the gs_memory_ref_t allocators to signal the need for a GC which is detected when ticks_left is <= -100, after which ticks_left is set for a full slice of passes through the interp loop. To me, the only thing we may want to do upon a 'restore' is "unsignal" the value, but the only information we have is what value to put at psignal when we want to request a GC -- not what value when we want to cancel the request. Note that it's unlikely that a GC would be requested during a restore operation (it would have to cause an allocator to exceed the VMthreshold), and the only harm is we would be running a GC after a restore with _some_ of the allocations gone anyway as a result of the restore. As far as Chris's patch, I agree that if we did a 'save', then exited the interp function (such as with e_NeedInput or e_RemapColor), then re-entered the interp and restore to a different mem, the psignal will be stale and 'ticks_left' may be in a different place. After that requesting a GC will indeed clobber the stack. I think the patch as-is is OK, but... Moving the ticks_left variable off the stack and into somewhere like 'minst' would give us a stable address and (AFAICT) obviate the need for Chris's patch. For speed in the interp loop, we could use "int *pticks_left" that gets set upon entry so that we don't have to get 'minst->ticks_left'. The only reason for the 'overlap' of function of ticks_left with GC signalling is too avoid checking BOTH ticks_left (for slice) and a GC signal, otherwise we could probably have ticks_left on the stack and the GC signal in minst (a pseudo global for the interpreter instance). The signal would still work the same way -- setting *psignal = signal_value So, Chris's patch is oK with me, but I'd appreciate any comments (here or in email) about my follow up suggestion (like, did I miss something?) and whether we do something like I suggested or not will be left up to the consensus.
Personally, I would like to apply the least invasive patch that actually fixes the problem, given that we're only about two weeks from a scheduled release. I'd much rather make the current code work as intended, than move stuff around, and find out after the released that there is some subtlety that we've missed. There's clearly more required than just moving the pointer to be a pseudo global in the context, since that's essentially what Alex's patch did, and it had the unexpected impact on performance. As I discussed with Robin, it looks like we (at least) also need a way to identify garbage collected spaces from non-gc spaces (currently psignal != NULL indicates a gc'ed space).
Seg fault fixed in: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=808f39dd I created an enhancement to consider Ray's and other proposals for tidying this up: http://bugs.ghostscript.com/show_bug.cgi?id=693179
*** Bug 692707 has been marked as a duplicate of this bug. ***