Bug 703143 - segfault in ps2txt with certain large file
Summary: segfault in ps2txt with certain large file
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Text
Version: master
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-13 22:05 UTC by Jonas Smedegaard
Modified: 2021-01-30 11:06 UTC
CC List: 2 users

See Also:
Customer:
Word Size: ---


Attachments
file which segfaults when trying to extract text contents with ps2txt (403.28 KB, application/gzip)
2020-11-13 22:05 UTC, Jonas Smedegaard

Description Jonas Smedegaard 2020-11-13 22:05:31 UTC
Created attachment 20186
file which segfaults when trying to extract text contents with ps2txt

It was reported at https://bugs.debian.org/970878 that ps2txt segfaults when processing the attached file with the following command:

zcat rfc1247.ps.gz | ps2txt > /dev/null
Comment 1 Peter Cherepanov 2020-11-14 15:23:13 UTC
The bug seems to have been introduced in commit 8c7bd787defa071c96289b7da9397f673fddb874.
My "git bisect run" did not fully converge to this commit because of build errors on Linux, but it is the only relevant commit in the smallest identified interval.
Comment 2 Ray Johnston 2020-11-16 01:20:50 UTC
I'm not sure whether this is the same problem as the segfault, but running with -Z@?$ shows
that there are MANY instances of rc_decrement where the ref_count goes from
0 to -1.

Debugging shows that 'gs_text_release' calls 'rc_free_text_enum', which then calls (via
the pte->procs->release pointer) 'textw_text_release'. That function frees some of the
'textw_text_enum_t', but then calls 'gs_text_release' again (recursively) even
though the ref_count is already 0. This call results in the error message:

GPL Ghostscript GIT PRERELEASE 95.40:
C:\Artifex\agit\ghostpdl\base\gstext.c(840): 0x19535f48b28 has ref_count of -1!
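The re-entrant release described above can be reduced to a small C sketch. Everything in it is hypothetical and simplified (`text_enum_t`, `text_release`, and the two release procs are invented for illustration); it is not Ghostscript's actual `gs_text_release`/`rc_free_text_enum` code. It only shows, under those assumptions, how a device release proc that calls back into the generic release drives a count from 0 to -1, and how a below-zero check similar in spirit to the one enabled by -Z@?$ can catch it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for a reference-counted text enumerator.
 * This is NOT Ghostscript's gs_text_enum_t or its rc_* macros. */
typedef struct text_enum_s text_enum_t;
typedef void (*release_proc_t)(text_enum_t *pte);

struct text_enum_s {
    long ref_count;
    release_proc_t release;   /* plays the role of pte->procs->release */
    int data_freed;           /* flag instead of a real free(), keeping the demo well-defined */
};

static int negative_refcounts = 0;

/* Generic release (hypothetical): decrement the count and, when it reaches
 * zero, run the device-specific release proc. Report any decrement that
 * drops below zero instead of continuing. */
static void text_release(text_enum_t *pte)
{
    assert(pte != NULL);
    pte->ref_count--;
    if (pte->ref_count < 0) {
        fprintf(stderr, "%p has ref_count of %ld!\n", (void *)pte, pte->ref_count);
        negative_refcounts++;
        return;
    }
    if (pte->ref_count == 0)
        pte->release(pte);
}

/* Re-entrant release proc: frees its own data, then calls the generic
 * release again although the count is already 0, so it goes 0 -> -1. */
static void reentrant_device_release(text_enum_t *pte)
{
    pte->data_freed = 1;
    text_release(pte);
}

/* Non-re-entrant release proc: frees its own data and returns without
 * touching the reference count again. */
static void non_reentrant_device_release(text_enum_t *pte)
{
    pte->data_freed = 1;
}

/* Release a fresh enumerator (ref_count 1) once and return how many
 * below-zero decrements were detected. */
int count_negative_decrements(release_proc_t proc)
{
    text_enum_t te = { 1, proc, 0 };
    negative_refcounts = 0;
    text_release(&te);
    return negative_refcounts;
}
```

With these assumptions, `count_negative_decrements(reentrant_device_release)` returns 1 and prints a message shaped like the one above, while `count_negative_decrements(non_reentrant_device_release)` returns 0. This is only an illustration of the failure mode, not a claim about how Ghostscript resolved it.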

Without -Z@?$, I can confirm a segfault after page 112, with the following call stack:

i_free_object(gs_memory_s * mem, void * ptr, const char * cname) Line 1507
rc_free_struct_only(gs_memory_s * mem, void * data, const char * cname) Line 287
gx_device_retain(gx_device_s * dev, int retained) Line 694
gx_show_text_release(gs_text_enum_s * pte, const char * cname) Line 1332
rc_free_text_enum(gs_memory_s * mem, void * obj, const char * cname) Line 833
gs_text_release(gs_gstate_s * pgs, gs_text_enum_s * pte, const char * cname) Line 840
textw_text_process(gs_text_enum_s * pte) Line 2352
gs_text_process(gs_text_enum_s * pte) Line 711

Assigning to Ken for further analysis.
Comment 3 Stefano Rivera 2020-12-01 02:40:35 UTC
> My "git bisect run" did not fully converge to this commit because of build errors on Linux, but it is the only relevant commit in the smallest identified interval.

I gave it a shot, and the bisect landed on commit 278f9a53ed507f9109380ee4210fb860b35b1811.

Specifically, this line:
https://github.com/ArtifexSoftware/ghostpdl/commit/278f9a53ed507f9109380ee4210fb860b35b1811#diff-32b87e855551c33736870d37e59d086d4eade5d62a5871d721ffe7febc46edf3R2207

Program received signal SIGSEGV, Segmentation fault.
0x00005555559bc091 in gs_grestore (pgs=0x1) at ./base/gsstate.c:408
408	    if (!pgs->saved)
(gdb) bt
#0  0x00005555559bc091 in gs_grestore (pgs=0x1) at ./base/gsstate.c:408
#1  0x0000555555a40699 in gx_default_text_restore_state (pte=<optimized out>) at ./base/gxchar.c:252
#2  0x00005555558f6028 in textw_text_process (pte=0x5555585c9598) at ./devices/vector/gdevtxtw.c:2207
#3  0x0000555555af9fd0 in op_show_continue (i_ctx_p=0x555556764058) at ./psi/zchar.c:690
#4  op_show_continue (i_ctx_p=0x555556764058) at ./psi/zchar.c:685
#5  0x0000555555adadb5 in interp (perror_object=<optimized out>, pref=<optimized out>, pi_ctx_p=<optimized out>)
    at ./psi/interp.c:1300
#6  gs_call_interp (pi_ctx_p=pi_ctx_p@entry=0x555556730d20, pref=pref@entry=0x7fffffffd190, user_errors=user_errors@entry=1, 
    pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=<optimized out>) at ./psi/interp.c:520
#7  0x0000555555adc3d8 in gs_interpret (pi_ctx_p=pi_ctx_p@entry=0x555556730d20, pref=pref@entry=0x7fffffffd190, 
    user_errors=user_errors@entry=1, pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=<optimized out>, 
    perror_object@entry=0x7fffffffd220) at ./psi/interp.c:477
#8  0x0000555555acec7e in gs_main_interpret (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, user_errors=1, 
    pref=0x7fffffffd190, minst=<optimized out>) at ./psi/imain.c:257
#9  gs_main_run_string_end (minst=minst@entry=0x555556730c80, user_errors=user_errors@entry=1, 
    pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=perror_object@entry=0x7fffffffd220) at ./psi/imain.c:797
#10 0x0000555555aced11 in gs_main_run_string_with_length (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, 
    user_errors=1, length=40, str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", minst=0x555556730c80)
    at ./psi/imain.c:741
#11 gs_main_run_string_with_length (minst=0x555556730c80, str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", 
    length=40, user_errors=1, pexit_code=0x7fffffffd21c, perror_object=0x7fffffffd220) at ./psi/imain.c:727
#12 0x0000555555ad0cab in run_string (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, user_errors=1, options=3, 
    str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", minst=0x555556730c80) at ./psi/imainarg.c:1121
#13 runarg (minst=minst@entry=0x555556730c80, arg=arg@entry=0x7fffffffd318 "/tmp/rfc1247.ps", 
    post=post@entry=0x555555c3314e ".runfile", options=options@entry=3, user_errors=1, pexit_code=pexit_code@entry=0x0, 
    perror_object=0x0, pre=<optimized out>) at ./psi/imainarg.c:1090
#14 0x0000555555ad0fc4 in argproc (arg=0x7fffffffd318 "/tmp/rfc1247.ps", minst=0x555556730c80) at ./psi/imainarg.c:1012
#15 argproc (minst=0x555556730c80, arg=0x7fffffffd318 "/tmp/rfc1247.ps") at ./psi/imainarg.c:997
#16 0x0000555555ad2800 in gs_main_init_with_args01 (minst=minst@entry=0x555556730c80, argc=argc@entry=7, 
    argv=argv@entry=0x7fffffffde58) at ./psi/imainarg.c:242
#17 0x0000555555ad2aa9 in gs_main_init_with_args (minst=0x555556730c80, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58)
    at ./psi/imainarg.c:289
#18 0x0000555555ad3efd in psapi_init_with_args (ctx=<optimized out>, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58)
    at ./psi/psapi.c:272
#19 0x0000555555b38505 in gsapi_init_with_args (instance=<optimized out>, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58)
    at ./psi/iapi.c:177
#20 0x0000555555665f0b in main (argc=7, argv=0x7fffffffde58) at ./psi/gs.c:95
(gdb) p pgs
$1 = (gs_gstate *) 0x1
(gdb) up
#1  0x0000555555a40699 in gx_default_text_restore_state (pte=<optimized out>) at ./base/gxchar.c:252
252	    return gs_grestore(pgs);
(gdb) up
#2  0x00005555558f6028 in textw_text_process (pte=0x5555585c9598) at ./devices/vector/gdevtxtw.c:2207
2207	        code = gx_default_text_restore_state(pte_fallback);
(gdb) p pte_fallback
$2 = (gs_text_enum_t *) 0x5555585c98b0
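In the backtrace above, `pte_fallback` is non-NULL, yet the graphics state pointer that reaches `gs_grestore` is 0x1, which is what reading a stale or corrupted enumerator can look like. A common defensive pattern for owned pointers of this kind is to clear them at the point of release and check them before use. The sketch below assumes hypothetical types (`device_text_enum_t`, `fallback_enum_t`, `gstate_t`) and helpers (`release_fallback`, `restore_via_fallback`); it is not the code in gdevtxtw.c or gxchar.c, and not the change made in commit d787dad3cd310788ea7201eb2fe1fff9a0a263c2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified types; NOT the structures in gdevtxtw.c or gxchar.c. */
typedef struct gstate_s {
    int save_level;                  /* stands in for the gsave/grestore depth */
} gstate_t;

typedef struct fallback_enum_s {
    gstate_t *pgs;                   /* state the fallback enumerator restores */
} fallback_enum_t;

typedef struct device_text_enum_s {
    fallback_enum_t *pte_fallback;   /* owned; NULL when there is no fallback */
} device_text_enum_t;

/* Free the fallback enumerator and clear the owner's pointer, so later code
 * sees NULL rather than a dangling pointer into freed (possibly reused) memory. */
static void release_fallback(device_text_enum_t *penum)
{
    assert(penum != NULL);
    free(penum->pte_fallback);
    penum->pte_fallback = NULL;
}

/* Restore through the fallback enumerator only if it still exists.
 * Returns 0 on success, -1 if there is no valid state to restore. */
static int restore_via_fallback(device_text_enum_t *penum)
{
    fallback_enum_t *fb = penum->pte_fallback;

    if (fb == NULL || fb->pgs == NULL)
        return -1;
    fb->pgs->save_level--;           /* simplified stand-in for a grestore */
    return 0;
}
```

Clearing the pointer only protects code that reads it through the owning structure; any other copy of the freed pointer still dangles, so this pattern complements, rather than replaces, a memory checker.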
Comment 4 Ken Sharp 2021-01-30 11:06:47 UTC
I'm afraid git bisect won't help in this case, because this is not, strictly speaking, a regression.

Commit d787dad3cd310788ea7201eb2fe1fff9a0a263c2 resolves the problem for me, although, as noted in the commit message, there are significant memory leaks which I'd like to get addressed before the next release (scheduled for March). There may be other problems that Memento would reveal, but the sheer number of reported leaks makes it essentially impossible to tell. I'll use a simpler file to reduce the leaks and then come back to this file once the number is under control.

The memory problems were long-standing; the commits identified by bisection simply changed the memory layout, which allowed the underlying problems to surface.