Summary: | segfault in ps2txt with a certain large file | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Jonas Smedegaard <dr> |
Component: | Text | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | ghostscript, sphinx.pinastri |
Priority: | P4 | ||
Version: | master | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | | Word Size: | --- |
Attachments: | file which segfaults when trying to extract text contents with ps2txt |
Description
Jonas Smedegaard
2020-11-13 22:05:31 UTC
The bug seems to have been introduced in commit 8c7bd787defa071c96289b7da9397f673fddb874. My "git bisect run" did not fully converge on this commit because of build errors on Linux, but it is the only relevant commit in the smallest interval identified.

I'm not sure whether it is the same problem as the segfault, but running with -Z@?$ shows MANY instances of rc_decrement where the ref_count goes from 0 to -1. Debugging shows that 'gs_text_release' calls 'rc_free_text_enum', which then calls (via the pte->procs->release pointer) 'textw_text_release'. That function frees some of the 'textw_text_enum_t', but then calls 'gs_text_release' recursively even though the ref_count is already 0. This call results in the error message:

GPL Ghostscript GIT PRERELEASE 95.40: C:\Artifex\agit\ghostpdl\base\gstext.c(840): 0x19535f48b28 has ref_count of -1!

Without -Z@?$ I can confirm a segfault after page 112, with the call stack:

i_free_object(gs_memory_s * mem, void * ptr, const char * cname) Line 1507
rc_free_struct_only(gs_memory_s * mem, void * data, const char * cname) Line 287
gx_device_retain(gx_device_s * dev, int retained) Line 694
gx_show_text_release(gs_text_enum_s * pte, const char * cname) Line 1332
rc_free_text_enum(gs_memory_s * mem, void * obj, const char * cname) Line 833
gs_text_release(gs_gstate_s * pgs, gs_text_enum_s * pte, const char * cname) Line 840
textw_text_process(gs_text_enum_s * pte) Line 2352
gs_text_process(gs_text_enum_s * pte) Line 711

Assigning to Ken for further analysis.

> My "git bisect run" did not fully converge to this commit because of build errors on Linux, but it is the only relevant commit in the smallest identified interval.

I gave it a shot and landed on 278f9a53ed507f9109380ee4210fb860b35b1811, specifically this line of it:
https://github.com/ArtifexSoftware/ghostpdl/commit/278f9a53ed507f9109380ee4210fb860b35b1811#diff-32b87e855551c33736870d37e59d086d4eade5d62a5871d721ffe7febc46edf3R2207

Program received signal SIGSEGV, Segmentation fault.
0x00005555559bc091 in gs_grestore (pgs=0x1) at ./base/gsstate.c:408
408 if (!pgs->saved)
(gdb) bt
#0 0x00005555559bc091 in gs_grestore (pgs=0x1) at ./base/gsstate.c:408
#1 0x0000555555a40699 in gx_default_text_restore_state (pte=<optimized out>) at ./base/gxchar.c:252
#2 0x00005555558f6028 in textw_text_process (pte=0x5555585c9598) at ./devices/vector/gdevtxtw.c:2207
#3 0x0000555555af9fd0 in op_show_continue (i_ctx_p=0x555556764058) at ./psi/zchar.c:690
#4 op_show_continue (i_ctx_p=0x555556764058) at ./psi/zchar.c:685
#5 0x0000555555adadb5 in interp (perror_object=<optimized out>, pref=<optimized out>, pi_ctx_p=<optimized out>) at ./psi/interp.c:1300
#6 gs_call_interp (pi_ctx_p=pi_ctx_p@entry=0x555556730d20, pref=pref@entry=0x7fffffffd190, user_errors=user_errors@entry=1, pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=<optimized out>) at ./psi/interp.c:520
#7 0x0000555555adc3d8 in gs_interpret (pi_ctx_p=pi_ctx_p@entry=0x555556730d20, pref=pref@entry=0x7fffffffd190, user_errors=user_errors@entry=1, pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=<optimized out>, perror_object@entry=0x7fffffffd220) at ./psi/interp.c:477
#8 0x0000555555acec7e in gs_main_interpret (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, user_errors=1, pref=0x7fffffffd190, minst=<optimized out>) at ./psi/imain.c:257
#9 gs_main_run_string_end (minst=minst@entry=0x555556730c80, user_errors=user_errors@entry=1, pexit_code=pexit_code@entry=0x7fffffffd21c, perror_object=perror_object@entry=0x7fffffffd220) at ./psi/imain.c:797
#10 0x0000555555aced11 in gs_main_run_string_with_length (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, user_errors=1, length=40, str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", minst=0x555556730c80) at ./psi/imain.c:741
#11 gs_main_run_string_with_length (minst=0x555556730c80, str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", length=40, user_errors=1, pexit_code=0x7fffffffd21c, perror_object=0x7fffffffd220) at ./psi/imain.c:727
#12 0x0000555555ad0cab in run_string (perror_object=0x7fffffffd220, pexit_code=0x7fffffffd21c, user_errors=1, options=3, str=0x555556811680 "<2f746d702f726663313234372e7073>.runfile", minst=0x555556730c80) at ./psi/imainarg.c:1121
#13 runarg (minst=minst@entry=0x555556730c80, arg=arg@entry=0x7fffffffd318 "/tmp/rfc1247.ps", post=post@entry=0x555555c3314e ".runfile", options=options@entry=3, user_errors=1, pexit_code=pexit_code@entry=0x0, perror_object=0x0, pre=<optimized out>) at ./psi/imainarg.c:1090
#14 0x0000555555ad0fc4 in argproc (arg=0x7fffffffd318 "/tmp/rfc1247.ps", minst=0x555556730c80) at ./psi/imainarg.c:1012
#15 argproc (minst=0x555556730c80, arg=0x7fffffffd318 "/tmp/rfc1247.ps") at ./psi/imainarg.c:997
#16 0x0000555555ad2800 in gs_main_init_with_args01 (minst=minst@entry=0x555556730c80, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58) at ./psi/imainarg.c:242
#17 0x0000555555ad2aa9 in gs_main_init_with_args (minst=0x555556730c80, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58) at ./psi/imainarg.c:289
#18 0x0000555555ad3efd in psapi_init_with_args (ctx=<optimized out>, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58) at ./psi/psapi.c:272
#19 0x0000555555b38505 in gsapi_init_with_args (instance=<optimized out>, argc=argc@entry=7, argv=argv@entry=0x7fffffffde58) at ./psi/iapi.c:177
#20 0x0000555555665f0b in main (argc=7, argv=0x7fffffffde58) at ./psi/gs.c:95
(gdb) p pgs
$1 = (gs_gstate *) 0x1
(gdb) up
#1 0x0000555555a40699 in gx_default_text_restore_state (pte=<optimized out>) at ./base/gxchar.c:252
252 return gs_grestore(pgs);
(gdb) up
#2 0x00005555558f6028 in textw_text_process (pte=0x5555585c9598) at ./devices/vector/gdevtxtw.c:2207
2207 code = gx_default_text_restore_state(pte_fallback);
(gdb) p pte_fallback
$2 = (gs_text_enum_t *) 0x5555585c98b0

I'm afraid that git bisecting won't help in this case, because this is not, strictly speaking, a regression.
Commit d787dad3cd310788ea7201eb2fe1fff9a0a263c2 resolves the problem for me, although, as noted in the commit message, there are significant memory leaks which I'd like to get addressed before the next release (scheduled for March). Memento may reveal other problems as well, but the sheer number of leaks it reports makes that essentially impossible to tell. I'll use a simpler file to reduce the leaks, and then come back to this file once the number is under control.

The memory problems were long-standing; the commits identified simply changed the memory layout, which allowed the underlying problems to surface.