692352 – GS PDF conversion eats up and corrupts memory

Bug 692352 - GS PDF conversion eats up and corrupts memory

Summary: GS PDF conversion eats up and corrupts memory

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	Color (show other bugs)
Version:	master
Hardware:	PC Windows XP

Importance:	P1 critical
Assignee:	Alex Cherepanov

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-07-18 09:14 UTC by Andrei Sosnin
Modified:	2014-02-17 04:40 UTC (History)
CC List:	5 users (show)

See Also:
Customer:
Word Size:	---

Attachments
An example file causing crashes (2.88 MB, application/pdf) 2011-07-18 09:14 UTC, Andrei Sosnin	Details
Simpler file (still very complex) (1.68 MB, application/pdf) 2011-07-18 16:33 UTC, Ken Sharp	Details
experimental patch (898 bytes, patch) 2011-07-27 17:39 UTC, Alex Cherepanov	Details \| Diff
working patch (944 bytes, patch) 2011-08-02 05:58 UTC, Alex Cherepanov	Details \| Diff
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrei Sosnin 2011-07-18 09:14:08 UTC

Created attachment 7683 [details]
An example file causing crashes

When trying to convert the file to JPEG, Ghostscript uses up about 1.8 GB of memory, then "Access violation" error is raised, somewhere in color management code (zusparam.c:994 and gsicc_lcms.c:304). 

Tested it with revision 99cbbdaad585b78dfd32db6dc59c08f9743d8d6a from 2011-07-11 11:22:20. 

Just before crashing it always fails allocating some memory (FIY, test machine is an Intel 2 Duo 7400 with 3 GB of RAM):

[a+]gs_malloc(lcms)(557188) = 0x0: failed, used=1778217020, max=1778217020
  .\base\gsicc_lcms.c:33: gscms_error(): cmm error : Couldn't allocate 557188 bytes for profile
[a+]gs_malloc(lcms)(1679046) = 0x0: failed, used=1777720558, max=1778217020

Depending on the execution method (using gswin32.exe or my own application and gslib32.dll), it fails allocating different bits of memory and crashes in different places, but consistently for each execution method (the above message was from the DLL test). 

The offending file is attached. It's a fairly big PDF full of photos, apparently made with Adobe InDesign CS4.

I tried opening it in Evince, at it did open it fine with only max 32 MB of peak memory usage, although it did take it more time to process it, than it usually takes. 

In case it doesn't get attached, here's a Dropbox link to it:
http://dl.dropbox.com/u/9255089/Nordjyske-Thisted-018-20110714.pdf

Comment 1 Andrei Sosnin 2011-07-18 12:40:28 UTC

Reproduced the bug in the today's latest revision 80f11f7c17fd2d58eded8e8721a6932a0384f5f5 just in case. Same result.

Comment 2 Ray Johnston 2011-07-18 14:59:05 UTC

Since this sounds serious, raising the priority so it gets attention before
the release (at least until it is analyzed).

Comment 3 Ken Sharp 2011-07-18 16:33:03 UTC

Created attachment 7686 [details]
Simpler file (still very complex)

Per Henry's suggestion, assigning to Alex for initial investigation. I have attached a reduced file which still exhibits both the excessive memory use, and the crash.

The crash is caused when we exhaust memory and using -K500000 this can be provoked more quickly. There are two problems :

1) Why are we using so much memory
2) Why do we crash instead of gracefully handling the error condition. This seems to be something to do with LCMS.

Comment 4 Alex Cherepanov 2011-07-20 05:12:42 UTC

The file has 4731 smooth shadings. Every one takes about 1M of memory.
Valgrind finds no memory leaks when the number of shadings (and memory
consumption) is reduced to 1000.

Comment 5 Alex Cherepanov 2011-07-27 17:39:49 UTC

Created attachment 7722 [details]
experimental patch

The problem is caused by excessive memory allocation in .buildshading operator
and a large number of shadings in the sample file.

Running the shading in save-restore context solves the problem. The file runs
quickly and takes little memory. Unfortunately, this patch causes SEGV in
a few files from the regression suite, for instance:
gs -P -r72 -sDEVICE=pkmraw -o new.pkm  Bug689189.pdf

The crash appears to be related to premature freeing of the color space 
structure. Running with -dNOGC cures the crash.

There are other strange things related to smooth shading.
Normally, PDF interpreter caches the data generated by .buildshading .
Switching this function off doesn't help with memory allocation.
Apparently, the memory allocated by .buildshading cannot be reclaimed by GC.

Smooth shading has to copy shading data to a reusable stream.
This part of the code doesn't cause excessive memory usage or other problems.

Comment 6 Robin Watts 2011-07-29 12:04:54 UTC

I spent some time looking into this yesterday, and my conclusions match those of Alex. It's not the shading algorithm itself that leaks, but something in the reference handling of the postscript operators involved.

With the following command:

gs/debugbin/gswin32c.exe -dMaxBitmap=100000000 -P -r72 -sDEVICE=pkmraw -o ne
w.pkm ../ghostpcl/tests_private/comparefiles/Bug689189.pdf

with alexes patch inserted and memento enabled, I find that:

 * A 'save' operator causes a new chunk to be created (allocation event 153212).

 * Shortly afterwards we hit idict.c:dict_create_contents:196 and allocate a ref array, that returns with address 0xb569a8 (for me).

 * At allocation event 153222, we pull the ripcord and the whole chunk is freed (on a restore).

 * Shortly thereafter we do a VMreclaim, and in the middle of the ref marking phase, between allocation events 155751 and 155752 we write to 0xb569a9.

My reading of this is that somehow a pointer is being kept in gc memory to something that should have been cleaned up when the restore happened. But I am having no luck tracking this down, so returning it to Alex for him to consider.

Comment 7 Henry Stiles 2011-07-29 18:18:35 UTC

FWIW, with Alex's patch and the regression files:

[a12:-o ]zrestore savetype(8) 0x101ccd4c0
GPL Ghostscript GIT PRERELEASE 9.04: ./psi/ilocate.c(452): At 0x101cbdfc8, array 0x101db2a18 not in any chunk
GPL Ghostscript GIT PRERELEASE 9.04: ./psi/ilocate.c(452): At 0x101cbdfd8, array 0x101db2b28 not in any chunk
[a12]closing chunk 0x10151c6d0 (0x101cc9448..0x101ccd4b0, 0x101ccd4e8..0x101ccd5c8..0x101cce250)
[a12]opening chunk 0x1015223d0 (0x101cd3448..0x101cd7388, 0x101cd74e8..0x101cd75c8..0x101cd8250)
[a12:+b.]ref_param_read_init -bytes-*(256=64*4) = 0x101cd7398
[a12:+b ]set_default_gray_icc (17) = 0x101cd74a8


Looking back for the culprit addresses:

[a8:-of]s_std_close zlibEncode/Decode state(144) 0x101eb9268
[a12:+$ ]dict_alloc(5) = 0x101cbdfc8
[a12:+$ ]dict_create_contents(values)(9) = 0x101cbe018
[a12:+$ ]dict_create_contents(packed keys)(2) = 0x101cbe0a8
[a12:+<f]alloc_save_change alloc_change(40) = 0x101da83d8


and:

[a12:+< ]gs_function_Sd_init gs_function_Sd_t(208) = 0x101db2868
[a12:+< ]gs_shading_R_init gs_shading_R_t(136) = 0x101db2948
[a12:+< ]alloc_save_change alloc_change(40) = 0x101db29e0
[a12:+<.]dict_create_contents(values) refs*(288=18*16) = 0x101db2a18
[a12:+$ ]dict_create_unpacked_keys(17) = 0x101db2b28
[a12:-$#]dict_resize(old values)(9) 0x101cbe018
[a12:-$#]dict_resize(old keys)(9) 0x101ccd1f0
[a12:+< ]gs_make_pattern_common gs_pattern2_instance_t(104) = 0x101db2c58

If the objects gs_shading_R_t or another are holding a reference to the dictionary created after the save the behavior is expected given the placement of the save/restore.

Comment 8 Ray Johnston 2011-07-29 18:52:34 UTC

The function based shadings all have an enumerated pointer to Function.

If this is allocated after the 'save' in other than stable_memory, then it
will be a problem if the gs_shading_R_t (or other) persists after the 'restore'. 

See gsshade.c gs_shading_Fb_init (does ALLOC_SHADING) and private_st_shading_Fb()
in gsshade.h

Comment 9 Robin Watts 2011-08-01 11:23:07 UTC

Due to finger trouble with git, I seem to have pushed Alex's patch for this as commit bbb8f98 - that wasn't supposed to get off my local machine, sorry!

Should I do another commit backing it out?

Comment 10 Alex Cherepanov 2011-08-02 05:58:54 UTC

Created attachment 7735 [details]
working patch

This patch works. it reduces memory usage but causes no regressions.

The only significant difference from the earlier one is that the result
from .buildshading is no longer saved in a directory. Commented out 'gsave'
and 'grestore' are not significant. The path is fine either way.

I don't understand why shoring the shading in a dictionary was not undone
by 'restore'. Perhaps, Ray has something to say about it.

Comment 11 Ken Sharp 2011-08-25 09:29:20 UTC

(In reply to comment #10)
> Created an attachment (id=7735) [details]
> working patch
> 
> This patch works. it reduces memory usage but causes no regressions.
> 
> The only significant difference from the earlier one is that the result
> from .buildshading is no longer saved in a directory. Commented out 'gsave'
> and 'grestore' are not significant. The path is fine either way.
> 
> I don't understand why shoring the shading in a dictionary was not undone
> by 'restore'. Perhaps, Ray has something to say about it.

Shouldn't we commit the patch and close the bug ? If Ray needs to look at why the shading isn't freed from the dictionary by restore, then the bug should be re-assigned to Ray (or possibly a new one opened)

Comment 12 Alex Cherepanov 2011-09-07 04:42:14 UTC

I patch similar to the patch from the comment #10 has been committed as a rev.
b3ee2cd07fc7bdfbc3014316f0d4f7da54c900ee

Regression testing shows no significant differences.