Created attachment 18834 [details] customer file Using the attached customer file and the following command line: -dNumRenderingThreads=4 -sDEVICE=fpng -r720 -sOutputFile=out.png -dBufferSpace=67108864 -dNOPAUSE -dNOPROMPT -dBATCH -dPrinted -dNOPSICC -dUseCropBox 20140408_tfnsw_learner_journey_a2.pdf Ghostscript on Windows, either 32 or 64-bit, debug or release, throws an error and exits. The debug build gives more information so I'm quoting that: ------------------------------------------------------------------------------ D:\temp>\ghostpdl\debugbin\gswin64c -sBandListStorage=memory -dNumRenderingThreads=4 -sDEVICE=tiffsep -r720 -sOutputFile=out.tif -dBufferSpace=67108864 -dNOPAUSE -dNOPROMPT -dBATCH -dPrinted -dNOPSICC -dUseCropBox D:\ken\Downloads\20140408_tfnsw_learner_journey_a2.pdf GPL Ghostscript GIT PRERELEASE 9.51 (2019-10-15) Copyright (C) 2019 Artifex Software, Inc. All rights reserved. This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY: see the file COPYING for details. Processing pages 1 through 1. Page 1 Querying operating system for font files... Substituting font Helvetica-Bold for Arial-BoldMT. Loading NimbusSans-Bold font from %rom%Resource/Font/NimbusSans-Bold... 4520268 3037334 3860664 2347275 4 done. Substituting font Helvetica for ArialMT. Loading NimbusSans-Regular font from %rom%Resource/Font/NimbusSans-Regular... 4586380 3213439 3934408 2422462 4 done. Substituting font Helvetica-BoldOblique for Arial-BoldItalicMT. Loading NimbusSans-BoldItalic font from %rom%Resource/Font/NimbusSans-BoldItalic... 5200212 3854798 4432352 2901641 4 done. [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 [a+]gs_malloc(memfile newphys)(65536) = 0x0: exceeded limit, used=2147452530, max=2147452530 Clist I/O error: cbp past end of buffer Clist I/O error: cbp past end of buffer Clist I/O error: cbp past end of buffer Clist I/O error: cbp past end of buffer Thread 3 rendering thread ending memory state... memory dump done. Thread 2 rendering thread ending memory state... memory dump done. Thread 1 rendering thread ending memory state... memory dump done. Thread 0 rendering thread ending memory state... memory dump done. Error: /unknownerror in --showpage-- Operand stack: 1 true Execution stack: %interp_exit .runexec2 --nostringval-- showpage --nostringval-- 2 %stopped_push --nostringval-- showpage showpage false 1 %stopped_push 1990 1 3 %oparray_pop 1989 1 3 %oparray_pop 1977 1 3 %oparray_pop 1978 1 3 %oparray_pop showpage showpage 2 1 1 showpage %for_pos_int_continue 1981 1 7 %oparray_pop showpage showpage 1840 0 9 %oparray_pop showpage showpage Dictionary stack: --dict:746/1123(ro)(G)-- --dict:1/20(G)-- --dict:80/200(L)-- --dict:80/200(L)-- --dict:135/256(ro)(G)-- --dict:318/325(ro)(G)-- --dict:30/32(L)-- --dict:6/9(L)-- --dict:7/20(L)-- Current allocation mode is local GPL Ghostscript GIT PRERELEASE 9.51: Unrecoverable error, exit code 1 ------------------------------------------------------------------------------ Process Monitor shows that the GS executable is using 1.5GB of memory immediately before the failed allocation. The customer says that running on Linux or Mac OS succeeds, but I don't have access to a machine running either OS with that much memory available. Reducing the resolution, or the number of threads, causes the file to run to completion using less memory (2 threads completes doe me peaking at just under 1.5 GB). The file is comparatively simple, I see no transparency nor Pattern colour spaces. However two images do have Interpolation set to true. If I run the failing command line with the addition of -dNOINTERPOLATE then the file runs to completion peaking at ~500MB of memory usage (its also *much* faster).
I've tentatively assigned this to Ray, because its multi-threading and clist. However the -dNOINTERPOLATE may mean its really Robin's. I guess the questions are: Why does this use so much memory ? Why doesn't it exit on the first failed allocation ? Why does 64-bit Ghostscript on Windows apparently have a limit of 2GB memory (my machine has 64GB physical memory available) ? This isn't a crash, its a graceful exit, so I suspect simply explaining the situation might be sufficient.
So following up it seems the problem is due to the fact that the customer's command line has -dBufferSpace=67108864 and has 4 rendering threads, and the fact that 64-bit Windows has a limit of 2GB on total memory allocation. Chrisl points out that gdevprn.c line 254 has a loop where we try to allocate BufferSpace and run round decreasing it until we get a vlue which we can actually allocate. However, this takes no account of multiple rendering threads, so we can end up in the case where we can allocate BufferSpace for one thread, but not for as many threads as are spawned. I'm guessing that if we were to improve this loop to acount for multiple threads, we would be able to reduce the BufferSpace to fit into the available memory, and this problem would not arise. On a related note we shoudl not be limiting Windows 64-bit builds to 2GB memory, and I'll open a separate bug for that.
OK, the use of BandListStorage=memory is the first issue. The size of the clist is really not limited, depending on the complexity of the page, and really has nothing to do with the BufferSpace size. On most modern computers, using BandListStorage=file is no slower, and if the memfile is working correctly, it is supposed to switch to compression if an allocation fails -- I'll look into that. Obviously, compressing, then expanding the clist during playback will be painfully slow, particularly with multiple threads. The BandListStorage=memory is really intended for diskless systems (embedded printers). Next, if we do fail when writing the clist (for any reason) we should *NOT* continue and try to render, much less with multiple threads. I will fix that! A BufferSpace setting of 64m (67,108,864) is totally reasonable, and is probably not at issue here. Generally larger buffer sizes results in more lines per band which makes the clist (slightly) more efficient, particularly if there are skewed images. Lastly, the clist logic *tries* to defer interpolation to rendering so that multiple threads can contribute, but there are many (too many to mention) conditions that can cause the clist writing to fall back to processing the image in 'default' mode, which when interpolating will result in 'copy_color' primitives for all the pixels. This will slow things down (interpolating and writing the image) and expand the clist size. While -dNOINTERPOLATE is a 'sledgehammer' to avoid interpolation, we have a control over how much interpolation is performed. Interpolation to 720dpi is probably not visible therefore wasted time. The better control is to use -dInterpolateControl=__ as described in https://www.ghostscript.com/doc/current/Use.htm#Rendering_parameters For example, -dInterpolateControl=4 will interpolate images that have the '/Interpolate true' in the image dictionary, but ONLY to 4x4 pixels blocks, i.e. 180dpi when using -r720. This speeds up processing, even if the interpolation is being performed in multiple rendering threads. I will look into the problems (trying to render after memfile creation error, and why switching to compression was not invoked), but limiting interpolation and using disk based clist are STRONGLY recommended in general.
I get a somewhat different result than Ken did. Running 32-bit Windows, I see the problem is that the large memfile is actually a pattern-clist. This is 585,319,595 in total size (the original Image, object 46 in the PDF is 663 lines, 663 pixels per line, RGB). The image at 720dpi, at the scaling of the PDF this turns into a 13945x13945 image (after interpolation). This calculates to 583,389,075 bytes of RGB data. This data is written to the 'main' clist. It is in the threads when we are reading from the 'memfile' data for the pattern from the main thread for the pattern cache entry. Thus 4 threads need 4*585,319,595 = 2,341,278,380 bytes, so we get allocation failures. I'm looking at why the memfile doesn't trigger compression when the alloc fails. If the memfile compression is used, it reduces the required storage to < 32Mb. Note that with -dInterpolateControl=4 it seems that the method, while faster, does not reduce the clist filesize. I will discuss this internally to decide if we want to investigate it another method.
Apparently improvements in the image interpolation has improved things so that the patch to use default image processing when an interpolated image is being processed into a pattern-clist (see bug 694514). With current HEAD (SHA e193b5d) this takes 23.3 seconds. With the patch removed that restricts HL images to the pattern-clist, this file completes in 24.8 seconds. With this file, the pattern-clist (without HL images) uses > 500Mb, and the file (a single page) takes 192 sec. Enabling HL images reduces the pattern- clist to 1.2Mb and the page completes in 36 sec. With -dNumRenderingThreads=4 (and with HL images enabled for pattern-clist) this file completes in < 15 sec.
I'm not totally convinced from a git bisect run (since I have to remove the disable of pattern-clist between bisect builds), but initially it showed up as commit 7e342b59, but this might have been masking an earlier commit that allowed the pattern-clist mode to efficiently support interpolation.
I did more testing with the file from bug 694514 and the commit 8117f3e31 lets that file complete in under 3 seconds when HL interpolated images are enabled for pattern_clist. The previous commit 0f3691890 takes almost 32 seconds. Without the pattern_clist HL enabling of interpolated images, the 8117f3e31 takes over 4 seconds. Looks like the patch for bug694514 is no longer needed.
Fixed in commit 1f6b351a5048b5a14d52431c833da6ed3386a7bf