In the PDF file schleuse.pdf the bright blue background is realised with patterns, which need (temporarily) large bitmaps. To avoid very slow rendering with the clist, I used the new option -dMaxPatternBitmap. The value of MaxPatternBitmap must be very high to take effect. When using an output resolution of 600 dpi (for large-format plotters) and a driver that generates RGB data (e.g. pnmraw, tiff24nc), Ghostscript shows the message:

**** Warning: File encountered 'VMerror' error while processing an image.

and in the output file the blue background is missing.

GS call:

gs -r600 -sDEVICE=pnmraw -dMaxPatternBitmap=2000000000 -o out.prn schleuse.pdf
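To see why -dMaxPatternBitmap needs to be on the order of 2 GB here, a rough back-of-the-envelope estimate helps (a sketch only; the tile dimensions below are hypothetical placeholders for a large-format page, and a 24-bit RGB device with no row padding is assumed):

```python
def tile_bytes(width_px, height_px, bytes_per_pixel=3):
    # 24-bit RGB devices such as pnmraw/tiff24nc need 3 bytes per pixel,
    # so an unbanded pattern tile takes roughly width * height * 3 bytes.
    return width_px * height_px * bytes_per_pixel

# Hypothetical pattern spanning ~34 x 40 inches at 600 dpi:
print(tile_bytes(34 * 600, 40 * 600))  # 1468800000, i.e. ~1.5 GB
```

Any MaxPatternBitmap value below that estimate forces the pattern back into the (slow) clist path.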
Created attachment 7455 [details] schleuse.pdf
I am able to reproduce this, and this may be a case where the TOTAL memory being used by Ghostscript exceeds what is available. While setting -dMaxPatternBitmap=1500000000 is enough to avoid using the pattern clist (as determined with a debug build using -Z:), I see these messages (repeatedly):

[a+]gs_malloc(large object chunk)(1460465592) = 0x0: exceeded limit, used=1589049085, max=1589049085
**** Warning: File encountered 'VMerror' error while processing an image.

One of us will do a little more investigation, but I suspect there is no solution other than to use the pattern clist.

Using -dPDFDEBUG shows the warnings (since we skip the images, this should probably be an error) following the debug output:

%Resolving: [26 0]
%Pattern: << /File (...) /PaintType 1 /XStep 128 /PaintProc {<< /XObject << /R26 {26 0 resolveR} >> /ProcSet [/PDF /ImageC] >> .pdfpaintproc} /Length 76 /.pattern_uses_transparency false /YStep 32 /PatternType 1 /Matrix [9.51582 5.40463 -38.5004 67.7868 648.402 1234.88] /Resources << /XObject << /R26 {26 0 resolveR} >> /ProcSet [/PDF /ImageC] >> /Filter /FlateDecode /TilingType 1 /BBox [0 0 128 32] /Type /Pattern >>
...
0 23.3843 23.3843 0 246.099 5403.24 cm
BI /IM true /W 16 /H 16 /BPC 1 /D [ 1 0 ] /F /CCF /DP << /K -1 /Columns 16 >> ID
[a+]gs_malloc(large object chunk)(1460465592) = 0x0: exceeded limit, used=1589048998, max=1589048998
**** Warning: File encountered 'VMerror' error while processing an image.
The offending pattern is 20417x23842 at 600 dpi, so the bytes required for the tile are 1,460,346,342. At the time this allocation fails (on a 32-bit build), the memory 'used' by gs_heap_alloc_bytes is 64,692,494, so as far as gsmalloc.c knows we do _NOT_ exceed the limit. Looking with the task manager, the memory usage before the VMerror is what Memento _and_ gsmalloc have, but even though the total memory would be less than 2 GB, Windows fails. I've determined (with the debugger) that by 'fudging' the allocation size down, the most I can allocate is about 1,260,465,664 (1,360,465,664 fails). I guess Windows doesn't let a user process have the full 2 GB.

Also collecting some information on this to see if there is anything that can be done to improve the pattern-clist performance for this case. It will always be slower than when MaxPatternBitmap is large enough. By running this file at 72 dpi with -dMaxBitmap=10000 -dMaxPatternBitmap=10000 I determined that this file performs 60,696 image mask fills and calls tile_pattern_clist (from tile_by_steps) 122,087 times during WRITING of the clist for the page.

On peeves (64-bit Linux, ~3 GHz Core i7):

At 72 dpi with -dMaxPatternBitmap=100000000 the job completes in 18 seconds.
With -dMaxPatternBitmap=10000 at 72 dpi, a release build takes 820 seconds. At 150 dpi, the time goes to 2040 seconds. From these two timings this looks like less than N^2, so 600 dpi _should_ take < 53,000 seconds. At 300 dpi, the time is 6220 seconds.

At 600 dpi, strangely, this produces 58,234 messages:

**** Warning: File has insufficient data for an image.

then the job dies during clist reading after:

**** Error reading a content stream. The page may be incomplete.
% Outputpage start time = 3052.73, memory allocated = 10210432, used = 7307029
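The tile-size figure and the "less than N^2" claim above can be cross-checked with a quick calculation (a sketch only; the exponent estimate assumes the clist-write time scales as a power of the resolution, which the two measured points only roughly support):

```python
import math

# Tile raster: 20417 x 23842 pixels at 3 bytes/pixel (24-bit RGB)
tile = 20417 * 23842 * 3
print(tile)  # 1460346342, matching the reported allocation size

# Apparent scaling exponent from the 72 dpi and 150 dpi timings:
# t ~ r^k  =>  k = log(t2/t1) / log(r2/r1)
k = math.log(2040 / 820) / math.log(150 / 72)
print(round(k, 2))  # ~1.24, i.e. clearly sub-quadratic
```

Extrapolating with that exponent from 150 dpi to 600 dpi gives roughly 2040 * (600/150)**1.24 ≈ 11,400 seconds, comfortably under the 53,000-second bound stated above.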
The strange 600 dpi behavior on peeves was due to /tmp filling up. The clist file is HUGE: it fills up the (limited-space SSD) drive on peeves and then we get strange results :-( Good thing we have 64-bit clist file support. The final clist size is 273 GB! It hasn't finished yet, but the Outputpage start time was 11,481 seconds. For 300 dpi this time was 4364 seconds, and for 150 dpi it was 2040 seconds.

Interestingly, some of the performance hit may not just be the pattern-clist playback during the clist writing, but the rendering time. The 72 dpi page rendered in 1 second and the 150 dpi page took 4 seconds, but the 300 dpi page took 1860 seconds. As mentioned above, the 600 dpi case isn't finished yet. In any case, the 600 dpi rendering time won't be accurate, since there was at least one cluster regression run while it was still going.
I did a little more testing of the Windows allocation. The largest allocation I can do is 1,340,465,664 bytes (1,350,465,664 fails). The task manager shows the memory usage is 1,524,068K after that allocation. I guess there is no way to get there with a 32-bit Windows build.
Thinking about the severe rendering performance degradation, it occurred to me that even though the really LARGE pattern requires the problematic 1.5 GB MaxPatternBitmap value, all of the other patterns can get by with less than 100 MB (determined with the debugger), and that setting a larger BufferSpace may also improve performance. With the command line options:

-r600 -dMaxBitmap=10000 -dMaxPatternBitmap=100000000 -dBufferSpace=100000000 -sDEVICE=pnmraw -o /dev/null -Z: Bug692158.pdf

this file COMPLETED in 10,098 seconds (the rendering time is 1412 seconds). At 300 dpi, the Outputpage start time is 4060 seconds and the rendering time is only 64 seconds. It seems that keeping all but the largest pattern as a bitmap rather than a clist really improves the rendering time to where it is reasonable.

Given the relatively unusual (and stressful) nature of this file, which does 60K imagemask fills with a VERY large area pattern, I think that further work on this is not warranted.

Note that with a 64-bit build, using -dMaxPatternBitmap=1500000000 the 600 dpi file completes in 49 seconds (Outputpage start time at 21 seconds), so using a 64-bit build is also a way to achieve even better performance than with the parameters above on a 32-bit build.
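The effect of the tuned parameters can be pictured as a simple per-pattern threshold decision (a schematic sketch only, not Ghostscript's actual code; the pattern sizes are illustrative):

```python
def pattern_storage(tile_bytes, max_pattern_bitmap):
    # Schematic: a tile that fits under MaxPatternBitmap is cached as a
    # full bitmap (fast playback); anything larger falls back to the
    # pattern clist (much slower, as the timings above show).
    return "bitmap" if tile_bytes <= max_pattern_bitmap else "clist"

MAX = 100_000_000  # -dMaxPatternBitmap=100000000 from the command line above

# The many smaller patterns in the file stay as bitmaps...
print(pattern_storage(50_000_000, MAX))     # bitmap
# ...and only the one ~1.5 GB pattern pays the clist penalty.
print(pattern_storage(1_460_346_342, MAX))  # clist
```

This is why 100000000 works well on a 32-bit build: only the single oversized pattern takes the slow path, instead of every pattern in the file.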