Bug 692158 - VMerror in pattern generation with high resolution RGB-Devices
Summary: VMerror in pattern generation with high resolution RGB-Devices
Status: NOTIFIED WORKSFORME
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Printer Driver
Version: 9.01
Hardware: PC All
Importance: P2 normal
Assignee: Ray Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-18 13:25 UTC by artifex
Modified: 2011-11-09 18:15 UTC
CC List: 0 users

See Also:
Customer: 870
Word Size: ---


Attachments
schleuse.pdf (1.96 MB, application/pdf)
2011-04-18 13:26 UTC, artifex

Description artifex 2011-04-18 13:25:29 UTC
In the PDF file schleuse.pdf the bright blue background is realised with patterns, which (temporarily) need large bitmaps.
To avoid the very slow rendering with the clist, I used the new option
-dMaxPatternBitmap. The value for MaxPatternBitmap must be very high to take effect.

When using an output resolution of 600 dpi (for large-format plotters) and drivers that generate RGB data (e.g. pnmraw, tiff24nc), Ghostscript shows the message:
"**** Warning: File encountered 'VMerror' error while processing an image."
and in the output file the blue background is missing.

GS-call:   
gs -r600 -sDEVICE=pnmraw -dMaxPatternBitmap=2000000000 -o out.prn schleuse.pdf
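
As a rough guide to how high the option needs to be (a sketch only: it assumes the tile is held as an uncompressed raster at 3 bytes per pixel for a 24-bit RGB device such as pnmraw or tiff24nc, and it uses the tile dimensions reported later in comment 3), the value has to exceed the byte size of the largest pattern tile in device space:

/* Sketch: rough lower bound for -dMaxPatternBitmap, i.e. the uncompressed
 * size of the largest pattern tile in device pixels.  The 3 bytes/pixel and
 * the tile dimensions (taken from comment 3 below) are assumptions for a
 * 24-bit RGB device; the real raster may carry extra padding. */
#include <stdio.h>

int main(void)
{
    unsigned long long width_px  = 20417;   /* largest tile, per comment 3 */
    unsigned long long height_px = 23842;
    unsigned long long bpp       = 3;       /* 24-bit RGB */
    unsigned long long bytes     = width_px * height_px * bpp;

    printf("largest tile needs ~%llu bytes; -dMaxPatternBitmap must be at least that\n",
           bytes);
    return 0;
}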
Comment 1 artifex 2011-04-18 13:26:32 UTC
Created attachment 7455 [details]
schleuse.pdf
Comment 2 Ray Johnston 2011-04-18 15:39:23 UTC
I am able to reproduce this, and this may be a case where the TOTAL memory
being used by Ghostscript exceeds what is available. While setting
MaxPatternBitmap=1500000000 is enough to avoid using the pattern clist
(as determined with a debug build using -Z:), I see these messages (repeatedly):

[a+]gs_malloc(large object chunk)(1460465592) = 0x0: exceeded limit, used=1589049085, max=1589049085

   **** Warning: File encountered 'VMerror' error while processing an image.

One of us will do a little more investigation, but I suspect there is no
solution other than to use the pattern clist.
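
For context on the message format, here is a minimal sketch (emphatically not the actual gsmalloc.c code) of the kind of limit-checked allocation it reflects: the allocator tracks how many bytes it has handed out and refuses any request that would push the total past a configured ceiling.

/* Sketch only: a malloc wrapper with a byte budget, printing a failure line
 * in the same style as the gs_malloc message above.  The used/max values are
 * taken from that log line; everything else is illustrative. */
#include <stdio.h>
#include <stdlib.h>

static size_t vm_used = 1589049085u;    /* bytes already accounted for */
static size_t vm_max  = 1589049085u;    /* configured ceiling */

static void *limited_malloc(size_t size, const char *client)
{
    void *p = NULL;
    if (size <= vm_max - vm_used)       /* stay within the budget */
        p = malloc(size);
    if (p == NULL) {
        printf("[a+]gs_malloc(%s)(%lu) = 0x0: exceeded limit, used=%lu, max=%lu\n",
               client, (unsigned long)size, (unsigned long)vm_used,
               (unsigned long)vm_max);
        return NULL;
    }
    vm_used += size;
    return p;
}

int main(void)
{
    /* The ~1.46 GB tile request from the log above cannot fit. */
    return limited_malloc(1460465592u, "large object chunk") ? 0 : 1;
}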

Using -dPDFDEBUG shows the warnings (since we skip the images, this should
probably be an error) following the debug output:

%Resolving: [26 0]
%Pattern: << /File (...) /PaintType 1 /XStep 128 /PaintProc 
{<< /XObject << /R26 {26 0 resolveR} >> /ProcSet [/PDF /ImageC] >>
 .pdfpaintproc} /Length 76 /.pattern_uses_transparency false /YStep 32
/PatternType 1 /Matrix [9.51582 5.40463 -38.5004 67.7868 648.402 1234.88]
/Resources << /XObject << /R26 {26 0 resolveR} >> /ProcSet [/PDF /ImageC] >>
/Filter /FlateDecode /TilingType 1 /BBox [0 0 128 32] /Type /Pattern >>
...
0 23.3843 23.3843 0 246.099 5403.24 cm
BI
/IM true
/W 16 /H 16 /BPC 1 /D [
1 0 ]
/F /CCF /DP <<
/K -1 /Columns 16 >>
ID
[a+]gs_malloc(large object chunk)(1460465592) = 0x0: exceeded limit, used=1589048998, max=1589048998

   **** Warning: File encountered 'VMerror' error while processing an image.
Comment 3 Ray Johnston 2011-10-14 03:38:47 UTC
The offending pattern is 20417x23842 at 600 dpi, so the bytes required for the
tile are 1,460,346,342. At the time this allocation fails (on a 32-bit build),
the 'used' memory by gs_heap_alloc_bytes is 64,692,494, so as far as gsmalloc.c
knows, we do _NOT_ exceed the limit. Looking at the task manager, the
memory usage before the VMerror is what memento _and_ gsmalloc together have, but
even though the total memory would be less than 2 GB, Windows fails. I've
determined (with the debugger) that by 'fudging' the allocation size down,
the most I can allocate is about 1,260,465,664 (1,360,465,664 fails).
I guess Windows doesn't let a user process have the full 2 GB.
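
As a cross-check on those figures (a sketch only, assuming the PDF default of 72 units per inch and 3 uncompressed bytes per pixel for the 24-bit RGB device), transforming the pattern /BBox [0 0 128 32] through the /Matrix shown in the -dPDFDEBUG dump in comment 2 and scaling by 600/72 reproduces both the tile dimensions and the byte count:

/* Cross-check of the tile size quoted above, using the /Matrix and /BBox from
 * the -dPDFDEBUG dump in comment 2.  Assumes 72 units per inch and 3 bytes per
 * pixel; whether Ghostscript rounds or pads the raster is not modelled.
 * Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* /Matrix [a b c d e f]; the translation (e, f) does not affect size. */
    double a = 9.51582, b = 5.40463, c = -38.5004, d = 67.7868;
    double bbox_w = 128.0, bbox_h = 32.0;    /* /BBox [0 0 128 32] */
    double scale = 600.0 / 72.0;             /* 600 dpi output */

    /* Map the four BBox corners into default user space. */
    double xs[4] = { 0.0, a * bbox_w, c * bbox_h, a * bbox_w + c * bbox_h };
    double ys[4] = { 0.0, b * bbox_w, d * bbox_h, b * bbox_w + d * bbox_h };

    double xmin = xs[0], xmax = xs[0], ymin = ys[0], ymax = ys[0];
    for (int i = 1; i < 4; i++) {
        if (xs[i] < xmin) xmin = xs[i];
        if (xs[i] > xmax) xmax = xs[i];
        if (ys[i] < ymin) ymin = ys[i];
        if (ys[i] > ymax) ymax = ys[i];
    }

    long w = (long)ceil((xmax - xmin) * scale);   /* ~20417 */
    long h = (long)ceil((ymax - ymin) * scale);   /* ~23842 */
    printf("tile: %ld x %ld pixels, ~%lld bytes at 3 bytes/pixel\n",
           w, h, (long long)w * h * 3);
    return 0;
}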

I am also collecting some information on this to see if there is anything that
can be done to improve the pattern-clist performance for this case. It will
always be slower than when MaxPatternBitmap is large enough.

By running this file at 72 dpi with -dMaxBitmap=10000 -dMaxPatternBitmap=10000
I determined that this file performs 60,696 image mask fills, and calls
tile_pattern_clist (from tile_by_steps) 122,087 times during WRITING of the
clist for the page.

On peeves (64-bit linux, ~3GHz Core i7):

At 72 dpi with -dMaxPatternBitmap=100000000, the job completes in 18 seconds.

With -dMaxPatternBitmap=10000 at 72 dpi, a release build takes 820 seconds.

At 150 dpi, the time goes to 2040 seconds.

From these two timings, the scaling looks like less than N^2, so 600 dpi _should_
take < 53,000 seconds.
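
A quick check of that extrapolation (a sketch only: it fits a single power law to the two timings above and also prints the plain N^2 bound, which is all the 'less than N^2' argument needs):

/* Sketch: estimate the resolution-scaling exponent from the 72 and 150 dpi
 * timings above, then extrapolate to 600 dpi and compare with the quadratic
 * upper bound.  Compile with -lm. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double t72 = 820.0, t150 = 2040.0;                /* seconds, from above */

    double k = log(t150 / t72) / log(150.0 / 72.0);   /* observed exponent, ~1.24 */
    double power_law = t150 * pow(600.0 / 150.0, k);  /* ~11,400 s */
    double quadratic = t72 * pow(600.0 / 72.0, 2.0);  /* ~56,900 s */

    printf("exponent ~%.2f, power-law estimate ~%.0f s, N^2 bound ~%.0f s\n",
           k, power_law, quadratic);
    return 0;
}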

At 300 dpi, the time is 6220 seconds.

At 600 dpi, strangely, this produces 58,234 messages:

**** Warning: File has insufficient data for an image.

then the job dies during clist reading after:
   **** Error reading a content stream. The page may be incomplete.
% Outputpage start time = 3052.73, memory allocated = 10210432, used = 7307029
Comment 4 Ray Johnston 2011-10-14 14:26:30 UTC
The strange 600 dpi behavior on peeves was due to /tmp filling up. The clist
file size is HUGE. It fills up the (limited space SSD) drive on peeves and
then we get strange results :-(

Good thing we have 64-bit clist file support. The final clist size is 273 GB!
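
As a sanity check on why the 64-bit clist file support matters here (assuming GB means gigabytes): a 32-bit signed file offset tops out just above 2 GB, roughly two orders of magnitude short of this clist.

/* Sketch: compare a 32-bit signed file-offset limit with the ~273 GB clist
 * reported above.  The 273 GB figure is taken from this comment. */
#include <stdio.h>

int main(void)
{
    long long max_off32 = (1LL << 31) - 1;                 /* ~2.1 GB */
    long long clist     = 273LL * 1000 * 1000 * 1000;      /* ~273 GB */

    printf("32-bit offset limit: %lld bytes, clist: ~%lld bytes (~%.0fx larger)\n",
           max_off32, clist, (double)clist / (double)max_off32);
    return 0;
}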

It hasn't finished yet, but the Outputpage start time was 11481 seconds. For
300 dpi this time was 4364 seconds, and for 150 dpi it was 2040 seconds.

Interestingly, some of the performance hit may not be just the pattern clist
playback during the clist writing, but the rendering time. The 72 dpi page
rendered in 1 second, the 150 dpi page took 4 seconds, but the 300 dpi page took
1860 seconds. As mentioned above, the 600 dpi case isn't finished yet. In any
case, the 600 dpi rendering time won't be accurate, since there was at least one
cluster regression while it was still running.
Comment 5 Ray Johnston 2011-10-14 19:05:36 UTC
I did a little more testing of the Windows allocation. The largest allocation I
can do is 1,340,465,664 (1,350,465,664 fails). The task manager shows the memory usage is 1,524,068K after that allocation.
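
The probe itself is simple; a minimal sketch (assuming plain malloc is a close enough stand-in for the allocation Ghostscript ends up making) that walks the request size down until a single contiguous block succeeds:

/* Sketch of the largest-single-allocation probe described above: start near
 * the failing request size and step downwards until malloc succeeds.  The
 * starting point and step size are arbitrary choices for illustration. */
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t request = 1500000000u;        /* just above the failing sizes */
    const size_t step = 10000000u;       /* 10 MB steps */

    while (request > step) {
        void *p = malloc(request);
        if (p != NULL) {
            printf("largest single allocation: %lu bytes\n",
                   (unsigned long)request);
            free(p);
            return 0;
        }
        request -= step;
    }
    printf("no large allocation possible\n");
    return 1;
}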

I guess that there is no way to get there with a 32-bit Windows build.
Comment 6 Ray Johnston 2011-10-15 17:10:47 UTC
Thinking about the severe rendering performance degradation, it occurred to me
that even though the really LARGE pattern requires the problematic 1.5 GB
MaxPatternBitmap value, all of the other patterns could get by with less than
100M (determined with the debugger), and that setting a larger BufferSpace may
also improve performance.

With the command line options:

 -r600 -dMaxBitmap=10000 -dMaxPatternBitmap=100000000 -dBufferSpace=100000000 -sDEVICE=pnmraw -o /dev/null -Z: Bug692158.pdf

This file COMPLETED in 10098 seconds (rendering time is 1412 seconds).

At 300 dpi, the Outputpage start time is 4060 seconds and the rendering time is
only 64 seconds.

It seems that keeping all but the largest pattern as a bitmap rather than a
clist really improves the rendering time to where it is reasonable.

Given the relatively unusual (and stressful) nature of this file, which does
60K imagemask fills with a VERY large-area pattern, I think that further work
on this is not warranted.

Note that with a 64-bit build, using -dMaxPatternBitmap=1500000000, the 600 dpi
file completes in 49 seconds (Outputpage start time at 21 seconds), so using
a 64-bit build is also a way to achieve even better performance than the
parameters above with a 32-bit build.