The customer reports that Ghostscript takes a long time to process a PDF file even at low (30 DPI) resolution. I've verified that it does take a while (see timing figures in next comment). The file is large, so is available at: casper:/home/support/689476/ADA_250dpi.eps The command line the customer is using: gs -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -f ./ADA_250dpi.eps -c quit
Created attachment 3413 [details] memory_usage.png The customer reports the following timing with gs8.60 (cpu type, memory configuration, and operating system not reported): 30 dpi - ~1:00 72 dpi - ~1:30 (with non-banding device) 400 dpi - 1:37 (with banding device) Timing on my Linux AMD 64 5600+ with gshead (r8252) (but gs8.53 and gs8.60 are very similar): 30 dpi - 0:33 300 dpi - 0:37 Timing on my iMac Core 2 2.8 GHz with gshead (r8252): 30 dpi - 0:07 300 dpi - 0:09 Both of my machines are equipped with 2 Megs of memory and Ghostscript does use ~1 gig of memory (see attached screenshot of memory allocation). I'm a bit surprised by the sudden increase in memory usage; Ghostscript steadily uses memory up until ~450 megs and then allocates ~700 megs in one shot. Generally my Linux box and my iMac are about the same speed, so I'm not sure why the large difference in this case.
The slow speed could be due to VM thrashing on a machine that doesn't have enough physical RAM. That would also explain day to day variation depending on how much physical RAM is locked by other processes when gs is trying to run. I'll take a look at the file to see what's causing the large allocation. I do know that the jasper JPEG2000 decoder can be a hog.
When I ran the timing figures I monitored each machine to make sure it wasn't swapping (and I misspoke earlier, my Linux box has 4 gigs of RAM). Running with the Luratech decoder instead of Jasper didn't change the timing nor the memory usage.
How JPEG2000 issues can affect an EPS file ? The file creates an reusable stream 800M long and uses it once as a data source for an image - that's all. Most of the time is spent in the memory management code. On peeves the file takes about 60 min. I cannot imagine 1GB file running in 7 sec. - please recheck. % cumulative self self total time seconds seconds calls s/call s/call name 87.98 1438.23 1438.23 14221096 0.00 0.00 chunk_locate_ptr 1.84 1468.23 30.00 5911904 0.00 0.00 gs_memset 1.82 1497.96 29.73 10953039 0.00 0.00 gc_mark_string 1.81 1527.50 29.54 5904410 0.00 0.00 gc_strings_set_reloc 1.46 1551.37 23.87 20010 0.00 0.00 alloc_link_chunk 1.38 1573.87 22.50 5904410 0.00 0.00 gc_strings_compact 1.09 1591.66 17.79 87146253 0.00 0.00 alloc_open_chunk 0.44 1598.90 7.24 5463 0.00 0.00 image_render_color 0.34 1604.41 5.51 942057 0.00 0.00 s_A85D_process 0.23 1608.21 3.80 87150761 0.00 0.00 alloc_close_chunk 0.20 1611.46 3.25 1430700 0.00 0.00 gs_memcpy 0.20 1614.65 3.19 13199 0.00 0.03 gc_trace 0.16 1617.31 2.67 11026078 0.00 0.00 igc_reloc_string 0.09 1618.71 1.40 880 0.00 1.78 gs_gc_reclaim 0.08 1620.01 1.30 5904410 0.00 0.00 gc_objects_clear_marks 0.08 1621.29 1.28 5904410 0.00 0.00 gc_objects_compact 0.06 1622.31 1.02 28634739 0.00 0.00 cmap_cmyk_direct 0.06 1623.32 1.01 8857440 0.00 0.00 igc_reloc_ref_ptr_nocheck 0.06 1624.25 0.93 5904410 0.00 0.00 gc_do_reloc
Sorry, I wasn't clear; the times I reported in Comment #0 where in hours:minutes, so it takes 7 minutes on my iMac and the customer reported 60 minutes. marcos@imac:[3]% time gs860 -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -f ./ADA_250dpi.eps -c quit GPL Ghostscript 8.60 (2007-08-01) Copyright (C) 2007 Artifex Software, Inc. All rights reserved. This software comes with NO WARRANTY: see the file PUBLIC for details. 271.871u 20.342s 7:19.64 66.4% 0+0k 14+102io 0pf+0w marcos@imac:[4]%
Our reusable stream implementation is highly inefficient. First, the source file is read into an array of strings. Second, a single buffer is allocated and the strings are copied there. So the reusable steam takes 2x of core for a given stream length. We need either read the stream directly to the buffer and realloc it as needed or implement a stream that takes an array of strings for the storage.
Marcos is going to try 80000000 setvmthreshold to see if that helps on his slow machine. Assigning to Alex since he owns the gs_frsd.ps and can do the C implementation to see if this can help.
Adding "-c 80000000 setvmthreshold" to the command line decreases the processing time to just under 2 minutes on my Linux box and about 3 minutes on my iMac.
The complete command line referred to in comment #8: time gs860 -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -c 80000000 setvmthreshold -f ./ADA_250dpi.eps -c quit
Can we change the default vmthreshold to a larger value in Ghostscript?
Replacement gs_frsd.ps attached that uses compression to build dict that is used to determine the size of the bytestring (and save the contents for filling the bytestring). Further refinement is needed since this still creates the 840Mb bytestring as uncompressed, but at least the peak memory usage is about 980Mb (the compressed form of the data is only 92Mb). I am able to run this file on my 2GHz Core 2 Duo w/ 2Gb RAM, generating a 400 dpi png16m image in 416 sec, and a 30 dpi png16m can be created in 138 sec. The entire command line I used was: gswin32c -dBufferSpace=40000000 -sDEVICE=png16m -o x.png -dEPSCrop -r30 -c "32000000 setvmthreshold" -f ADA_250dpi.eps I recommend that the ReusableStream be implemented as blocks of compressed data so that fileposition can be readily implemented and so we never need to store uncompressed data. We may want to make a ReusableStreamEncode filter that can create this for use by the ReusableStreamDecode filter. Decision on implementation left to Alex.
Created attachment 3463 [details] gs_frsd.ps Create temporary data in a dict of Flate compressed strings, then create a bytestring of the uncompressed data as before.
The customer reports that the modified gs_frsd.ps works for them to avoid the VMerror. Leaving the bug open until some method is committed.
Created attachment 3469 [details] patch This is another approach - a reusable stream that uses an array of strings for data storage. The patch passes both CET and Comparefiles testing.
Comment on attachment 3463 [details] gs_frsd.ps This patch messed up the global state.
Created attachment 3988 [details] gs_frsd.ps Corrected change to gs_frsd.ps workaround that doesn't mess up the global allocation state. Still waiting for Alex to complete the "real" fix.
The patch from the comment #14 is committed as a rev. 8810. http://ghostscript.com/pipermail/gs-cvs/2008-June/008392.html Regression testing shows no differences. Compression can be added later if needed.