Bug 689476 - Ghostscript slow processing PDF file
Summary: Ghostscript slow processing PDF file
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter (show other bugs)
Version: master
Hardware: PC Linux
: P2 normal
Assignee: Alex Cherepanov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-27 07:04 UTC by Marcos H. Woehrmann
Modified: 2008-12-19 08:31 UTC (History)
0 users

See Also:
Customer: 190
Word Size: ---


Attachments
memory_usage.png (16.69 KB, image/png)
2007-09-27 07:23 UTC, Marcos H. Woehrmann
Details
gs_frsd.ps (3.82 KB, application/postscript)
2007-10-10 23:01 UTC, Ray Johnston
Details
patch (11.10 KB, patch)
2007-10-13 16:02 UTC, Alex Cherepanov
Details | Diff
gs_frsd.ps (3.81 KB, application/postscript)
2008-05-06 12:55 UTC, Ray Johnston
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marcos H. Woehrmann 2007-09-27 07:04:06 UTC
The customer reports that Ghostscript takes a long time to process a PDF file
even at low (30 DPI) resolution.  I've verified that it does take a while (see
timing figures in next comment).

The file is large, so is available at: casper:/home/support/689476/ADA_250dpi.eps

The command line the customer is using:

gs -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -f
./ADA_250dpi.eps -c quit
Comment 1 Marcos H. Woehrmann 2007-09-27 07:23:52 UTC
Created attachment 3413 [details]
memory_usage.png

The customer reports the following timing with gs8.60 (cpu type, memory
configuration, and operating system not reported):

 30 dpi - ~1:00
 72 dpi - ~1:30 (with non-banding device)
400 dpi -  1:37 (with banding device)


Timing on my Linux AMD 64 5600+ with gshead (r8252) (but gs8.53 and gs8.60 are
very similar):

 30 dpi -  0:33
300 dpi -  0:37

Timing on my iMac Core 2 2.8 GHz with gshead (r8252):

 30 dpi -  0:07
300 dpi -  0:09

Both of my machines are equipped with 2 Megs of memory and Ghostscript does use
~1 gig of memory (see attached screenshot of memory allocation).  I'm a bit
surprised by the sudden increase in memory usage; Ghostscript steadily uses
memory up until ~450 megs and then allocates ~700 megs in one shot.

Generally my Linux box and my iMac are about the same speed, so I'm not sure
why the large difference in this case.
Comment 2 Ray Johnston 2007-09-27 13:16:34 UTC
The slow speed could be due to VM thrashing on a machine that doesn't have
enough physical RAM. That would also explain day to day variation depending on
how much physical RAM is locked by other processes when gs is trying to run.

I'll take a look at the file to see what's causing the large allocation. I
do know that the jasper JPEG2000 decoder can be a hog.
Comment 3 Marcos H. Woehrmann 2007-09-27 15:16:02 UTC
When I ran the timing figures I monitored each machine to make sure it wasn't swapping (and I misspoke 
earlier, my Linux box has 4 gigs of RAM).

Running with the Luratech decoder instead of Jasper didn't change the timing nor the memory usage.
Comment 4 Alex Cherepanov 2007-09-30 22:08:01 UTC
How JPEG2000 issues can affect an EPS file ?

The file creates an reusable stream 800M long and uses it once as
a data source for an image - that's all.

Most of the time is spent in the memory management code.
On peeves the file takes about 60 min. I cannot imagine 1GB file
running in 7 sec.  - please recheck.

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 87.98   1438.23  1438.23 14221096     0.00     0.00  chunk_locate_ptr
  1.84   1468.23    30.00  5911904     0.00     0.00  gs_memset
  1.82   1497.96    29.73 10953039     0.00     0.00  gc_mark_string
  1.81   1527.50    29.54  5904410     0.00     0.00  gc_strings_set_reloc
  1.46   1551.37    23.87    20010     0.00     0.00  alloc_link_chunk
  1.38   1573.87    22.50  5904410     0.00     0.00  gc_strings_compact
  1.09   1591.66    17.79 87146253     0.00     0.00  alloc_open_chunk
  0.44   1598.90     7.24     5463     0.00     0.00  image_render_color
  0.34   1604.41     5.51   942057     0.00     0.00  s_A85D_process
  0.23   1608.21     3.80 87150761     0.00     0.00  alloc_close_chunk
  0.20   1611.46     3.25  1430700     0.00     0.00  gs_memcpy
  0.20   1614.65     3.19    13199     0.00     0.03  gc_trace
  0.16   1617.31     2.67 11026078     0.00     0.00  igc_reloc_string
  0.09   1618.71     1.40      880     0.00     1.78  gs_gc_reclaim
  0.08   1620.01     1.30  5904410     0.00     0.00  gc_objects_clear_marks
  0.08   1621.29     1.28  5904410     0.00     0.00  gc_objects_compact
  0.06   1622.31     1.02 28634739     0.00     0.00  cmap_cmyk_direct
  0.06   1623.32     1.01  8857440     0.00     0.00  igc_reloc_ref_ptr_nocheck
  0.06   1624.25     0.93  5904410     0.00     0.00  gc_do_reloc
Comment 5 Marcos H. Woehrmann 2007-10-01 07:28:55 UTC
Sorry, I wasn't clear; the times I reported in Comment #0 where in hours:minutes, so it takes 7 minutes 
on my iMac and the customer reported 60 minutes.

marcos@imac:[3]% time gs860 -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -f 
./ADA_250dpi.eps -c quit
GPL Ghostscript 8.60 (2007-08-01)
Copyright (C) 2007 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
271.871u 20.342s 7:19.64 66.4%  0+0k 14+102io 0pf+0w
marcos@imac:[4]% 
 
Comment 6 Alex Cherepanov 2007-10-01 12:11:51 UTC
Our reusable stream implementation is highly inefficient.
First, the source file is read into an array of strings.
Second, a single buffer is allocated and the strings are copied there.

So the reusable steam takes 2x of core for a given stream length.

We need either read the stream directly to the buffer and realloc it as needed
or implement a stream that takes an array of strings for the storage.          
                                                                               
                                                                               
                                                                               
                                                                               
                
Comment 7 Ray Johnston 2007-10-02 09:45:54 UTC
Marcos is going to try 80000000 setvmthreshold to see if that helps on his
slow machine.

Assigning to Alex since he owns the gs_frsd.ps and can do the C implementation
to see if this can help.
Comment 8 Marcos H. Woehrmann 2007-10-02 09:56:37 UTC
Adding "-c 80000000 setvmthreshold" to the command line decreases the processing
time to just under 2 minutes on my Linux box and about 3 minutes on my iMac.
Comment 9 Marcos H. Woehrmann 2007-10-02 09:58:27 UTC
The complete command line referred to in comment #8:

time gs860 -sDEVICE=tiff32nc -dEPSCrop -sOutputFile=a.tif -r30 -dNOPAUSE -c
80000000 setvmthreshold -f ./ADA_250dpi.eps -c quit
Comment 10 Marcos H. Woehrmann 2007-10-02 10:01:34 UTC
Can we change the default vmthreshold to a larger value in Ghostscript?
Comment 11 Ray Johnston 2007-10-10 22:59:16 UTC
Replacement gs_frsd.ps attached that uses compression to build dict that is
used to determine the size of the bytestring (and save the contents for filling
the bytestring).

Further refinement is needed since this still creates the 840Mb bytestring as
uncompressed, but at least the peak memory usage is about 980Mb (the compressed
form of the data is only 92Mb).

I am able to run this file on my 2GHz Core 2 Duo w/ 2Gb RAM, generating
a 400 dpi png16m image in 416 sec, and a 30 dpi png16m can be created in 138 sec.

The entire command line I used was:

gswin32c -dBufferSpace=40000000 -sDEVICE=png16m -o x.png -dEPSCrop -r30 -c
"32000000 setvmthreshold" -f ADA_250dpi.eps

I recommend that the ReusableStream be implemented as blocks of compressed
data so that fileposition can be readily implemented and so we never need
to store uncompressed data. We may want to make a ReusableStreamEncode
filter that can create this for use by the ReusableStreamDecode filter.

Decision on implementation left to Alex.
Comment 12 Ray Johnston 2007-10-10 23:01:39 UTC
Created attachment 3463 [details]
gs_frsd.ps

Create temporary data in a dict of Flate compressed strings, then create a
bytestring of the uncompressed data as before.
Comment 13 Ray Johnston 2007-10-12 09:11:46 UTC
The customer reports that the modified gs_frsd.ps works for them to avoid the
VMerror.

Leaving the bug open until some method is committed.
Comment 14 Alex Cherepanov 2007-10-13 16:02:31 UTC
Created attachment 3469 [details]
patch

This is another approach - a reusable stream that uses an array of strings for
data storage. The patch passes both CET and Comparefiles testing.
Comment 15 Ray Johnston 2008-05-06 12:53:23 UTC
Comment on attachment 3463 [details]
gs_frsd.ps

This patch messed up the global state.
Comment 16 Ray Johnston 2008-05-06 12:55:31 UTC
Created attachment 3988 [details]
gs_frsd.ps

Corrected change to gs_frsd.ps workaround that doesn't mess up the global
allocation state.

Still waiting for Alex to complete the "real" fix.
Comment 17 Alex Cherepanov 2008-06-29 17:11:44 UTC
The patch from the comment #14 is committed as a rev. 8810.
http://ghostscript.com/pipermail/gs-cvs/2008-June/008392.html
Regression testing shows no differences.
Compression can be added later if needed.