Bug 693166

Summary: Ghostscript locks up (or at least takes a very long time) converting PDF to TIFF
Product: Ghostscript
Reporter: Marcos H. Woehrmann <marcos.woehrmann>
Component: Images
Assignee: Robin Watts <robin.watts>
Status: NOTIFIED FIXED
Severity: normal
CC: alex, jackie.rosen, roucaries.bastien+gs
Priority: P2
Version: master
Hardware: PC
OS: All
Customer: 700
Word Size: ---

Description Marcos H. Woehrmann 2012-07-02 21:08:47 UTC
Converting the attached PDF to a group 4 TIFF @ 400 dpi causes ghostscript to lock up (or at least to take longer than I'm willing to wait).  The command line I'm using for testing:

  bin/gs -sDEVICE=tiffg4 -r400 -o test.tif ./PW-880b.pdf

Saving the file from Acrobat as a 400 dpi monochrome TIFF takes approximately 7 seconds.  Similarly, converting the file using mudraw to a 400 dpi pbm takes 7.5 seconds.
Comment 2 Alex Cherepanov 2012-07-03 15:25:40 UTC
Ghostscript renders this file much faster with the -dNOINTERPOLATE flag.
OTOH, the -dNOTRANSPARENCY flag doesn't affect rendering time in this case.
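
For reference, that is the command line from the original report with the flag added, i.e. something like:

  bin/gs -dNOINTERPOLATE -sDEVICE=tiffg4 -r400 -o test.tif ./PW-880b.pdf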
Comment 3 Robin Watts 2012-07-06 15:37:58 UTC
This file contains lots of images, which are scaled up.

The scaling has to process roughly 6 million input pixels and produce roughly 30 billion output pixels.
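
For scale, rough arithmetic from those figures: 30 billion / 6 million is a factor of about 5,000 in pixel count, i.e. an average linear upscale of roughly 70x in each direction.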

The reason MuPDF renders so quickly in comparison is that it does not interpolate when upscaling images.
Comment 4 Robin Watts 2012-07-18 00:28:34 UTC
Fixed in:


commit a88326f1ca382092c889ffa9be1abe857c118a34
Author: Robin Watts <robin.watts@artifex.com>
Date:   Fri Jul 13 13:05:06 2012 +0100

    Bug 693166: Optimise interpolation

    When interpolating, ghostscript pays no heed to the clipping rectangle.
    Hence if we scale (say) a 256x256 image up to (say) 17067x17067, even
    though only a small portion of the scaled-up image is actually visible,
    we scale the whole lot only to throw away 90%+ of it.

    To fix this, we have to extend the capabilities of the interpolation
    code.

    The existing code already copes with only being given data for a
    subsection of the image (for when we split images in the clist, I
    guess). This rectangle is referred to in the code as being 'the
    subrectangle we are rendering', when it's actually 'the subrectangle
    we are being given data for'. We update the description to be more
    accurate.

    We introduce a new rectangle, 'the render rectangle' to indicate the
    subrectangle that we are actually rendering - this will always be a
    subset of the data rectangle.

    If we are given a clipping rectangle, we read the outer bbox from it,
    and map this back into the source space of the image; we intersect this
    with the data rectangle to get the render rectangle.

    We update the scaling stream filter to set an 'Active' flag to say
    whether we are inside the render rectangle or not. If not, we can
    safely skip lines in their totality. By default we leave this set to
    1, so that any scaling cores that aren't updated to know about this
    will perform in the old way.

    We update the scaling code to make use of the Active flag; whole lines
    are skipped if we aren't in the active region, and if we are, we skip
    prefixes/suffixes of unused pixels.

    We update the scaling cores themselves to avoid calculating values
    outside the active regions.

    Note that for simplicity we still allocate space as if we were
    accessing the whole line. We still calculate contributions for the whole
    of the images; to do otherwise would require significant changes to
    the weight generation code, and this isn't a huge consumer of time.
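
To make the data rectangle / render rectangle / 'Active' flag relationship concrete, here is a minimal standalone sketch; the types and numbers are invented for illustration and this is not the actual Ghostscript scaling code:

#include <stdio.h>

typedef struct { int x0, y0, x1, y1; } rect;   /* half-open ranges */

int main(void)
{
    /* Hypothetical rectangles, in source-image coordinates:
     *   data_rect:   the subrectangle we are given data for
     *   render_rect: the subrectangle actually visible after clipping
     *                (always a subset of the data rectangle). */
    rect data_rect   = { 0, 0, 256, 256 };
    rect render_rect = { 16, 32, 48, 80 };

    int lines_skipped = 0, pixels_processed = 0;
    int x, y;

    for (y = data_rect.y0; y < data_rect.y1; y++) {
        /* The 'Active' flag: are we inside the render rectangle vertically?
         * A scaling core that doesn't know about the flag sees Active == 1
         * everywhere and behaves in the old way. */
        int active = (y >= render_rect.y0 && y < render_rect.y1);
        if (!active) {
            lines_skipped++;          /* skip the line in its totality */
            continue;
        }
        /* Inside the active region, skip the prefix/suffix of pixels that
         * fall outside the render rectangle horizontally. */
        for (x = render_rect.x0; x < render_rect.x1; x++)
            pixels_processed++;       /* stand-in for the real interpolation */
    }

    printf("skipped %d lines, processed %d source pixels\n",
           lines_skipped, pixels_processed);
    return 0;
}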
Comment 5 Chris Liddell (chrisl) 2012-07-23 10:07:25 UTC
*** Bug 693209 has been marked as a duplicate of this bug. ***
Comment 6 Ray Johnston 2012-10-20 18:19:40 UTC
Customer reports that this issue is not resolved with 9.06, and I am able
to replicate it.

With the default BufferSpace we end up with a BandHeight=12 and 1000 bands.
With this setting it takes a REALLY LONG TIME (I gave up after 15 minutes).

Increasing to -dBufferSpace=32m we get a BandHeight of 105 and 115 bands, and
we can render the page in 2 min 39 sec (on my 2.8GHz i7 laptop).

With -dNumRenderingThreads=4 the time further decreases to 78 seconds (and the
fan goes into high gear!)
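
For reference, the timings above correspond to running the reporter's command line with those extra switches, i.e. something along the lines of:

  bin/gs -sDEVICE=tiffg4 -r400 -dBufferSpace=32m -dNumRenderingThreads=4 -o test.tif ./PW-880b.pdf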

However, from my work on performance with customer 532, I've noticed that when
we interpolate images post-clist the 'support' lines can really increase the
computation load, so interpolating prior to the clist can speed things up.

With this file, applying the following patch:
--- a/gs/base/gxclimag.c
+++ b/gs/base/gxclimag.c
@@ -448,6 +448,9 @@ clist_begin_typed_image(gx_device * dev,
         base_index != gs_color_space_index_ICC)
         /****** Can only handle Gray, RGB, CMYK and ICC ******/
         goto use_default;
+    if (pim->Interpolate)
+        /****** Interpolated images in clist mode end up wasting time ******/
+        goto use_default;
     if (has_alpha)
         /****** CAN'T HANDLE IMAGES WITH ALPHA YET ******/
         goto use_default;

increases the parsing time from 0.4 seconds to 3.8 seconds, but then we
can complete the rendering in under 3 seconds, for a total time of 6.2 sec.

What I don't understand is why clist mode is so much worse. The 'support'
is supposed to be 8 lines and the images are all within 23 out of 115 bands,
so it is unexpected that clist mode has to work so much harder doing the
interpolation.

I recommend giving this customer the above patch for now.
Comment 7 Robin Watts 2012-10-22 10:40:04 UTC
Ray added some more about this bug on irc:

> holy cow. I think I see the problem. This customer's file is scaling up
> a LOT (by 266 in x 1150 in y), so we are calling 'image_render_interpolate'
> 46,260 times and generating 5,071,226 LINES of interpolated data (with 115
> bands). With 57 bands, we only call image_render_interpolate 28,786 times
> and generate 2,603,385 lines of interpolated data. Interpolating during
> the parse phase has 13,220 calls and generates 622,130 lines
>
> so the 'support' is resulting in roughly 250 extra 'support' lines
> per band for the page (among the various images)
> extrapolating to the 1000 bands with the default BufferSpace I would guess
> that on my laptop, it would take 15 minutes. I may have given up and killed
> the job right before it finished. I'll try it on peeves
>
> wow. My laptop ran the job in 159 seconds with 32m buffer. It takes 278
> seconds on peeves. I'll have to check on the 1000 band job later...
>
> BTW, the comment above about the images only being in 23 bands isn't
> correct. With the extensive scaling the images are covering the whole
> page -- it's just the visible portion that is in 23/115 bands
Comment 8 Robin Watts 2012-10-22 11:13:11 UTC
If memory serves, this file contains output from something like Google Maps: tiles of images.

Each image is 256x256 or thereabouts, and each grid square on the output map is made up by first scaling the 'level 0' image up 1<<n times and clipping it with a rectangular clipping path set to the grid square. Then the 'level 1' image is scaled up 1<<(n-1) times and again clipped to the same clipping path. Then again and again until we have the 'level n' image scaled up 1<<0 times.

This is then repeated for every grid square in the image.

What our code has to do therefore is to calculate which source regions of the image need to be mapped. To do this we take the bounding box of the clip rectangle and map it back to integer source coordinates. To ensure that we do not clip too much, we always round outwards.
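
As a standalone illustration of that outward rounding (a pure integer upscale factor is assumed here, whereas the real code maps through the image's full transformation matrix):

#include <stdio.h>

int main(void)
{
    int scale = 1 << 6;                     /* hypothetical upscale factor */
    int clip_x0 = 1000, clip_x1 = 1400;     /* clip bbox edges in device space */

    /* Map back to integer source coordinates, rounding outwards:
     * floor for the lower edge, ceiling for the upper edge, so we never
     * discard source pixels that still contribute to visible output. */
    int src_x0 = clip_x0 / scale;                   /* floor */
    int src_x1 = (clip_x1 + scale - 1) / scale;     /* ceil  */

    printf("device [%d,%d) maps back to source [%d,%d)\n",
           clip_x0, clip_x1, src_x0, src_x1);
    return 0;
}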

As we are scaling up a lot (for some of the images at least), increasing the size of the support by only 1 will generate an extra 1<<n output interpolated lines.

By introducing banding into the mix, each grid square is vertically subdivided, causing more 'edges' and hence more extra lines to be generated.

This is a pathological file, so spending too much time on it seems pointless.

One thing worth thinking about perhaps; most devices have a maximum 8 bit accuracy in their output pixels. For these devices it is never worth interpolating by more than 256 times in a given direction, as we can never represent the intermediate values that might be produced. I'm not sure how to actually introduce this restriction into the code though.
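
Purely as a sketch of that idea (not something the code currently does), the restriction could look like a clamp on the per-axis interpolation factor:

#include <stdio.h>

/* With 8-bit output there are only 256 representable levels per component,
 * so upscaling by more than 256x in one direction cannot produce any new
 * intermediate values; the remainder might as well be pixel replication. */
static int clamp_interpolation_factor(int requested, int output_bits)
{
    int max_useful = 1 << output_bits;    /* 256 for 8-bit output */
    return requested > max_useful ? max_useful : requested;
}

int main(void)
{
    printf("%d\n", clamp_interpolation_factor(1150, 8));   /* prints 256 */
    return 0;
}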
Comment 9 Robin Watts 2012-11-05 21:40:41 UTC
The following commit reduces the time for this file from 16 minutes to 5 mins 35 secs.

commit 048b221e76fba80663f073f8312802bcdf168c52
Author: Robin Watts <robin.watts@artifex.com>
Date:   Mon Nov 5 21:30:48 2012 +0000

    Bug 693166: Speed images through the clist.

    In investigating bug 693166 Ray spotted that the calculation of which
    bands were touched by images was slack in the presence of a clipping
    path - he proposed a simple patch to fix this.

    Unfortunately it had a knock-on effect where vertical offsets could be
    introduced into the topmost band. This is fixed here by a second small
    change in image_band_box.

    Cluster testing shows 10 small changes, all well within the usual
    clist differences.
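
A rough sketch of the band-selection idea described above (the band height and coordinates are invented for illustration; this is not the real image_band_box code):

#include <stdio.h>

int main(void)
{
    int band_height = 105;                     /* as in the BufferSpace=32m case */
    int image_y0 = 0,    image_y1 = 12075;     /* scaled image extent, device space */
    int clip_y0  = 2400, clip_y1  = 2800;      /* visible part after clipping */
    int y0, y1, first, last;

    /* Slack version: queue the image into every band its full extent touches. */
    first = image_y0 / band_height;
    last  = (image_y1 - 1) / band_height;
    printf("ignoring the clip:   bands %d..%d\n", first, last);

    /* Tighter version: intersect with the clip bbox first, so the image only
     * lands in bands where it can actually mark pixels. */
    y0 = image_y0 > clip_y0 ? image_y0 : clip_y0;
    y1 = image_y1 < clip_y1 ? image_y1 : clip_y1;
    first = y0 / band_height;
    last  = (y1 - 1) / band_height;
    printf("respecting the clip: bands %d..%d\n", first, last);
    return 0;
}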