Converting the attached PDF to a group 4 TIFF @ 400 dpi causes Ghostscript to lock up (or at least to take longer than I'm willing to wait). The command line I'm using for testing:

    bin/gs -sDEVICE=tiffg4 -r400 -o test.tif ./PW-880b.pdf

Saving the file from Acrobat as a 400 dpi monochrome TIFF takes approximately 7 seconds. Similarly, converting the file using mudraw to a 400 dpi pbm takes 7.5 seconds.
Ghostscript renders this file much faster with the -dNOINTERPOLATE flag. OTOH, the -dNOTRANSPARENCY flag doesn't affect rendering time in this case.
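For reference, that is the test command from the original report with the flag added, i.e. something like:

    bin/gs -sDEVICE=tiffg4 -r400 -dNOINTERPOLATE -o test.tif ./PW-880b.pdf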
This file contains lots of images, which are scaled up. The scaling has to process roughly 6 million input pixels and produce roughly 30 billion output pixels. The reason MuPDF renders so quickly in comparison is that it does not interpolate when upscaling images.
Fixed in:

commit a88326f1ca382092c889ffa9be1abe857c118a34
Author: Robin Watts <robin.watts@artifex.com>
Date:   Fri Jul 13 13:05:06 2012 +0100

    Bug 693166: Optimise interpolation

    When interpolating, ghostscript pays no heed to the clipping rectangle.
    Hence if we scale (say) a 256x256 image up to (say) 17067x17067, even
    though only a small portion of the scaled up image is actually visible,
    we scale the whole lot only to throw away 90%+ of it.

    To fix this, we have to extend the capabilities of the interpolation
    code. The existing code already copes with only being given data for a
    subsection of the image (for when we split images in the clist, I
    guess). This rectangle is referred to in the code as being 'the
    subrectangle we are rendering', when it's actually 'the subrectangle we
    are being given data for'. We update the description to be more
    accurate.

    We introduce a new rectangle, 'the render rectangle', to indicate the
    subrectangle that we are actually rendering - this will always be a
    subset of the data rectangle. If we are given a clipping rectangle, we
    read the outer bbox from it, and map this back into the source space of
    the image; we intersect this with the data rectangle to get the render
    rectangle.

    We update the scaling stream filter to set an 'Active' flag to say
    whether we are inside the render rectangle or not. If not, we can
    safely skip lines in their totality. By default we leave this set to 1,
    so that any scaling cores that aren't updated to know about this will
    perform in the old way.

    We update the scaling code to make use of the Active flag; whole lines
    are skipped if we aren't in the active region, and if we are, we skip
    prefixes/suffixes of unused pixels.

    We update the scaling cores themselves to avoid calculating values
    outside the active regions. Note that for simplicity we still allocate
    space as if we were accessing the whole line.

    We still calculate contributions for the whole of the images; to do
    otherwise would require significant changes to the weight generation
    code, and this isn't a huge consumer of time.
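To make the scheme a little more concrete, here is a minimal sketch of the render-rectangle calculation described in the commit message, assuming a simple axis-aligned image-to-device scale (sx, sy) with origin (ox, oy); the names and types are hypothetical and do not correspond to the real Ghostscript sources:

    /* Illustrative sketch only: map the outer bbox of the clip path
     * (device space) back into image source space, rounding outwards so
     * we never clip too much, then intersect with the data rectangle to
     * get the render rectangle. */
    #include <math.h>

    typedef struct { int x0, y0, x1, y1; } irect;

    static irect
    render_rect(irect data_rect, irect clip_bbox,
                double sx, double sy, double ox, double oy)
    {
        irect r;

        r.x0 = (int)floor((clip_bbox.x0 - ox) / sx);
        r.y0 = (int)floor((clip_bbox.y0 - oy) / sy);
        r.x1 = (int)ceil ((clip_bbox.x1 - ox) / sx);
        r.y1 = (int)ceil ((clip_bbox.y1 - oy) / sy);

        /* The render rectangle is always a subset of the data rectangle. */
        if (r.x0 < data_rect.x0) r.x0 = data_rect.x0;
        if (r.y0 < data_rect.y0) r.y0 = data_rect.y0;
        if (r.x1 > data_rect.x1) r.x1 = data_rect.x1;
        if (r.y1 > data_rect.y1) r.y1 = data_rect.y1;
        return r;
    }

    /* The scaling filter then keeps an 'Active' flag (default 1): lines
     * outside [r.y0, r.y1) are skipped entirely, and within active lines
     * the prefix/suffix of pixels outside [r.x0, r.x1) is skipped. */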
*** Bug 693209 has been marked as a duplicate of this bug. ***
Customer reports that this issue is not resolved with 9.06, and I am able to replicate it.

With the default BufferSpace we end up with a BandHeight=12 and 1000 bands. With this setting it takes a REALLY LONG TIME (I gave up after 15 minutes). Increasing to -dBufferSpace=32m we get a BandHeight of 105 and 115 bands, and we can render the page in 2m39s (my 2.8GHz i7 laptop). With -dNumRenderingThreads=4 the time further decreases to 78 seconds (and the fan goes into high gear!)

However, from my work on performance with customer 532, I've noticed that when we interpolate images post-clist the 'support' lines can really increase the computation load, so interpolating prior to the clist can speed things up. With this file, adding the following patch:

    --- a/gs/base/gxclimag.c
    +++ b/gs/base/gxclimag.c
    @@ -448,6 +448,9 @@ clist_begin_typed_image(gx_device * dev,
             base_index != gs_color_space_index_ICC)
             /****** Can only handle Gray, RGB, CMYK and ICC ******/
             goto use_default;
    +    if (pim->Interpolate)
    +        /****** Interpolated images in clist mode end up wasting time ******/
    +        goto use_default;
         if (has_alpha)
             /****** CAN'T HANDLE IMAGES WITH ALPHA YET ******/
             goto use_default;

increases the parsing time from 0.4 seconds to 3.8 seconds, but then we can complete the rendering in under 3 seconds, for a total time of 6.2 seconds.

What I don't understand is why clist mode is so much worse. The 'support' is supposed to be 8 lines and the images are all within 23 out of 115 bands, so it is unexpected that clist mode works so much harder doing the interpolation.

I recommend giving this customer the above patch for now.
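For reference, combining the original test command with the settings above gives something along the lines of (writing the 32MB buffer size out in bytes):

    bin/gs -sDEVICE=tiffg4 -r400 -dBufferSpace=32000000 -dNumRenderingThreads=4 -o test.tif ./PW-880b.pdf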
Ray added some more about this bug on irc:

> holy cow. I think I see the problem. This customer's file is scaling up
> a LOT (by 266 in x, 1150 in y), so we are calling 'image_render_interpolate'
> 46,260 times and generating 5,071,226 LINES of interpolated data (with 115
> bands). With 57 bands, we only call image_render_interpolate 28,786 times
> and generate 2,603,385 lines of interpolated data. Interpolating during
> the parse phase has 13,220 calls and generates 622,130 lines
>
> so the 'support' is resulting in roughly 250 extra 'support' lines
> per band for the page (among the various images)
> extrapolating to the 1000 bands with the default BufferSpace, I would guess
> that on my laptop it would take 15 minutes. I may have given up and killed
> the job right before it finished. I'll try it on peeves
>
> wow. My laptop ran the job in 159 seconds with 32m buffer. It takes 278
> seconds on peeves. I'll have to check on the 1000 band job later...
>
> BTW, the comment above about the images only being in 23 bands isn't
> correct. With the extensive scaling the images are covering the whole
> page -- it's just the visible portion that is in 23/115 bands
If memory serves, this file contains output from something like Google Maps: tiles of images. Each image is 256x256 or thereabouts, and each grid square on the output map is made up by first scaling the 'level 0' image up 1<<n times and clipping it with a rectangular clipping path set to the grid square. Then the 'level 1' image is scaled up 1<<(n-1) times and again clipped to the same clipping path. Then again and again until we have the 'level n' image scaled up 1<<0 times. This is then repeated for every grid square in the image.

What our code therefore has to do is calculate which source regions of the image need to be mapped. To do this we take the bounding box of the clip rectangle and map it back to (integer) source coordinates. To ensure that we do not clip too much, we always round outwards. As we are scaling up a lot (for some of the images at least), increasing the size of the support by only 1 will generate an extra 1<<n output interpolated lines. By introducing banding into the mix, each grid square is vertically subdivided, causing more 'edges' and hence more extra lines to be generated.

This is a pathological file, so spending too much time on it seems pointless. One thing worth thinking about, perhaps: most devices have at most 8 bits of accuracy in their output pixels. For these devices it is never worth interpolating by more than 256 times in a given direction, as we can never represent the intermediate values that might be produced. I'm not sure how to actually introduce this restriction into the code though.
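To illustrate that last idea (purely a sketch of the suggestion, not actual Ghostscript code; the names are made up and where such a clamp would live is exactly the open question):

    /* Hypothetical: for a device with at most 8 bits per output component,
     * cap the interpolated size at 256x the source size; any further
     * enlargement could be done by replication, since interpolation cannot
     * produce intermediate values the device can represent anyway. */
    #define MAX_USEFUL_UPSCALE 256

    static int
    useful_interpolated_size(int src_size, int dst_size)
    {
        long cap = (long)src_size * MAX_USEFUL_UPSCALE;

        return (dst_size < cap) ? dst_size : (int)cap;
    }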
The following commit reduces the time for this file from 16 minutes to 5 mins 35 secs.

commit 048b221e76fba80663f073f8312802bcdf168c52
Author: Robin Watts <robin.watts@artifex.com>
Date:   Mon Nov 5 21:30:48 2012 +0000

    Bug 693166: Speed images through the clist.

    In investigating bug 693166, Ray spotted that the calculation of which
    bands were touched by images was slack in the presence of a clipping
    path - he proposed a simple patch to fix this.

    Unfortunately it had a knock-on effect where vertical offsets could be
    introduced into the topmost band. This is fixed here by a second small
    change in image_band_box.

    Cluster testing shows 10 small changes, all well within the usual clist
    differences.
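As a rough illustration of the idea behind Ray's observation (a sketch only, with made-up names; the real change lives in image_band_box and handles more cases, including the topmost-band offset mentioned above):

    /* When working out which clist bands an image will touch, intersect
     * the image's device-space bbox with the clip path's bbox first, so
     * that bands covered only by clipped-away parts of the image are not
     * written to.  Assumes the intersection is non-empty. */
    static void
    touched_bands(int img_y0, int img_y1,    /* image bbox, device space */
                  int clip_y0, int clip_y1,  /* clip bbox, device space */
                  int band_height,
                  int *first_band, int *last_band)
    {
        int y0 = img_y0 > clip_y0 ? img_y0 : clip_y0;
        int y1 = img_y1 < clip_y1 ? img_y1 : clip_y1;

        *first_band = y0 / band_height;
        *last_band  = (y1 - 1) / band_height;
    }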