The customer reports, and I've verified, that the attached file takes a very long time to process (hours) at 600 dpi, while at 300 dpi it only takes a few seconds. Here is the command line I'm using for testing:

    bin/gs -sDEVICE=tiffpack -o test.tif -r600 ./pro_2-3.pdf

I've tried various options (e.g. setting MaxBitmap and/or BufferSpace) with no success.
We are spending most of our time in tile_by_steps with this file at 600 dpi:

    Function Name                Inclusive Samples  Exclusive Samples  Inclusive %  Exclusive %
    tile_by_steps                           22,107             14,815        63.47        42.54
    __ftol2_pentium4                         3,372              3,372         9.68         9.68
    _gs_debug_c                              2,229              2,229         6.40         6.40
    zoom_y                                   2,110              1,791         6.06         5.14
    __ftol2_sse                              1,273              1,273         3.65         3.65
    @ILT+4030(_gs_debug_c)                   1,071              1,071         3.07         3.07
    _cmsTetrahedralInterp16                    912                912         2.62         2.62
    zoom_x                                     892                808         2.56         2.32
    [ntdll.dll]                                729                729         2.09         2.09
    _gx_build_blended_image_row                532                532         1.53         1.53
    interp                                  34,804                456        99.93         1.31
It may be that this is caused by the scan-conversion code; the file makes use of large paths. Re-assigning to Robin to test with the experimental scan-conversion code to see if it makes a difference to the processing time.
gprof results:

      %   cumulative     self                self     total
     time     seconds  seconds      calls  Ks/call  Ks/call  name
    99.81     8551.52  8551.52       2978     0.00     0.00  tile_by_steps
     0.04     8555.10     3.58      71819     0.00     0.00  s_IScale_process
     0.02     8557.09     1.99   66174021     0.00     0.00  gs_memcpy
     0.01     8558.20     1.11      21288     0.00     0.00  gx_build_blended_image_row
     0.01     8559.02     0.82   55027299     0.00     0.00  cmsTetrahedralInterp16
     0.01     8559.79     0.77      27374     0.00     0.00  image_render_mono_ht
     0.01     8560.28     0.49    2747500     0.00     0.00  pdf14_fill_rectangle
     0.01     8560.76     0.48        587     0.00     0.01  interp
The resolution sensitivity of this file is very peculiar. It may be that we are taking a slightly different code path. Some timings I collected show us "falling off the cliff" (debug build times):

    DPI   Time (s)
    300         22
    400         28
    420         29
    421         29
    422         29
    423         36
    424         48
    425         66
    426         88

Because the increase in time is fairly flat from 200 to 300 to 422 dpi, we may be able to adjust the internal decisions to provide a temporary workaround until the algorithm can be replaced. Note that the number of bands (or the band height) does _not_ affect the threshold. I played with -dBufferSpace, and it only makes slight improvements and doesn't change the inflection point of the timing curve.
The gprof results from Alex show that the interpolated scaler is being used. I suspect that 422 dpi happens to be the point at which we start to scale an image up rather than down (or something), and so the scaler kicks in.
I analyzed this further, with comments on IRC. tile_by_steps is computing a large region to loop over in the tile. It doesn't seem to be at all related to the contents of the tile, so I doubt the image interpolation is the issue (a quick check with -dNOINTERPOLATE would confirm this, but I don't want to get off the trail that I am on). The 'ptile' bbox changes dramatically around that resolution, going from a bbox that is near zero to one that is offset by a lot, although the size of the tile remains the same. This seems to confuse the u0, v0, u1, v1 calculations that use max and min; see the sketch below. Should find the root cause this AM.
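To make the suspicion concrete, here is a minimal sketch (hypothetical names and simplified bounds math, not the actual tile_by_steps code) of how min/max over the ptile bbox drives the loop range: if one corner of the bbox has been corrupted to a huge offset, the u/v span explodes even though the tile size itself is unchanged.

    #include <stdio.h>

    typedef struct { int x, y; } point;
    typedef struct { point p, q; } bbox_t;  /* p = lower-left, q = upper-right */

    static int imin(int a, int b) { return a < b ? a : b; }
    static int imax(int a, int b) { return a > b ? a : b; }

    /* Derive the range of tile-step indices whose translated copy of
     * the tile could touch the fill region, then report the trip count. */
    static void tile_region(const bbox_t *fill, const bbox_t *ptile,
                            int xstep, int ystep)
    {
        int u0 = (imin(fill->p.x, fill->q.x) - ptile->q.x) / xstep;
        int u1 = (imax(fill->p.x, fill->q.x) - ptile->p.x) / xstep;
        int v0 = (imin(fill->p.y, fill->q.y) - ptile->q.y) / ystep;
        int v1 = (imax(fill->p.y, fill->q.y) - ptile->p.y) / ystep;
        long steps = (long)(u1 - u0 + 1) * (v1 - v0 + 1);

        printf("u:[%d..%d] v:[%d..%d] -> %ld tile copies\n",
               u0, u1, v0, v1, steps);
    }

    int main(void)
    {
        bbox_t fill = { { 0, 0 }, { 5100, 6600 } }; /* 8.5x11in at 600 dpi  */
        bbox_t sane = { { 0, 0 }, { 64, 64 } };     /* bbox near zero       */
        bbox_t off  = { { -2000000, -2000000 },     /* one corner wrapped   */
                        { 64, 64 } };               /* far away, same tile  */

        tile_region(&fill, &sane, 64, 64); /* modest loop: ~8,500 steps */
        tile_region(&fill, &off, 64, 64);  /* blown-up loop: ~1e9 steps */
        return 0;
    }

With the sane bbox the loop covers a few thousand steps; with one corner wrapped to a large negative value, the same formula yields on the order of a billion steps, which would account for the hours-long runtime.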
I have found the problem and am testing a patch for it. Now my times are:

    300 dpi  11 sec
    400 dpi  19 sec
    500 dpi  34 sec
    600 dpi  62 sec

The problem was caused by a float2fixed conversion that overflowed, causing a LARGE number of loops in tile_by_steps.
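For reference, a minimal standalone sketch of the failure mode. The macro below mimics Ghostscript's float2fixed from gxfixed.h, assuming a 32-bit fixed type with 8 fractional bits (the real definitions are build-dependent): once the scaled value no longer fits in the fixed type, the cast produces garbage, and any loop bound derived from it can make a loop like the one in tile_by_steps run essentially forever.

    #include <stdio.h>
    #include <limits.h>

    /* Mimics Ghostscript's float2fixed (gxfixed.h): a 32-bit signed
     * fixed-point type, assumed here to carry 8 fractional bits. */
    typedef int fixed;
    #define fixed_shift 8
    #define fixed_1 ((fixed)1 << fixed_shift)
    #define float2fixed(f) ((fixed)((f) * (float)fixed_1))

    int main(void)
    {
        /* A coordinate that fits converts exactly as expected. */
        float small = 6600.0f;              /* e.g. 11in at 600 dpi */
        printf("%g -> %d (ok)\n", small, float2fixed(small));

        /* A coordinate that doesn't: 1e8 * 256 is far above INT_MAX,
         * so the float-to-int cast is undefined behaviour; in practice
         * it wraps or saturates to a meaningless value, and any loop
         * bound computed from it is garbage. */
        float big = 1.0e8f;                 /* hypothetical oversized value */
        printf("%g -> %d (overflowed, INT_MAX=%d)\n",
               big, float2fixed(big), INT_MAX);
        return 0;
    }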
Fix looks good. Committed as: http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=ee45ee90570975275ad6b826b377d6852b651b6a