this applies to current HEAD on linux gs takes an exceedingly long time to display the attached pdf figure. acroread, xpdf, okular all display it swiftly. gdb shows the problem is somewhere in gx_default_fill_path() Cheers T.
Created attachment 6561 [details] pdf figure, which takes a very long time to display
I've verified that it takes ~3 minutes to convert the attached PDF file to a ppmraw file at 72 dpi using ghostscript head (r11553). This is a regression of sorts: r8425: 45 seconds r8426: Error: /unknownerror in --run-- . . . r11123: Error: /unknownerror in --run-- r11124: 172 seconds
Created attachment 6599 [details] NV400.pdf Cutdown version of the file that still shows the problem.
The cutdown file now consists of a rectangle filled with a pattern (P3). This pattern is in turned filled with a pattern (P7). P7 is in turn a single pixel stroked line.
Very Sleepy indicates that the time here is all being spent tiling patterns. gx_dc_pure_masked_fill_rect calls tile_fill_init, and tile_by_steps. These two functions call floor a lot (60% of runtime). The rest of the time is spent in tile_by_steps. (Take these times with a pinch of salt, as this was a debug build, and gs_debug_c accounts for a not insignificant amount of the profile).
Created attachment 6604 [details] times.png Using the original file I searched through the revs to find where the processing times increased. Note that these versions of Ghostscript all generate an error and no output, so it's not clear if the file is being processed correctly. 8490 45.270 8491 43.580 8492 43.910 8493 43.930 8494 32.530 8495 32.250 8496 44.850 8497 47.030 8498 46.900 8499 32.580 8500 63.400 8501 64.080 8502 63.340 8504 63.600 8505 63.790 8506 64.230 8507 64.550 8508 63.440 8509 64.880 8510 62.200 . . . 8680 60.860 8681 64.860 8682 65.150 8686 45.850 8689 65.470 8690 61.270 8691 45.640 8692 46.120 8693 62.740 8694 182.390 8695 187.780 8696 189.860 8697 182.700 8698 182.000 8699 189.410 8700 173.840 8704 188.740 8707 188.570 8708 188.850 8710 190.430 See the attached times.png for an overview.
(In reply to comment #6) > Using the original file I searched through the revs to find where the > processing times increased. Note that these versions of Ghostscript all > generate an error and no output, so it's not clear if the file is being > processed correctly. [snip] > 8691 45.640 > 8692 46.120 > 8693 62.740 > 8694 182.390 > 8695 187.780 [snip] revision 8694 changes a simple cast (truncation) of a calculated value to using floor instead, the cast was truncating -0.13 to 0 instead of -1. This ties in pretty closely with Robin's investigation showing that most of the time is spent in floor. Given what the code is doing, a simple addition/subtraction of 1 (depending on the sign) and then cast to an integer might be sufficient to solve the problem addressed by introducing floor, without too much of a processing penalty. I'll try this in a minute and see if it makes a difference.
(In reply to comment #7) > (In reply to comment #6) [snip] >I'll > try this in a minute and see if it makes a difference. Well, it actually made it slower :-( Possibly I broke it by accident. Still, the code in tile_by_steps has a comment : * This implementation could be sped up considerably! So perhaps Robin could perform some optimisation magic on it.
(In reply to comment #8) > >I'll > > try this in a minute and see if it makes a difference. > > Well, it actually made it slower :-( Possibly I broke it by accident. Yes, I broke it. A simpler fix improved the performance from 5'27 to 1'49 on my PC by removing the two floor operations added in rev 8694 and replacing with a test for x < 0 and y < 0 and decrementing them by 1 if so. This is not a replacement for revision 8694, but it serves to show where at least some of the problem lies. I'm sure there is more scope for improving the performance in this loop (precalculating the increment in x and y instead of recalculating on each loop iteration, for starters). Its still very slow....
One thing that occurred to me is that we could 'expand' the tile, as is done for halftone cells, for tiles that are really small. The 'super tile' would have the smaller tile repeated at the step intervals and should result in dividing the workload during painting the large area with the tile. I'm not sure what a reasonable expansion size would be -- the halftone logic only expands the halftone cells up to be at least 32 bits wide.
Created attachment 6606 [details] diff.txt Proposed change to fix the problem. Basically, this reverts to using (int)x rather than (int)(floor(x)), so removes the overhead of floor. We then check for x being negative, and only if it is, do we do the (relatively expensive) test to correct rounding. Also, some simple optimisations to reduce the floating point calculations done in loop.
Fix committed as revision 11599. Basically, we use a macro 'fastfloor' instead of calling floor. Local testing reveals that thisspeeds it up by a factor of 3 in debug builds (more in release builds as a large amount of time is spent if in the if_debugc calls in the loop). This basically gets us back to where we were before we regressed. There is still scope for improving the handling of this file further however, so I am downgrading this bug to an enhancement rather than closing it.
Created attachment 19434 [details] Patch Speed up rendering of patterns by using more efficient but mathematically equivalent calculations. The time of rendering of the sample file on my box decreased from 37.5 sec to 25.0 sec. I cannot find further improvements without major changes in the program structure or reimplementation with SSE. I believe that the bug can be closed now because further improvement is difficult and this kind of files is not common in the wild.