Bug 691504 - gs takes an exceedingly long time to display the attached pdf file
Summary: gs takes an exceedingly long time to display the attached pdf file
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter (show other bugs)
Version: master
Hardware: PC Linux
: P4 enhancement
Assignee: Robin Watts
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-27 18:13 UTC by keinbiervorvier
Modified: 2020-12-07 17:12 UTC (History)
2 users (show)

See Also:
Customer:
Word Size: ---


Attachments
pdf figure, which takes a very long time to display (195.41 KB, application/pdf)
2010-07-27 18:15 UTC, keinbiervorvier
Details
NV400.pdf (250.02 KB, application/pdf)
2010-08-03 13:56 UTC, Robin Watts
Details
times.png (5.04 KB, image/png)
2010-08-03 19:31 UTC, Marcos H. Woehrmann
Details
diff.txt (1.64 KB, patch)
2010-08-04 17:55 UTC, Robin Watts
Details | Diff
Patch (2.72 KB, patch)
2020-07-10 21:31 UTC, Peter Cherepanov
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description keinbiervorvier 2010-07-27 18:13:17 UTC
this applies to current HEAD on linux

gs takes an exceedingly long time to display the attached pdf figure. acroread, xpdf, okular all display it swiftly.

gdb shows the problem is somewhere in gx_default_fill_path()


Cheers
T.
Comment 1 keinbiervorvier 2010-07-27 18:15:01 UTC
Created attachment 6561 [details]
pdf figure, which takes a very long time to display
Comment 2 Marcos H. Woehrmann 2010-07-30 00:38:26 UTC
I've verified that it takes ~3 minutes to convert the attached PDF file to a ppmraw file at 72 dpi using ghostscript head (r11553).  This is a regression of sorts:

r8425: 45 seconds
r8426: Error: /unknownerror in --run--
.
.
.
r11123: Error: /unknownerror in --run--
r11124: 172 seconds
Comment 3 Robin Watts 2010-08-03 13:56:48 UTC
Created attachment 6599 [details]
NV400.pdf

Cutdown version of the file that still shows the problem.
Comment 4 Robin Watts 2010-08-03 13:59:53 UTC
The cutdown file now consists of a rectangle filled with a pattern (P3). This pattern is in turned filled with a pattern (P7). P7 is in turn a single pixel stroked line.
Comment 5 Robin Watts 2010-08-03 19:14:08 UTC
Very Sleepy indicates that the time here is all being spent tiling patterns.

gx_dc_pure_masked_fill_rect calls tile_fill_init, and tile_by_steps.
These two functions call floor a lot (60% of runtime). The rest of the time is spent in tile_by_steps.

(Take these times with a pinch of salt, as this was a debug build, and gs_debug_c accounts for a not insignificant amount of the profile).
Comment 6 Marcos H. Woehrmann 2010-08-03 19:31:21 UTC
Created attachment 6604 [details]
times.png

Using the original file I searched through the revs to find where the processing times increased.  Note that these versions of Ghostscript all generate an error and no output, so it's not clear if the file is being processed correctly.

8490    45.270
8491    43.580
8492    43.910
8493    43.930
8494    32.530
8495    32.250
8496    44.850
8497    47.030
8498    46.900
8499    32.580
8500    63.400
8501    64.080
8502    63.340
8504    63.600
8505    63.790
8506    64.230
8507    64.550
8508    63.440
8509    64.880
8510    62.200
.
.
.
8680    60.860
8681    64.860
8682    65.150
8686    45.850
8689    65.470
8690    61.270
8691    45.640
8692    46.120
8693    62.740
8694    182.390
8695    187.780
8696    189.860
8697    182.700
8698    182.000
8699    189.410
8700    173.840
8704    188.740
8707    188.570
8708    188.850
8710    190.430

See the attached times.png for an overview.
Comment 7 Ken Sharp 2010-08-04 07:11:57 UTC
(In reply to comment #6)

> Using the original file I searched through the revs to find where the
> processing times increased.  Note that these versions of Ghostscript all
> generate an error and no output, so it's not clear if the file is being
> processed correctly.

[snip]

> 8691    45.640
> 8692    46.120
> 8693    62.740
> 8694    182.390
> 8695    187.780

[snip]

revision 8694 changes a simple cast (truncation) of a calculated value to using floor instead, the cast was truncating -0.13 to 0 instead of -1. This ties in pretty closely with Robin's investigation showing that most of the time is spent in floor.

Given what the code is doing, a simple addition/subtraction of 1 (depending on the sign) and then cast to an integer might be sufficient to solve the problem addressed by introducing floor, without too much of a processing penalty. I'll try this in a minute and see if it makes a difference.
Comment 8 Ken Sharp 2010-08-04 07:59:37 UTC
(In reply to comment #7)
> (In reply to comment #6)

[snip]

>I'll
> try this in a minute and see if it makes a difference.

Well, it actually made it slower :-( Possibly I broke it by accident.

Still, the code in tile_by_steps has a comment :

 * This implementation could be sped up considerably!

So perhaps Robin could perform some optimisation magic on it.
Comment 9 Ken Sharp 2010-08-04 08:21:29 UTC
(In reply to comment #8)

> >I'll
> > try this in a minute and see if it makes a difference.
> 
> Well, it actually made it slower :-( Possibly I broke it by accident.

Yes, I broke it. A simpler fix improved the performance from 5'27 to 1'49 on my PC by removing the two floor operations added in rev 8694 and replacing with a test for x < 0 and y < 0 and decrementing them by 1 if so. This is not a replacement for revision 8694, but it serves to show where at least some of the problem lies.

I'm sure there is more scope for improving the performance in this loop (precalculating the increment in x and y instead of recalculating on each loop iteration, for starters).

Its still very slow....
Comment 10 Ray Johnston 2010-08-04 16:29:30 UTC
One thing that occurred to me is that we could 'expand' the tile, as is done
for halftone cells, for tiles that are really small. The 'super tile' would
have the smaller tile repeated at the step intervals and should result in
dividing the workload during painting the large area with the tile. I'm not
sure what a reasonable expansion size would be -- the halftone logic only
expands the halftone cells up to be at least 32 bits wide.
Comment 11 Robin Watts 2010-08-04 17:55:06 UTC
Created attachment 6606 [details]
diff.txt

Proposed change to fix the problem.

Basically, this reverts to using (int)x rather than (int)(floor(x)), so removes the overhead of floor. We then check for x being negative, and only if it is, do we do the (relatively expensive) test to correct rounding.

Also, some simple optimisations to reduce the floating point calculations done in loop.
Comment 12 Robin Watts 2010-08-04 23:20:06 UTC
Fix committed as revision 11599. Basically, we use a macro 'fastfloor' instead of calling floor. Local testing reveals that thisspeeds it up by a factor of 3 in debug builds (more in release builds as a large amount of time is spent if in the if_debugc calls in the loop).

This basically gets us back to where we were before we regressed.

There is still scope for improving the handling of this file further however, so I am downgrading this bug to an enhancement rather than closing it.
Comment 13 Peter Cherepanov 2020-07-10 21:31:43 UTC
Created attachment 19434 [details]
Patch

Speed up rendering of patterns by using more efficient but mathematically
equivalent calculations.

The time of rendering of the sample file on my box decreased from 37.5 sec to 25.0 sec. I cannot find further improvements without major changes in the program structure or reimplementation with SSE.

I believe that the bug can be closed now because further improvement is difficult and this kind of files is not common in the wild.