Bug 691612 - Regression: Performance regression with 9.00rc3 vs 8.70
Summary: Regression: Performance regression with 9.00rc3 vs 8.70
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Regression (show other bugs)
Version: 0.00
Hardware: PC Linux
: P2 normal
Assignee: Robin Watts
URL:
Keywords:
: 691613 (view as bug list)
Depends on:
Blocks:
 
Reported: 2010-09-15 16:22 UTC by Henk Jan Priester
Modified: 2010-09-23 16:20 UTC (History)
0 users

See Also:
Customer:
Word Size: ---


Attachments
pdf used for this test (deleted)
2010-09-15 16:22 UTC, Henk Jan Priester
Details
screen dump of the callgrind output (251.24 KB, image/png)
2010-09-22 07:28 UTC, Henk Jan Priester
Details
patch.txt (3.31 KB, patch)
2010-09-22 19:19 UTC, Robin Watts
Details | Diff
patch2.txt (984 bytes, patch)
2010-09-22 19:21 UTC, Robin Watts
Details | Diff
patch3.txt (2.85 KB, patch)
2010-09-22 19:24 UTC, Robin Watts
Details | Diff
combined_patch.txt (7.09 KB, patch)
2010-09-23 15:10 UTC, Robin Watts
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Henk Jan Priester 2010-09-15 16:22:06 UTC
When rasterizing a large PDF with transparancy using the following commandline:

time $GS_DIR/bin/gs -dNOPAUSE -dBATCH -dBufferSpace=50000000 -r600 -sDEVICE=png16m -sOutputFile=strike.png strikeline1.pdf

GhostScript 9.00rc3 is very slow compared to 8.70. (113 vs 8.5 minutes)

GPL Ghostscript 8.70 (2009-07-31)
Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

real    8m31.000s
user    8m10.277s
sys     0m19.870s

GPL Ghostscript RELEASE CANDIDATE 9.00 (2010-09-14)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

real    113m6.393s
user    112m25.194s
sys     0m25.733s
Comment 1 Henk Jan Priester 2010-09-15 16:45:51 UTC
adding the attachment failed with a server error
proably the file was too big. (78015376 bytes)
Comment 2 Henk Jan Priester 2010-09-15 16:46:52 UTC
*** Bug 691613 has been marked as a duplicate of this bug. ***
Comment 3 Marcos H. Woehrmann 2010-09-16 03:21:29 UTC
Please use a file transfer service such as:

https://www.transferbigfiles.com/

or

https://www.yousendit.com/

to send the large file to support@artifex.com (alternatively you can put the file on your web server and email us the URL).
Comment 4 Henk Jan Priester 2010-09-16 06:42:12 UTC
I have uploaded the file using: https://www.transferbigfiles.com/
the email for the recipient is set to support@artifex.com
Comment 5 Marcos H. Woehrmann 2010-09-16 15:04:10 UTC
I'm still verifying, but it looks like this was caused by:

r10224 | robin | 2009-10-25 05:39:53 -0700 (Sun, 25 Oct 2009) | 4 lines

Fix for bug 687295; force transparent stroking to construct the entire path
before stroking it, thus eliminating nasty effects where edges overlap.
Comment 6 Marcos H. Woehrmann 2010-09-16 15:05:23 UTC
I've temporarily put the test file on casper in /home/support/691612 (spectre.ghostscript.com appears to be down).
Comment 7 Marcos H. Woehrmann 2010-09-16 22:57:29 UTC
I've verified that the performance changed with r10224:

10223  3:29.23
10224 35:22.16
Comment 8 Henk Jan Priester 2010-09-17 06:51:04 UTC
thanks for looking into this. The result of the profiling I did also yesterday shows that 84% of the time is spend in gx_default_fill_path
(see the attached png)
Comment 9 Robin Watts 2010-09-21 13:45:27 UTC
(In reply to comment #8)
> thanks for looking into this. The result of the profiling I did also yesterday
> shows that 84% of the time is spend in gx_default_fill_path
> (see the attached png)

I see no attached png - could you reattach please?

The change that caused this slowdown was done as a fix for bug#687295. It was done as a fix for lines stroked with transparency. The old code used to stroke each individual section of the line, and then fill it. The new code delayed the fill until the line had been entirely stroked.

Without this change pixels that were covered by multiple sections of the lines (such as where one part of the line intersects with the next) were being altered multiple times, resulting in the 'joins' within stroked lines looking overly dark.

This is in fact done for all non idempotent plotting operations, not just transparency. I am trying to understand the exact reason why it is triggering in this case now.
Comment 10 Henk Jan Priester 2010-09-22 07:28:13 UTC
Created attachment 6742 [details]
screen dump of the callgrind output
Comment 11 Robin Watts 2010-09-22 15:41:53 UTC
Thanks. It seems that the PDF interpreter scans the file, and spots that transparency is used on that page. It therefore inserts a pdf14 device (the internal mechanism gs uses to handle transparency).

The presence of this device is enough to force the strokes to take the slow path through the code, even though they themselves are not transparent.

We are looking into solving this. The current plan is to:

 * firstly, make clist band playback call the vectored stroke_path func rather than calling direct into the default implementation. This will ensure that pdf14_stroke_path is called.

 * pdf14_stroke_path will check to see if the alpha is set to full (and any other blend mode things that need to be checked). If it detects that the stroke path can be done without needing to take the slow path through the code it will rewrite the lop to pretend that the pdf14 device isn't there.

 * In investigating this, I think I've found a problem with the lop_is_idempotent test. Testing now.
Comment 12 Robin Watts 2010-09-22 19:19:14 UTC
Created attachment 6749 [details]
patch.txt

Fix for idempotency test. Confirmed with lpd. Causes 78 differences in regression testing. Will be committed as soon as I can get a bmpcmp test to complete to examine them.
Comment 13 Ray Johnston 2010-09-22 19:20:08 UTC
Also bug 691607 will reduce the performance impact of the 'page level'
transparency when pages have lots of painting that are unaffected by an
effective transparancy action.
Comment 14 Robin Watts 2010-09-22 19:21:48 UTC
Created attachment 6750 [details]
patch2.txt

Patch to clist band playback code to call vectored stroke code, rather than direct call to default code. Need to check this with Ray.
Comment 15 Robin Watts 2010-09-22 19:24:36 UTC
Created attachment 6751 [details]
patch3.txt

Change PDF14 device to only set the lop flag indicating that pdf14 is in use if a non opaque normal blend is being done. Check with Michael/Ray.
Comment 16 Robin Watts 2010-09-23 15:09:26 UTC
All 3 parts are now committed as revisions 11736, 11737, 11738 respectively. I'll do some timings to verify this really solves the problem and then close the bug.

In my local tests using the same command line as below (but at the default 96dpi) on windows:

 Pre-8.71 took 44 seconds.
 8.71 took 3 mins 37s.
 The latest code took 45 seconds.

It should be easy enough to apply these revisions to your version of gs. I'll attach a combined patch for your convenience.
Comment 17 Robin Watts 2010-09-23 15:10:50 UTC
Created attachment 6753 [details]
combined_patch.txt

Combined patch to solve the bug. Should apply to 8.71 or 9.00 I hope.
Comment 18 Henk Jan Priester 2010-09-23 16:20:15 UTC
patch -p1 < ../combined_patch.txt
patching file base/gdevp14.c
Hunk #1 succeeded at 1648 (offset -87 lines).
Hunk #3 succeeded at 1744 (offset -87 lines).
Hunk #4 succeeded at 6043 (offset -18 lines).
Hunk #5 succeeded at 6024 (offset -87 lines).
patching file base/gxclrast.c
Hunk #1 FAILED at 1804.
1 out of 1 hunk FAILED -- saving rejects to file base/gxclrast.c.rej
patching file base/gsropt.h

I modified base/gxclrast.c manualy.

Compiling/Linking went oke. Althought there are 2 warning about gdevp14.c

gcc  -DHAVE_MKSTEMP -DHAVE_HYPOT -DHAVE_FILE64 -DHAVE_MKSTEMP64 -DHAVE_FONTCONFI
G  -DHAVE_SETLOCALE -O2 -Wall -Wstrict-prototypes -Wundef -Wmissing-declarations
 -Wmissing-prototypes -Wwrite-strings -Wno-strict-aliasing -Wdeclaration-after-s
tatement -fno-builtin -fno-common -DHAVE_STDINT_H -DGX_COLOR_INDEX_TYPE="unsigne
d long long"   -I./obj -I./base  -o ./obj/gdevp14.o -c ./base/gdevp14.c
./base/gdevp14.c: In function `pdf14_tile_pattern_fill':
./base/gdevp14.c:1963: warning: passing arg 1 of `pdf14_pop_transparency_group' discards qualifiers from pointer target type ./base/gdevp14.c: In function `pdf14_patt_trans_image_fill': ./base/gdevp14.c:2067: warning: passing arg 4 of `compute_group_device_int_rect' discards qualifiers from pointer target type

On the computer I used for reporting this problem the
time was now: 11m36 vs the 8m30 from 8.72 which is in line which other
files I played with. 9.00 is up to 25% slower then 8.70.
It is much better then the 113 minutes I first got, thanks for solving
the problem.