Bug 690531 - Excessive slowdown with -dGraphicsAlphaBits=2
Summary: Excessive slowdown with -dGraphicsAlphaBits=2
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Graphics Library (show other bugs)
Version: 0.00
Hardware: PC Linux
: P2 normal
Assignee: Robin Watts
URL:
Keywords:
: 688363 688607 (view as bug list)
Depends on:
Blocks:
 
Reported: 2009-06-11 10:16 UTC by Marcos H. Woehrmann
Modified: 2011-09-18 21:48 UTC (History)
2 users (show)

See Also:
Customer: 581
Word Size: ---


Attachments
bug_690531_simple.pdf (812.18 KB, application/pdf)
2009-09-10 10:38 UTC, Michael Vrhel
Details
slowpath.ps (14.74 KB, application/postscript)
2009-10-28 08:19 UTC, Robin Watts
Details
slowpath2.ps (723.71 KB, application/postscript)
2009-10-28 08:21 UTC, Robin Watts
Details
bug_690531_simple2.pdf (812.18 KB, application/pdf)
2009-10-28 10:45 UTC, Robin Watts
Details
bug_690531_simple3.pdf (812.18 KB, application/pdf)
2009-10-28 10:46 UTC, Robin Watts
Details
slowpath_dump.ps (15.13 KB, application/postscript)
2009-11-05 09:25 UTC, Henry Stiles
Details
patch (28.68 KB, patch)
2009-11-17 07:40 UTC, Robin Watts
Details | Diff
patch2 (37.39 KB, patch)
2009-11-18 09:23 UTC, Robin Watts
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Marcos H. Woehrmann 2009-06-11 10:16:09 UTC
The customer reports and I've verified that the attached file renders >60x
slower when adding -dGraphicsAlphaBits=2 to the command line.  This is with head
(r9788), but gs8.64 behaves similarly.

The command lines I'm using:

  bin/gs -sDEVICE=tiff24nc -r300 -o test.tif slow.pdf

and

  bin/gs -sDEVICE=tiff24nc -r300 -dGraphicsAlphaBits=2 -o test.tif slow.pdf

I've tried adding -dMaxBitmap and -K options to the command line and they do not
appear to affect the speed.
Comment 1 Marcos H. Woehrmann 2009-06-11 10:17:00 UTC
Created attachment 5088 [details]
slow.pdf
Comment 2 Ray Johnston 2009-06-18 11:19:30 UTC
*** Bug 688607 has been marked as a duplicate of this bug. ***
Comment 3 Henry Stiles 2009-06-23 13:24:58 UTC
without alpha:

	26.3%	26.3%	gs	clip_enumerate_rest	
	7.9%	7.9%	gs	gs_interpret	
	5.7%	5.7%	gs	gx_default_fill_trapezoid	
	5.3%	5.3%	gs	mem_true24_fill_rectangle	
	3.7%	3.7%	gs	gx_general_fill_path	
	2.8%	2.8%	gs	spot_into_trapezoids__aj_fd	
	2.5%	2.5%	gs	dstack_find_name_by_index	
	1.8%	1.8%	gs	intersect_al	
	1.5%	1.5%	gs	fixed_mult_quo	
	1.5%	1.5%	gs	clip_call_fill_rectangle	
	1.4%	1.4%	gs	gx_flattened_iterator__init_line	
	1.4%	1.4%	gs	jpeg_idct_islow	
	1.4%	1.4%	gs	gx_flattened_iterator__next	
	1.4%	1.4%	gs	clip_fill_rectangle	
	1.2%	1.2%	gs	inflate_fast	
	1.1%	1.1%	gs	i_free_object	
	1.1%	1.1%	gs	gx_device_init	
	1.1%	1.1%	gs	scan_token	
	1.0%	1.0%	gs	step_al	
	1.0%	1.0%	gs	ycck_cmyk_convert	
	1.0%	1.0%	gs	gx_stroke_path_only	
	0.9%	0.9%	gs	gx_path_add_lines_notes	
	0.8%	0.8%	gs	array_get	
	0.7%	0.7%	gs	image_render_color	
	0.6%	0.6%	mach_kernel	ml_set_interrupts_enabled	
	0.6%	0.6%	gs	i_stable	
	0.6%	0.6%	gs	line_join_points	
	0.5%	0.5%	mach_kernel	lo_alltraps	
	0.5%	0.5%	gs	cmap_cmyk_direct	
	0.5%	0.5%	gs	i_alloc_struct	


with alpha

	63.9%	63.9%	gs	intersect_al	
	26.3%	26.3%	gs	spot_into_trapezoids__aj_fd	
	4.8%	4.8%	gs	x_order	
	0.8%	0.8%	gs	fixed_mult_quo	
	0.6%	0.6%	gs	mem_true24_copy_alpha	
	0.5%	0.5%	gs	insert_x_new	
	0.5%	0.5%	gs	bits_compress_scaled	
	0.2%	0.2%	gs	gs_interpret	
	0.1%	0.1%	gs	dstack_find_name_by_index	
	0.1%	0.1%	gs	fill_slant_adjust	
	0.1%	0.1%	gs	clip_fill_rectangle	
	0.1%	0.1%	gs	jpeg_idct_islow	
	0.1%	0.1%	gs	gx_path_merge_contacting_contours	
	0.1%	0.1%	mach_kernel	ml_set_interrupts_enabled	
	0.1%	0.1%	gs	dyld_stub___divdi3	
	0.1%	0.1%	gs	mem_mono_fill_rectangle	
	0.1%	0.1%	gs	inflate_fast	
	0.1%	0.1%	gs	image_render_color	
	0.1%	0.1%	mach_kernel	blkclr	
	0.0%	0.0%	gs	ycck_cmyk_convert	
	0.0%	0.0%	gs	names_ref	
	0.0%	0.0%	gs	clip_enumerate_rest	
	0.0%	0.0%	gs	scan_token	
	0.0%	0.0%	gs	insert_y_line	
	0.0%	0.0%	gs	y_transfer_next	
	0.0%	0.0%	gs	gx_dc_pure_fill_rectangle	
	0.0%	0.0%	gs	mem_abuf_fill_rectangle	
	0.0%	0.0%	gs	gx_general_fill_path	
	0.0%	0.0%	mach_kernel	lo_alltraps	
	0.0%	0.0%	gs	bits_fill_rectangle	
	0.0%	0.0%	gs	t1_hinter__endglyph	



Comment 4 Masaki Ushizaka 2009-08-13 23:07:27 UTC
Tried with r9980 on Mac OS X using command line in original report.
Without -dGraphicsAlphaBits=2 it finished in 8.5sec.
With -dGraphicsAlphaBits=2, it didn't finish after 3 hours 15 minutes and I gave up.
Comment 5 Michael Vrhel 2009-09-10 10:38:23 UTC
Created attachment 5367 [details]
bug_690531_simple.pdf

This is simplified version of the file.  Here we are only drawing the number 5
from the phone number that was at the lower right.  Looking at how AR draws the
5 at high resolution, and by looking at the contents of the simple file you can
see that there are a lot of overlapping path fills that occur.
Comment 6 Michael Vrhel 2009-09-14 10:35:06 UTC
Reassign to Henry for review.
Comment 7 Henry Stiles 2009-09-18 00:09:21 UTC
Well the overlapping fills are not really a problem.  The basic issue is the
business of taking an already complex path and converting it to a fill which
must be done as a complete unit.  The path in slow.pdf which is very slow (the
first couple are not slow) is a stroke:

(gdb) p gx_path_print(pgs->path)
   state_flags=1 subpaths=20, curves=190, point=(3727.320312,6237.218750)
   box=(3570.265625,5941.726562),(5807.179688,6303.710938) last=0x17f2630
   segments=0x17b367c (refct=1, first=0x17ef054, current=0x17ef84c)
   0x17ef054<0x0,0x17f0290>:0: 5735.8828 6037.1250 moveto	% #curves=8 last=0x17ef098
   0x17f0290<0x17ef054,0x17f02c0>:0: 5735.8828 6018.6406 5735.1484 6007.0859
5733.6797 6002.4297 curveto
   0x17f02c0<0x17f0290,0x17f02f0>:0: 5732.2031 5997.7734 5728.7422 5995.4531
5723.3203 5995.4531 curveto
   0x17f02f0<0x17f02c0,0x17f0320>:0: 5717.9922 5995.4531 5714.4766 5997.9219
5712.7656 6002.8594 curveto
   0x17f0320<0x17f02f0,0x17b369c>:0: 5711.0547 6007.7969 5710.2031 6019.2188
5710.2031 6037.1250 curveto

...

This is quite reasonable 20 subpaths, none very long.  But then we convert it to
a fill after flattening:

	    /*
	     * The alpha-buffer device requires that we fill the
	     * entire path as a single unit.
	     */
	    gx_path_init_local(&spath, pgs->memory);
	    code = gx_stroke_add(pgs->path, &spath, pgs);
	    gs_setlinewidth(pgs, orig_width);
	    scale_dash_pattern(pgs, 1.0 / scale);
	    if (code >= 0)
		code = gx_fill_path(&spath, pgs->dev_color, pgs,
				    gx_rule_winding_number,
				    pgs->fill_adjust.x,
				    pgs->fill_adjust.y);


Our fill derived from the path is now a whopping 5120 subpaths.  As implied by
the comment in the code alpha code requires this is to be filled as a single
unit which is beyond reach for the fill code.  I imagine fixing this will take a
quite a bit of work, it would be possible to fallback (no AA) if a path is too
complex.

The problem should be reproducible by drawing the path in postscript, using
strokepath then fill.
Comment 8 Henry Stiles 2009-09-24 13:29:56 UTC
*** Bug 688363 has been marked as a duplicate of this bug. ***
Comment 9 Robin Watts 2009-10-28 08:19:24 UTC
Created attachment 5562 [details]
slowpath.ps

This is a postscript file showing the first slow path lifted from
bug_690531_simple.pdf. This is stroked to give slowpath2.ps.
Comment 10 Robin Watts 2009-10-28 08:21:05 UTC
Created attachment 5563 [details]
slowpath2.ps

This is the path that is the result of stroking slowpath.ps. Render this in gs
and it's clear why filling it is so painful!
Comment 11 Robin Watts 2009-10-28 10:45:27 UTC
Created attachment 5564 [details]
bug_690531_simple2.pdf

Further simplified version; this contains just one (very complex) path, clipped
through another.
Comment 12 Robin Watts 2009-10-28 10:46:54 UTC
Created attachment 5565 [details]
bug_690531_simple3.pdf

Even simpler version that just shows the one (complex) path.

When rendered in gs (with -dGraphicsAlphaBits=2) we can see the AA. When
rendered in acrobat, no AA seen.
Comment 13 Robin Watts 2009-10-28 10:59:05 UTC
oops. You *can* see the AA in acrobat, if you remember to enable it. Sorry.
Comment 14 Henry Stiles 2009-11-05 09:25:37 UTC
Created attachment 5623 [details]
slowpath_dump.ps

The file slowpath.ps, strokepath and a procedure to dump the constructed path
to the adobe console.
Comment 15 Robin Watts 2009-11-17 07:40:14 UTC
Created attachment 5680 [details]
patch

This patch amends ghostscripts stroking code to add a new mode to it.

When asked to stroke a path and return it (rather than filling it as we go)
without being in CPSI_mode, this patch causes the stroking code to construct 2
paths; one around the start cap, along the bottom of the curve and around the
end cap, the other along the top of the curve. These curves are then combined
together to give a single enclosing path.

This path generally onlines the same area as the original path did, but with
many fewer cross stroke line segments, resulting in the fill completing much
faster.

This is at best a work in progress, as a) it doesn't do mitreing currently, and
b) it gets the under sides of joins wrong in some cases (see bug 688655).

It is a proof of concept though, and does show a significant speedup.
Comment 16 Robin Watts 2009-11-18 09:23:31 UTC
Created attachment 5683 [details]
patch2

Improved Patch. This completes the alternative stroking code.

It has 2 flaws that make it less than perfect.

1) The 'under join' has been tweaked to only be an approximation in the case
where we join under curves. Without this we still get too many 'cross segment'
lines and it renders incredibly slowly. This still leaves us better than muPDF
(which makes this approximation in all cases).

2) One of the changes made as part of this patch (the flag being passed into
the stroke_line_proc's saying whether we are joining linesegments that came
from a curve) will allow us to fix some of the flatness issues in bug 688655.
Doing so makes the resulting stroked path far too slow again (as it involves
lots of curves). This has therefore been omitted.

Also, there is scope for polishing the code a bit more - perhaps abandoning
adding arrays of points and doing it directly.
Comment 17 Robin Watts 2009-11-19 04:43:05 UTC
Retesting with the original file, it now takes about 1 minute to render at 
300dpi. Slower than we'd like, but much more acceptable.
Comment 18 Robin Watts 2009-11-19 05:56:57 UTC
Slightly revised version of patch2 committed as revision 10351. Closing bug.
Comment 19 Marcos H. Woehrmann 2011-09-18 21:48:00 UTC
Changing customer bugs that have been resolved more than a year ago to closed.