The customer reports and I've verified that the attached file renders >60x slower when adding -dGraphicsAlphaBits=2 to the command line. This is with head (r9788), but gs8.64 behaves similarly. The command lines I'm using: bin/gs -sDEVICE=tiff24nc -r300 -o test.tif slow.pdf and bin/gs -sDEVICE=tiff24nc -r300 -dGraphicsAlphaBits=2 -o test.tif slow.pdf I've tried adding -dMaxBitmap and -K options to the command line and they do not appear to affect the speed.
Created attachment 5088 [details] slow.pdf
*** Bug 688607 has been marked as a duplicate of this bug. ***
without alpha: 26.3% 26.3% gs clip_enumerate_rest 7.9% 7.9% gs gs_interpret 5.7% 5.7% gs gx_default_fill_trapezoid 5.3% 5.3% gs mem_true24_fill_rectangle 3.7% 3.7% gs gx_general_fill_path 2.8% 2.8% gs spot_into_trapezoids__aj_fd 2.5% 2.5% gs dstack_find_name_by_index 1.8% 1.8% gs intersect_al 1.5% 1.5% gs fixed_mult_quo 1.5% 1.5% gs clip_call_fill_rectangle 1.4% 1.4% gs gx_flattened_iterator__init_line 1.4% 1.4% gs jpeg_idct_islow 1.4% 1.4% gs gx_flattened_iterator__next 1.4% 1.4% gs clip_fill_rectangle 1.2% 1.2% gs inflate_fast 1.1% 1.1% gs i_free_object 1.1% 1.1% gs gx_device_init 1.1% 1.1% gs scan_token 1.0% 1.0% gs step_al 1.0% 1.0% gs ycck_cmyk_convert 1.0% 1.0% gs gx_stroke_path_only 0.9% 0.9% gs gx_path_add_lines_notes 0.8% 0.8% gs array_get 0.7% 0.7% gs image_render_color 0.6% 0.6% mach_kernel ml_set_interrupts_enabled 0.6% 0.6% gs i_stable 0.6% 0.6% gs line_join_points 0.5% 0.5% mach_kernel lo_alltraps 0.5% 0.5% gs cmap_cmyk_direct 0.5% 0.5% gs i_alloc_struct with alpha 63.9% 63.9% gs intersect_al 26.3% 26.3% gs spot_into_trapezoids__aj_fd 4.8% 4.8% gs x_order 0.8% 0.8% gs fixed_mult_quo 0.6% 0.6% gs mem_true24_copy_alpha 0.5% 0.5% gs insert_x_new 0.5% 0.5% gs bits_compress_scaled 0.2% 0.2% gs gs_interpret 0.1% 0.1% gs dstack_find_name_by_index 0.1% 0.1% gs fill_slant_adjust 0.1% 0.1% gs clip_fill_rectangle 0.1% 0.1% gs jpeg_idct_islow 0.1% 0.1% gs gx_path_merge_contacting_contours 0.1% 0.1% mach_kernel ml_set_interrupts_enabled 0.1% 0.1% gs dyld_stub___divdi3 0.1% 0.1% gs mem_mono_fill_rectangle 0.1% 0.1% gs inflate_fast 0.1% 0.1% gs image_render_color 0.1% 0.1% mach_kernel blkclr 0.0% 0.0% gs ycck_cmyk_convert 0.0% 0.0% gs names_ref 0.0% 0.0% gs clip_enumerate_rest 0.0% 0.0% gs scan_token 0.0% 0.0% gs insert_y_line 0.0% 0.0% gs y_transfer_next 0.0% 0.0% gs gx_dc_pure_fill_rectangle 0.0% 0.0% gs mem_abuf_fill_rectangle 0.0% 0.0% gs gx_general_fill_path 0.0% 0.0% mach_kernel lo_alltraps 0.0% 0.0% gs bits_fill_rectangle 0.0% 0.0% gs t1_hinter__endglyph
Tried with r9980 on Mac OS X using command line in original report. Without -dGraphicsAlphaBits=2 it finished in 8.5sec. With -dGraphicsAlphaBits=2, it didn't finish after 3 hours 15 minutes and I gave up.
Created attachment 5367 [details] bug_690531_simple.pdf This is simplified version of the file. Here we are only drawing the number 5 from the phone number that was at the lower right. Looking at how AR draws the 5 at high resolution, and by looking at the contents of the simple file you can see that there are a lot of overlapping path fills that occur.
Reassign to Henry for review.
Well the overlapping fills are not really a problem. The basic issue is the business of taking an already complex path and converting it to a fill which must be done as a complete unit. The path in slow.pdf which is very slow (the first couple are not slow) is a stroke: (gdb) p gx_path_print(pgs->path) state_flags=1 subpaths=20, curves=190, point=(3727.320312,6237.218750) box=(3570.265625,5941.726562),(5807.179688,6303.710938) last=0x17f2630 segments=0x17b367c (refct=1, first=0x17ef054, current=0x17ef84c) 0x17ef054<0x0,0x17f0290>:0: 5735.8828 6037.1250 moveto % #curves=8 last=0x17ef098 0x17f0290<0x17ef054,0x17f02c0>:0: 5735.8828 6018.6406 5735.1484 6007.0859 5733.6797 6002.4297 curveto 0x17f02c0<0x17f0290,0x17f02f0>:0: 5732.2031 5997.7734 5728.7422 5995.4531 5723.3203 5995.4531 curveto 0x17f02f0<0x17f02c0,0x17f0320>:0: 5717.9922 5995.4531 5714.4766 5997.9219 5712.7656 6002.8594 curveto 0x17f0320<0x17f02f0,0x17b369c>:0: 5711.0547 6007.7969 5710.2031 6019.2188 5710.2031 6037.1250 curveto ... This is quite reasonable 20 subpaths, none very long. But then we convert it to a fill after flattening: /* * The alpha-buffer device requires that we fill the * entire path as a single unit. */ gx_path_init_local(&spath, pgs->memory); code = gx_stroke_add(pgs->path, &spath, pgs); gs_setlinewidth(pgs, orig_width); scale_dash_pattern(pgs, 1.0 / scale); if (code >= 0) code = gx_fill_path(&spath, pgs->dev_color, pgs, gx_rule_winding_number, pgs->fill_adjust.x, pgs->fill_adjust.y); Our fill derived from the path is now a whopping 5120 subpaths. As implied by the comment in the code alpha code requires this is to be filled as a single unit which is beyond reach for the fill code. I imagine fixing this will take a quite a bit of work, it would be possible to fallback (no AA) if a path is too complex. The problem should be reproducible by drawing the path in postscript, using strokepath then fill.
*** Bug 688363 has been marked as a duplicate of this bug. ***
Created attachment 5562 [details] slowpath.ps This is a postscript file showing the first slow path lifted from bug_690531_simple.pdf. This is stroked to give slowpath2.ps.
Created attachment 5563 [details] slowpath2.ps This is the path that is the result of stroking slowpath.ps. Render this in gs and it's clear why filling it is so painful!
Created attachment 5564 [details] bug_690531_simple2.pdf Further simplified version; this contains just one (very complex) path, clipped through another.
Created attachment 5565 [details] bug_690531_simple3.pdf Even simpler version that just shows the one (complex) path. When rendered in gs (with -dGraphicsAlphaBits=2) we can see the AA. When rendered in acrobat, no AA seen.
oops. You *can* see the AA in acrobat, if you remember to enable it. Sorry.
Created attachment 5623 [details] slowpath_dump.ps The file slowpath.ps, strokepath and a procedure to dump the constructed path to the adobe console.
Created attachment 5680 [details] patch This patch amends ghostscripts stroking code to add a new mode to it. When asked to stroke a path and return it (rather than filling it as we go) without being in CPSI_mode, this patch causes the stroking code to construct 2 paths; one around the start cap, along the bottom of the curve and around the end cap, the other along the top of the curve. These curves are then combined together to give a single enclosing path. This path generally onlines the same area as the original path did, but with many fewer cross stroke line segments, resulting in the fill completing much faster. This is at best a work in progress, as a) it doesn't do mitreing currently, and b) it gets the under sides of joins wrong in some cases (see bug 688655). It is a proof of concept though, and does show a significant speedup.
Created attachment 5683 [details] patch2 Improved Patch. This completes the alternative stroking code. It has 2 flaws that make it less than perfect. 1) The 'under join' has been tweaked to only be an approximation in the case where we join under curves. Without this we still get too many 'cross segment' lines and it renders incredibly slowly. This still leaves us better than muPDF (which makes this approximation in all cases). 2) One of the changes made as part of this patch (the flag being passed into the stroke_line_proc's saying whether we are joining linesegments that came from a curve) will allow us to fix some of the flatness issues in bug 688655. Doing so makes the resulting stroked path far too slow again (as it involves lots of curves). This has therefore been omitted. Also, there is scope for polishing the code a bit more - perhaps abandoning adding arrays of points and doing it directly.
Retesting with the original file, it now takes about 1 minute to render at 300dpi. Slower than we'd like, but much more acceptable.
Slightly revised version of patch2 committed as revision 10351. Closing bug.
Changing customer bugs that have been resolved more than a year ago to closed.