Summary: | Performance problem during conversion PDF->PS with a particular PDF | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Francois Donnay <donnayf> |
Component: | Color | Assignee: | Michael Vrhel <michael.vrhel> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | christinedelight.top85, shailesh.mistry |
Priority: | P4 | ||
Version: | 8.54 | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Customer: | Word Size: | --- | |
Attachments: |
Test File (test.pdf)
profile.png |
Description
Francois Donnay
2006-11-17 08:56:41 UTC
Created attachment 2619 [details]
Test File (test.pdf)
The attached PDF file takes a long time to process with gs854 and gshead on my Linux amd64 system (11 minutes to convert all 10 pages to a 72DPI PPM). Loading it using Apple Preview is much faster (a few seconds per page on my much slower PowerBook G4). Seems to be drawing two blue rectangles by drawing the same small element hundreds of times with a different cm. Running it with -dPDFDEBUG shows many, many content elements like: q /CS3 cs %Resolving: [369 0] 0.69 0.773 0.863 sc 0 0.96 0.96 0 62.94 219.5905 cm /Im1633 Do %Resolving: [1996 0] Q So the file is doing something expensive, but it's not clear exactly why we're so much slower than Quartz. Probably the PDF interpreter is taking a long time doing the repeated object resolutions, or perhaps the coupling to the graphics library is slow. Perhaps other implementations have an object cache? Further analysis: object 369 is a CalRGB colorspace. Page three doesn't set the colorspace for each instantiation of the element and is considerably faster, so setting and then tearing down the colorspace is a significant component of the slowdown. The image being drawn is a 4x4 1-bit CCITTFaxDecode image. From profiling the hotspot is in gs_interpret()... assigning to Alex. #5 is wrong and #4 is a better explanation. The pdf interpreter seems to be repeatedly setting the same color space. Each setting requires expensive rebuilding of the joint cie caches. Alex, can the pdf interpreter detect it is installing the same color space repeatedly? The 'q' ... 'Q' performs a gsave / grestore around the setting of the CalRGB colorspace, so detecting this case is problematic. It would require setting the CalRGB space "before" the gsave so that the grestore doesn't switch back. A better solution might be to 'cache' colorspaces so that the grestore switches to the gsave'd colorspace but the CalRGB structure is kept around in the cache to allow it to be used again quickly. In any case, not a real simple thing to do. Thanks Ray, I didn't see the gsave and grestore. I guess the other rips have found a way to optimize this or, more likely, they don't have the overhead of setting up the caches, so colorspace installation is cheap. I agree with Ray's analysis and would like to add that the most time consuming part is repeated interpretation of various PostScript functions in the color space and CRD. % cumulative self self total time seconds seconds calls s/call s/call name 22.73 14.22 14.22 572 0.02 0.11 interp 10.40 20.72 6.51 214976243 0.00 0.00 for_samples_continue 7.79 25.59 4.87 419049 0.00 0.00 cie_cache_set_interpolation 6.91 29.92 4.33 71517696 0.00 0.00 ztpqr_scale_wb_common 4.95 33.01 3.10 357683943 0.00 0.00 real_param 3.33 35.10 2.09 290930640 0.00 0.00 array_get 2.61 36.73 1.64 71519226 0.00 0.00 zexp 2.25 38.14 1.41 7561466 0.00 0.00 dstack_find_name_by_index 2.24 39.54 1.40 46561 0.00 0.00 gx_cie_load_common_cache 2.20 40.92 1.38 71517696 0.00 0.00 cie_exec_tpqr 2.01 42.18 1.26 46561 0.00 0.00 gx_install_cie_abc 1.37 43.04 0.86 71760935 0.00 0.00 ref_stack_counttomark 1.37 43.89 0.86 77018145 0.00 0.00 zexec 1.33 44.72 0.83 419049 0.00 0.00 cie_cache_mult 1.26 45.51 0.79 116682 0.00 0.00 gc_trace 1.26 46.30 0.79 71619101 0.00 0.00 num_params 1.07 46.97 0.67 2129 0.00 0.00 image_init_clues 1.06 47.63 0.66 1086009 0.00 0.00 float_params 1.02 48.27 0.64 71517696 0.00 0.00 cie_post_exec_tpqr 1.01 48.90 0.63 1827244 0.00 0.00 names_ref 0.78 49.39 0.49 419059 0.00 0.00 zfor_samples 0.75 49.86 0.47 48686 0.00 0.00 gx_image_enum_begin 0.72 50.31 0.45 71772232 0.00 0.00 ref_stack_enum_begin 0.70 50.74 0.44 72204335 0.00 0.00 ref_stack_pop 0.68 51.17 0.43 419046 0.00 0.00 cache_is_linear 0.60 51.54 0.38 4021784 0.00 0.00 igc_reloc_refs 0.58 51.90 0.36 3308519 0.00 0.00 refs_set_reloc 0.54 52.24 0.34 1131740 0.00 0.00 chunk_locate_ptr 0.51 52.56 0.32 6768774 0.00 0.00 dict_find 0.42 52.82 0.26 3187995 0.00 0.00 check_l_mark 0.39 53.07 0.25 46561 0.00 0.00 cie_cache_joint 0.38 53.30 0.24 3414376 0.00 0.00 refs_clear_marks 0.34 53.52 0.22 2009029 0.00 0.00 refs_compact 0.34 53.73 0.22 71517696 0.00 0.00 abc_identity 0.33 53.94 0.21 5020992 0.00 0.00 igc_reloc_ref_ptr_nocheck 0.30 54.13 0.19 77018145 0.00 0.00 check_for_exec 0.30 54.31 0.19 924621 0.00 0.00 scan_token Is the performance difference between gs and Adobe and XPDF a result of building the joint cie caches? Building the cache results in many calls to these functions which are not necessary to print the job. Adobe and XPDF may not have a special optimization for detecting the same color space. The performance difference may be a side effect of our cache optimization. I guess in either case this is not something we will fix in the near future, but it would be nice to understand the details, since our relative performance is so poor for this job. Reassigning to Ralph. Thanks Alex and Ray for looking at this. From the discussion it looks like performance issue is in colorspace and joint cie caches. That sounds like Michael's area. Bug still reproducible in Ghostscript 9.03 Created attachment 13011 [details]
profile.png
Profile results with 9.20
Closing this. Running 9.20 debug on ps2write with a profiler only takes 2 minutes (pswrite is deprecated). Profile results indicate that image_file_continue and its children are the biggest consumer (see attached png). Release build takes 52 seconds to do all 10 pages. |