Bug 689003 - Performance problem during conversion PDF->PS with a particular PDF
Summary: Performance problem during conversion PDF->PS with a particular PDF
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Color (show other bugs)
Version: 8.54
Hardware: PC Windows XP
: P4 normal
Assignee: Michael Vrhel
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-17 08:56 UTC by Francois Donnay
Modified: 2016-10-11 09:53 UTC (History)
2 users (show)

See Also:
Customer:
Word Size: ---


Attachments
Test File (test.pdf) (2.87 MB, application/pdf)
2006-11-17 08:58 UTC, Francois Donnay
Details
profile.png (36.54 KB, image/png)
2016-10-11 09:47 UTC, Michael Vrhel
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Francois Donnay 2006-11-17 08:56:41 UTC
Converting the PDF test.pdf takes more than 15 min on my PC.

I have used this command line:
gswin32c -sOutputFile="test.ps" -sDEVICE=pswrite  -dNOPAUSE -dBATCH  test.pdf


Thanks for your Help,
Francois.
Comment 1 Francois Donnay 2006-11-17 08:58:28 UTC
Created attachment 2619 [details]
Test File (test.pdf)
Comment 2 Marcos H. Woehrmann 2006-11-17 10:56:40 UTC
The attached PDF file takes a long time to process with gs854 and gshead on my Linux amd64 system (11 
minutes to convert all 10 pages to a 72DPI PPM).  Loading it using Apple Preview is much faster (a few 
seconds per page on my much slower PowerBook G4).
Comment 3 Ralph Giles 2006-11-22 10:21:08 UTC
Seems to be drawing two blue rectangles by drawing the same small element
hundreds of times with a different cm. Running it with -dPDFDEBUG shows many,
many content elements like:

q
/CS3 cs
%Resolving: [369 0]
0.69 0.773 0.863 sc
0 0.96 0.96 0 62.94 219.5905 cm
/Im1633 Do
%Resolving: [1996 0]
Q

So the file is doing something expensive, but it's not clear exactly why we're
so much slower than Quartz. Probably the PDF interpreter is taking a long time
doing the repeated object resolutions, or perhaps the coupling to the graphics
library is slow. Perhaps other implementations have an object cache?
Comment 4 Ralph Giles 2006-11-22 10:30:58 UTC
Further analysis: object 369 is a CalRGB colorspace. Page three doesn't set the
colorspace for each instantiation of the element and is considerably faster, so
setting and then tearing down the colorspace is a significant component of the
slowdown.

The image being drawn is a 4x4 1-bit CCITTFaxDecode image.
Comment 5 Henry Stiles 2007-12-29 10:17:37 UTC
From profiling the hotspot is in gs_interpret()... assigning to Alex.
Comment 6 Henry Stiles 2007-12-31 17:31:35 UTC
#5 is wrong and #4 is a better explanation.  The pdf interpreter seems to be
repeatedly setting the same color space.  Each setting requires expensive
rebuilding of the joint cie caches.  Alex, can the pdf interpreter detect it is
installing the same color space repeatedly?
Comment 7 Ray Johnston 2008-01-01 15:34:45 UTC
The 'q' ... 'Q' performs a gsave / grestore around the setting of the CalRGB
colorspace, so detecting this case is problematic. It would require setting the
CalRGB space "before" the gsave so that the grestore doesn't switch back.

A better solution might be to 'cache' colorspaces so that the grestore switches
to the gsave'd colorspace but the CalRGB structure is kept around in the cache
to allow it to be used again quickly.

In any case, not a real simple thing to do.
Comment 8 Henry Stiles 2008-01-01 16:24:45 UTC
Thanks Ray, I didn't see the gsave and grestore.  I guess the other rips have
found a way to optimize this or, more likely, they don't have the overhead of
setting up the caches, so colorspace installation is cheap.
Comment 9 Alex Cherepanov 2008-01-02 06:41:13 UTC
I agree with Ray's analysis and would like to add that the most time
consuming part is repeated interpretation of various PostScript functions
in the color space and CRD.

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 22.73     14.22    14.22      572     0.02     0.11  interp
 10.40     20.72     6.51 214976243     0.00     0.00  for_samples_continue
  7.79     25.59     4.87   419049     0.00     0.00  cie_cache_set_interpolation
  6.91     29.92     4.33 71517696     0.00     0.00  ztpqr_scale_wb_common
  4.95     33.01     3.10 357683943     0.00     0.00  real_param
  3.33     35.10     2.09 290930640     0.00     0.00  array_get
  2.61     36.73     1.64 71519226     0.00     0.00  zexp
  2.25     38.14     1.41  7561466     0.00     0.00  dstack_find_name_by_index
  2.24     39.54     1.40    46561     0.00     0.00  gx_cie_load_common_cache
  2.20     40.92     1.38 71517696     0.00     0.00  cie_exec_tpqr
  2.01     42.18     1.26    46561     0.00     0.00  gx_install_cie_abc
  1.37     43.04     0.86 71760935     0.00     0.00  ref_stack_counttomark
  1.37     43.89     0.86 77018145     0.00     0.00  zexec
  1.33     44.72     0.83   419049     0.00     0.00  cie_cache_mult
  1.26     45.51     0.79   116682     0.00     0.00  gc_trace
  1.26     46.30     0.79 71619101     0.00     0.00  num_params
  1.07     46.97     0.67     2129     0.00     0.00  image_init_clues
  1.06     47.63     0.66  1086009     0.00     0.00  float_params
  1.02     48.27     0.64 71517696     0.00     0.00  cie_post_exec_tpqr
  1.01     48.90     0.63  1827244     0.00     0.00  names_ref
  0.78     49.39     0.49   419059     0.00     0.00  zfor_samples
  0.75     49.86     0.47    48686     0.00     0.00  gx_image_enum_begin
  0.72     50.31     0.45 71772232     0.00     0.00  ref_stack_enum_begin
  0.70     50.74     0.44 72204335     0.00     0.00  ref_stack_pop
  0.68     51.17     0.43   419046     0.00     0.00  cache_is_linear
  0.60     51.54     0.38  4021784     0.00     0.00  igc_reloc_refs
  0.58     51.90     0.36  3308519     0.00     0.00  refs_set_reloc
  0.54     52.24     0.34  1131740     0.00     0.00  chunk_locate_ptr
  0.51     52.56     0.32  6768774     0.00     0.00  dict_find
  0.42     52.82     0.26  3187995     0.00     0.00  check_l_mark
  0.39     53.07     0.25    46561     0.00     0.00  cie_cache_joint
  0.38     53.30     0.24  3414376     0.00     0.00  refs_clear_marks
  0.34     53.52     0.22  2009029     0.00     0.00  refs_compact
  0.34     53.73     0.22 71517696     0.00     0.00  abc_identity
  0.33     53.94     0.21  5020992     0.00     0.00  igc_reloc_ref_ptr_nocheck
  0.30     54.13     0.19 77018145     0.00     0.00  check_for_exec
  0.30     54.31     0.19   924621     0.00     0.00  scan_token
Comment 10 Henry Stiles 2008-01-07 12:25:19 UTC
Is the performance difference between gs and Adobe and XPDF a result of building
the joint cie caches?  Building the cache results in many calls to these
functions which are not necessary to print the job.  Adobe and XPDF may not have
a special optimization for detecting the same color space.  The performance
difference may be a side effect of our cache optimization.  I guess in either
case this is not something we will fix in the near future, but it would be nice
to understand the details, since our relative performance is so poor for this
job.  Reassigning to Ralph.  Thanks Alex and Ray for looking at this.
Comment 11 Hin-Tak Leung 2010-05-05 00:10:48 UTC
From the discussion it looks like performance issue is in colorspace and joint cie caches. That sounds like Michael's area.
Comment 12 Shailesh Mistry 2011-07-19 19:40:51 UTC
Bug still reproducible in Ghostscript 9.03
Comment 13 Michael Vrhel 2016-10-11 09:47:01 UTC
Created attachment 13011 [details]
profile.png

Profile results with 9.20
Comment 14 Michael Vrhel 2016-10-11 09:53:20 UTC
Closing this.  Running 9.20 debug on ps2write with a profiler only takes 2 minutes (pswrite is deprecated).  Profile results indicate that image_file_continue and its children are the biggest consumer (see attached png).  Release build takes 52 seconds to do all 10 pages.