Bug 689859 - Performance comparison versions 7.07 and 8.62 on HP-UX ia64
Summary: Performance comparison versions 7.07 and 8.62 on HP-UX ia64
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: General (show other bugs)
Version: 8.62
Hardware: HP HP-UX
: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-05-21 22:10 UTC by James Dean
Modified: 2010-08-10 09:34 UTC (History)
0 users

See Also:
Customer:
Word Size: ---


Attachments
test.1000.ps.gz (65.54 KB, application/octet-stream)
2008-05-21 22:12 UTC, James Dean
Details
test.10000.ps.gz (623.83 KB, application/octet-stream)
2008-05-21 22:13 UTC, James Dean
Details
test.20000.ps.gz (1.22 MB, application/octet-stream)
2008-05-21 22:13 UTC, James Dean
Details
mem7.07vs8.62.jpg (29.20 KB, image/jpeg)
2008-05-21 22:15 UTC, James Dean
Details
performance.png (7.71 KB, image/png)
2008-05-28 07:16 UTC, Marcos H. Woehrmann
Details
detail1.png (91.71 KB, image/png)
2008-05-28 13:39 UTC, Marcos H. Woehrmann
Details
detail2.png (105.07 KB, image/png)
2008-05-28 17:37 UTC, Marcos H. Woehrmann
Details

Note You need to log in before you can comment on or make changes to this bug.
Description James Dean 2008-05-21 22:10:05 UTC
This was originally raised on the gs-devel forum.
http://ghostscript.com/pipermail/gs-devel/2008-May/007778.html 

I have been using GhostScript to convert PostScript to PDF on an HP-UX ia64
machine.  

The following table shows the results on an 900MHz Itanium machine.
For 7.07, the performance is not linear as documents get larger.  (For 50,000
pages approx 4 pages/sec)
For files up to 5,000 pages, 7.07 is more than 3 times faster than 8.62. (approx
11:3)

doc pages      7.07     7.07        8.62     8.62
             elapsed  pages/sec   elapsed  pages/sec
    1,000    00:09.2    111       00:34.2     29
   10,000    01:58.1     85       05:38.4     30
   20,000    09:22.8     36       11:26.9     29

The test PostScript files are attached.
A graph showing the memory usage reported by top when converting the 20,000 page
file is also attached.



To get 8.62 to compile, I needed to make the following changes.  I'm hoping
these are not the cause.

aCC  -DHAVE_MKSTEMP -DHAVE_HYPOT -DHAVE_FILE64   -O -DHAVE_STDINT_H
-DGX_COLOR_INDEX_TYPE="unsigned long long" -Ae -fast  -O -DHAVE_STDINT_H
-DGX_COLOR_INDEX_TYPE="unsigned long long" -Ae -fast  -I./obj -Ijbig2dec  -o
./obj/jbig2_segment.o -c jbig2dec/jbig2_segment.c
Error 419: "jbig2dec/jbig2.h", line 58 # 'uint8_t' is used as a type, but has
not been defined as a type.
            uint8_t        *data;
            ^^^^^^^
Error 419: "jbig2dec/jbig2.h", line 77 # 'int32_t' is used as a type, but has
not been defined as a type.
                                      int32_t seg_idx);
                                      ^^^^^^^
...

To temporarily fix, added following to jbig2_segment.c after #include "os_types.h"

typedef unsigned int uint32_t;
typedef unsigned short uint16_t;
typedef unsigned char uint8_t;
typedef signed int int32_t;
typedef signed short int16_t;
typedef signed char int8_t;


aCC  -DHAVE_MKSTEMP -DHAVE_HYPOT -DHAVE_FILE64   -O -DHAVE_STDINT_H
-DGX_COLOR_INDEX_TYPE="unsigned long long" -Ae -fast  -I./obj/
-Ijasper/src/libjasper/include -DJAS_CONFIGURE -DEXCLUDE_BMP_SUPPORT=1
-DEXCLUDE_JPG_SUPPORT=1 -DEXCLUDE_MIF_SUPPORT=1 -DEXCLUDE_PGX_SUPPORT=1
-DEXCLUDE_PNM_SUPPORT=1 -DEXCLUDE_RAS_SUPPORT=1 -DEXCLUDE_PNG_SUPPORT=1 -o
./obj/jpc_enc.o -c jasper/src/libjasper/jpc/jpc_enc.c
Error 172: "jasper/src/libjasper/jpc/jpc_enc.c", line 387 # Undeclared variable
'UINT_FAST32_MAX'.
        cp->imgareatlx = UINT_FAST32_MAX;
                         ^^^^^^^^^^^^^^^
Error 172: "jasper/src/libjasper/jpc/jpc_enc.c", line 388 # Undeclared variable
'UINT_FAST32_MAX'.
        cp->imgareatly = UINT_FAST32_MAX;
                         ^^^^^^^^^^^^^^^
...


To temporarily fix, added following to jpc_enc.c after #include "jpc_util.h"

#ifndef UINT_FAST32_MAX
#include <stdint.h>
#define UINT_FAST32_MAX   UINT_MAX
#endif
Comment 1 James Dean 2008-05-21 22:12:14 UTC
Created attachment 4038 [details]
test.1000.ps.gz
Comment 2 James Dean 2008-05-21 22:13:10 UTC
Created attachment 4039 [details]
test.10000.ps.gz
Comment 3 James Dean 2008-05-21 22:13:54 UTC
Created attachment 4040 [details]
test.20000.ps.gz
Comment 4 James Dean 2008-05-21 22:15:50 UTC
Created attachment 4041 [details]
mem7.07vs8.62.jpg
Comment 5 Alex Cherepanov 2008-05-22 19:43:36 UTC
I confirm the the reported results on a x86 box. 
About 5% performance gain can be achieved by saving and restoring VM state 
for every page. The following definitions executed before running the job do
the trick.
  /bop-hook { save exch } bind def
  /eop-hook { restore } bind def

Profiling the HEAD shows that most of the time is spent in comparing T1 fonts.
The same fonts get compared to each other again at again. Caching the result
should greatly improve the performance.

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
  9.86     30.26    30.26 2040932381     0.00     0.00  z1_subr_data
  8.46     56.20    25.94 16720738     0.00     0.00  same_type1_subrs
  5.38     72.69    16.49 2257618770     0.00     0.00  array_get
  5.28     88.90    16.21 2040930660     0.00     0.00  copied_type1_subr_data
  4.56    102.90    14.00 128223482     0.00     0.00  names_ref
  4.22    115.86    12.96 4207871013     0.00     0.00  gs_glyph_data_from_string
  3.77    127.42    11.56      564     0.02     0.53  interp
  2.85    136.15     8.73  8420407     0.00     0.00  gstate_clone
  2.38    143.45     7.31 53145747     0.00     0.00  gs_type1_interpret
  2.17    150.12     6.67  8360372     0.00     0.00  process_text_estimate_bbox
  2.07    156.46     6.34 53145747     0.00     0.00  t1_hinter__init
  1.89    162.27     5.81 88931108     0.00     0.00  dstack_find_name_by_index
  1.85    167.96     5.69 221061877     0.00     0.00  gs_point_transform
  1.83    173.58     5.62      290     0.02     0.02  copied_type1_glyph_data
  1.45    178.04     4.46 175475968     0.00     0.00  dict_find
  1.41    182.38     4.34 36559937     0.00     0.00  scan_token
  1.21    186.08     3.70 76424130     0.00     0.00  i_free_object
  1.16    189.65     3.57 212582109     0.00     0.00  zfont_encode_char
  1.07    192.94     3.29  8420403     0.00     0.00  gs_grestore_only
  0.96    195.88     2.94                             glyph_data_substring_by_font
  0.85    198.50     2.62   260018     0.00     0.00  deflate_slow
  0.80    200.95     2.45 53145747     0.00     0.00  gs_matrix_fixed_from_matrix
  0.79    203.38     2.43  8360369     0.00     0.00  same_type1_hinting
  0.78    205.78     2.40 53145330     0.00     0.00  pdf_char_widths_to_uts
  0.78    208.18     2.40 53145330     0.00     0.00  pdf_char_widths
  0.75    210.48     2.31 53150691     0.00     0.00  z1_glyph_data
  0.74    212.77     2.29 53149041     0.00     0.00  gs_type1_glyph_info
  0.71    214.95     2.19 106613560     0.00     0.00  dict_find_string
  0.67    217.01     2.06 16760773     0.00     0.00  gs_moveto_aux
  0.63    218.94     1.93  8360372     0.00     0.00  pdf_make_text_glyphs_table
  0.63    220.86     1.92 60043929     0.00     0.00  i_alloc_struct
  0.56    222.59     1.73  8360371     0.00     0.00  process_text_return_width
  0.49    224.10     1.51 16840810     0.00     0.00  gs_istate_copy
  0.49    225.61     1.51  8360372     0.00     0.00  op_show_find_index
  0.49    227.12     1.51 53148839     0.00     0.00  z1_glyph_info_generic
  0.46    228.52     1.40 123011418     0.00     0.00  pdf_next_char_glyph
  0.45    229.91     1.39 53145337     0.00     0.00  store_glyphs
  0.42    231.19     1.28  9440411     0.00     0.00  set_text_distance
  0.39    232.40     1.21 25678328     0.00     0.00  chunk_locate_ptr
Comment 6 Marcos H. Woehrmann 2008-05-28 07:16:50 UTC
Created attachment 4059 [details]
performance.png

The attachment is a plot of Ghostscript performance from r2500 through r5000.
Comment 7 Marcos H. Woehrmann 2008-05-28 08:57:27 UTC
I'm currently gathering the data for plots with more detail in the ranges from
3325 to 3375 and from 4200 to 4300.
Comment 8 Marcos H. Woehrmann 2008-05-28 13:39:21 UTC
Created attachment 4060 [details]
detail1.png

The detail1.png attachment shows the pages per second time for r3325 through
r3375.	The interesting revisions are r3336 and and r3356:

------------------------------------------------------------------------
r3336 | igor | 2002-11-18 06:29:54 -0800 (Mon, 18 Nov 2002) | 3 lines
Changed paths:
   M /trunk/gs/src/gdevpdfd.c
   M /trunk/gs/src/gdevpdte.c

Fix (pdfwrite) : Skipping text outside clip path.
SourceForge bug "[ 608520 ] PDFWrite failed for PS File".

------------------------------------------------------------------------

------------------------------------------------------------------------
r3356 | igor | 2002-11-20 10:57:10 -0800 (Wed, 20 Nov 2002) | 7 lines
Changed paths:
   M /trunk/gs/src/gsalloc.c
   M /trunk/gs/src/gsalloc.h
   M /trunk/gs/src/zvmem2.c

Fix : (1) gc_status.requested was not initialized for stable memories,
but used by alloc_acquire_chunk. (2) setting gc_status.vm_threshold and
gc_status.enabled for stable memories synchronously with non-stable ones.
This allows to control stable memories with setvmthreshold and vmreclaim.
With default settings pdfwrite now works faster.
SourceForge bug "[ 621243 ] (pdfwrite) slow mahle.pdf"

------------------------------------------------------------------------
Comment 9 Marcos H. Woehrmann 2008-05-28 17:37:12 UTC
Created attachment 4061 [details]
detail2.png

The detail2.png attachment shows the decline in performance for the 10k and 20k
page conversion.  The relevant revision is r4271:

------------------------------------------------------------------------
r4271 | igor | 2003-10-14 13:37:15 -0700 (Tue, 14 Oct 2003) | 12 lines

Fix (pdfwrite) : Resources were prematurely freed, causing duplicated objects
in the output.

DETAILS :

Bug 687078 "pdfwrite duplicates page resources".
pdf_store_page_resources releases page resources, but they may be used in next
pages. Due to that such resources may be duplicated in the output file.

EXPECTED DIFFERENCES :

None.

------------------------------------------------------------------------
Comment 10 Ken Sharp 2010-08-06 14:50:23 UTC
What seems to be happening is that pdfwrite is making a copy of the font(s) in the PostScript file, in order to create a subset (possibly other reasons too).


In order to do this it copies glyphs from the *current* font into the copy as required. However, before doing so it needs to see if the fonts are the same. Using the font name is not sufficiently unique, its possible to create a font called Helvetica, use it, then create a different font called Helvetica and use that.

So pdfwrite makes some quite extensive checks to see if the current font is the same as the one it started copying earlier. If not it will manufacture a new copy, otherwise it simply copies the glyphs if required.

The only way I can think of to cache as Alex suggested in comment #5 would be to somehow hash the various tables whenever we first access a font, and then compare hashes. Quite a big job. We can't even check to see if the glyphs already exist, in case the font changed. 

Possibly we could use the XUID (but that isn't always unique!), or perhaps there is some other guaranteed unique signature we could store and compare. I'll try and check that as it would be much quicker to compare and should give a useful performance improvement on all jobs containing text if it can be done.
Comment 11 Ken Sharp 2010-08-10 09:34:48 UTC
Fixed in revision 11618, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-August/011614.html

As Alex's analysis shows in comment #5, most of the time is spent in trying to 
decide if two fonts are the same or not so that we can determine if we need to copy a glyph, or already have a copy of the glyph. Unfortunately there is no really reliable way to be sure without checking. What revision 11618 does is check the fonts to see if hashes of the /Subrs exists, if so then these are compared (16 bytes, which is fairly quick), if not then hashes are generated and then compared.

Generating the MD5 hash is about the same speed as the original comparison,
but is only done by pdfwrite, and only done as required, so has no effect on
the regular performance rendering.

For me this improved pdfwrite performance by ~40%, The times I see are:

          9.0 pre-release    9.0 pre-release + patch
#pages     time   pages/s     time   pages/s
1000        16s    62.5         9s    111
10000      167s    59.8        83s    120.5
20000      279s    71.7       167s    119.8

I also checked the Type 42 and CID font equivalents. The CIDFont versions end up calling the type 1 or type 42 code on each descendant font so are not a concern, the type 42 fonts only check the FPGM, CVT and PREP tables which are relatively small and easy to access. It would probably only make a small difference to hash these fonts as well and probably not worth the effort at present.

Obviously this doesn't completely restore the 7.0 performance, but the performance does now remain linear as the page count increases. Also 7.07 is ancient, and I expect the remaining performance loss is probably due to added features and bug fixes (such as the non-linear behaviour with increasing page count), so unlikely to be recoverable.