691038 – provide font rasterization cache for successive gsapi calls

Summary: provide font rasterization cache for successive gsapi calls

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	Client API
Version:	8.70
Hardware:	PC Linux

Importance:	P4 enhancement
Assignee:	Ray Johnston

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-01-02 14:31 UTC by Tobias hain
Modified:	2010-04-25 23:11 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Description Tobias hain 2010-01-02 14:31:21 UTC

Before raising feature requests, I'd first like to explain what I intend to do.
There might be much better ways to accomplish the same thing that I'm simply not
aware of.

I need to render logos (.eps), which primarily contain text, into .png/.gif
images for display in web applications at high throughput. The current
implementation runs on a quad-core Xeon (>3GHz), invokes multiple "gs"
processes concurrently, and keeps a cache of rendered images at hand. However,
I'm looking for ways to improve the performance even further.

The EPS files have Postscript Type 1 fonts embedded. I extracted the fonts and
made Ghostscript aware of these by adding those to the Fontmap file resulting in
smaller EPS files to parse.

After extracting the fonts, I didn't expect multiple "gs" process invocations
to be faster, but I did expect the gsapi to perform better, because it could
reuse glyphs already rasterized from the PostScript Type 1 outlines.
Unfortunately that's not the case. On my Core2 Duo 2.5GHz development machine I
achieve about 7 eps->png conversions per second in a single-threaded
environment, regardless of whether I use gs processes or the gsapi, and whether
the fonts are extracted or embedded. The performance difference is only about
10%.

Without having done any profiling on Ghostscript, I simply assume that
rasterizing PostScript Type 1 outlines is the most expensive task in my
environment: the EPS file including the PS Type 1 fonts is 220kb, the EPS file
without them is 12kb.

If those rastered fonts could be reused, I expect a dramatic performance boost.
Maybe there are other ways to do that (e.g. use the Ghostscript Library and
render to display buffers, grab the image, clear the page, render next image to
buffer, grab the image, ..). What's the best way to do that?

I noticed there are a couple of more things hampering ultimate performance:
. "The exported gsapi_*() functions must be called from one thread only."
. "At this stage, Ghostscript does not support multiple instances of the
interpreter within a single process."

With separate gs processes I can distribute the work among the available CPU
cores. However, I can't do that with the gsapi: the API can't be called from
different threads, and I can't create one API instance per available core
either (think of a Core i7 with 8 hyperthreading cores). Will the gsapi
support multiple threads or instances in the near future to support concurrency?

Comment 1 Alex Cherepanov 2010-01-02 18:14:10 UTC

Please attach a few EPS files or try to run a profiler on your own.
It is hard to guess what's the bottleneck.
Do you restart Ghostscript for every EPS file you generate ?

Comment 2 Ray Johnston 2010-01-02 20:27:27 UTC

Along with Alex's request to see the EPS files, I have a few other questions...

We've seen that for simple files, using the gsapi interface to process the
input files and generate the output is MUCH more efficient since it skips the
initialization of the interpreter and graphics library. Please see bug 690352
http://bugs.ghostscript.com/show_bug.cgi?id=690352 in particular comment #18
http://bugs.ghostscript.com/show_bug.cgi?id=690352#c18

Thus, I am surprised that using the gsapi call didn't help the throughput.

The other issue is that the gsapi call may not be utilizing the font cache and
the FontDirectory in a way that allows multiple jobs to avoid re-rendering
glyphs. If a Font is flushed from the FontDirectory by an 'end of job' restore,
then the corresponding cache pair won't be used in the next job. Loading fonts
at 'server level' will ensure that they are persistent across jobs. Refer to
the documentation on 'exitserver' and the job server loop for this, or let me
know if you need "snapshot" code to load fonts persistently.

Lastly, if the target resolution is known and the font sizes are consistent,
a 'bitmap' (Type 3) font can speed up the font rendering since all of the
hinting and 'fill' operations are pickled into the bitmap font.

I suspect this isn't as much of an enhancement issue as one of "how to use
Ghostscript in the most efficient manner for a particular application", which
is something I often help our customers and users with.

Comment 3 Tobias hain 2010-01-03 14:53:32 UTC

As I said, I might be using Ghostscript in a very inefficient batch mode, and
basically I'm looking for ways to make use of the glyph cache.

My benchmarks with the options "-dSAFER -q -dEPSCrop -dNOPAUSE -dBATCH
-sDEVICE=pngalpha -r300" on a 2.53GHz Core2Duo P8700 are:

EPS with two embedded fonts
gs cmd 1 thread : 5.27 fps
gs cmd 2 thread : 9.54 fps
gs cmd 3 thread : 9.65 fps
gs api 1 thread : 8 fps

same EPS, but with two fonts in Fontmap file
gs cmd 1 thread : 7.18 fps
gs cmd 2 thread : 12.52 fps
gs cmd 3 thread : 12.56 fps
gs api 1 thread : 8 fps

the gsapi calls are called in this order:
gsapi_new_instance()
for all jobs {
  gsapi_set_stdio()
  gsapi_init_with_args()
  gsapi_exit()
}
gsapi_delete_instance()

Obviously this is not sufficient to use the glyph cache, and my question boils
down to which way to go:
. use gsapi_set_display_callback() to grab rasterized images and try to avoid
new gsapi_init_with_args() calls?
. use the -dJOBSERVER option? I wasn't aware of this new Ghostscript >= 8.15
feature. Maybe it's exactly what I'm looking for. Is there sample test code? The
usage documentation
http://www.ghostscript.com/doc/current/Use.htm
doesn't tell me whether I can use -sDEVICE=pngalpha in JOBSERVER mode. Will it
wrap each batch in its own PNG envelope? Or do I need to get rasterized images
and use my own PNG export filters?

http://bugs.ghostscript.com/show_bug.cgi?id=690352#c18 is interesting. Multiple
gs processes are used to compensate the inability of the gsapi to handle
multiple threads or instances. How are the different jobs delimited?

Comment 4 Alex Cherepanov 2010-01-03 15:22:57 UTC

Did you try to pre-load the fonts into the memory?
Ray is a better person to discuss the configuration options but
many things depend on your EPS files and your fonts.
We cannot even check your results without sample files.
Please attach a few EPS files. You can mark the attachments "private"
to restrict the access to the files to Artifex employees and contractors.

Help us to help you.

Comment 5 Ray Johnston 2010-01-03 23:20:51 UTC

You still haven't uploaded a sample EPS file for us to test with, but generally
you will be able to cache glyphs and also avoid reloading fonts by using the API
calls differently.

Use these options:
   "-q -dEPSCrop -dNOPAUSE -sDEVICE=pngalpha -r300 -dJOBSERVER"
You didn't specify a "-sOutputFile=___" or "-o ____", so I'll assume that you
want each pngalpha in a separate file.

As mentioned previously, using a single instance of Ghostscript allows data to
be retained in the FontDirectory and font glyph cache for multiple jobs. To do
this, you use a slightly different calling sequence:
-------------------------------------------------------------------------
gsapi_new_instance()
gsapi_set_stdio()
gsapi_init_with_args() /* The args are as above */

for all jobs {
  gsapi_run_string(minst,
            "<< /OutputFile (out.png) >> setpagedevice\n" /* set output file */
            ".locksafe\n"                  /* enter SAFER mode */
            "(in.eps) run\n"               /* process the input file */
            "false 0 startjob pop\n",      /* start a new job which also */
                                           /* resets to NOSAFER mode and */
                                           /* allows OutputFile to be changed */
             0, &errcode);
}

gsapi_exit()
gsapi_delete_instance()
-------------------------------------------------------------------------
The -dBATCH is not used since the gsapi_exit() suffices to end the execution.

The -dSAFER is not used since the OutputFile setting must be enabled for
writing prior to locking the file permissions with ".locksafe"

The "false 0 startjob pop" resets the state, including the SAFER mode to
that before the job, allowing the next job to set OutputFile (which is not
allowed in SAFER mode).
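
The per-job program passed to gsapi_run_string() above can be assembled from
the input and output file names. The following is only a minimal sketch under
stated assumptions: build_job_ps() and ps_escape() are hypothetical helper
names, not part of the gsapi, and the 512-byte scratch buffers are an arbitrary
choice. The PostScript text itself is the one from the calling sequence above.
Parentheses and backslashes in file names are backslash-escaped so that they
cannot terminate the PostScript string literal early.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into dst as the body of a PostScript
 * "(...)" string literal, backslash-escaping '(', ')' and '\'.
 * Returns the number of bytes written (excluding the NUL terminator),
 * or -1 if dst is too small. */
static int ps_escape(char *dst, size_t cap, const char *src)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    for (; *src != '\0'; ++src) {
        int special = (*src == '(' || *src == ')' || *src == '\\');
        /* Need room for 1 or 2 characters plus the terminator. */
        if (n + (special ? 2u : 1u) >= cap)
            return -1;
        if (special)
            dst[n++] = '\\';
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return (int)n;
}

/* Hypothetical helper: build the per-job PostScript program shown in the
 * calling sequence above. Returns 0 on success, -1 if a buffer is too small. */
int build_job_ps(char *buf, size_t cap,
                 const char *in_path, const char *out_path)
{
    char in_esc[512], out_esc[512];   /* arbitrary scratch size */
    int len;

    if (ps_escape(in_esc, sizeof in_esc, in_path) < 0 ||
        ps_escape(out_esc, sizeof out_esc, out_path) < 0)
        return -1;

    len = snprintf(buf, cap,
                   "<< /OutputFile (%s) >> setpagedevice\n" /* set output file */
                   ".locksafe\n"                            /* enter SAFER mode */
                   "(%s) run\n"                             /* process the input */
                   "false 0 startjob pop\n",                /* start a new job */
                   out_esc, in_esc);
    return (len < 0 || (size_t)len >= cap) ? -1 : 0;
}
```

Inside the "for all jobs" loop, the resulting buffer would be passed as the
string argument of gsapi_run_string(minst, buf, 0, &errcode). Note that the
sketch does not treat '%' in the output name specially.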

As mentioned previously, any fonts that are to be persistent across jobs should
be loaded in the job server VM (exitserver mode), which means that it follows
a "true 0 startjob pop" and the setting of persistent VM is ended by a 
"false 0 startjob pop".

For example to pre-load Helvetica, use:

  gsapi_run_string(minst,
            "true 0 startjob pop\n"        /* exit to the job server */
            "/Helvetica findfont pop\n"    /* load Helvetica */
            "false 0 startjob pop\n",      /* start a new job */
             0, &errcode);

Following this, Helvetica (and any other fonts loaded) will be persistent in
VM and won't require any searching.

I expect most of the performance improvement to come from not restarting
Ghostscript each time, but preloading fonts that are commonly needed
couldn't really hurt since it is done once, outside the "for all jobs" loop.
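
The Helvetica example generalizes to a list of fonts. Below is a minimal
sketch, assuming the font names are simple PostScript names without whitespace
or delimiter characters. build_preload_ps() and is_plain_name() are
hypothetical helper names for illustration, not gsapi functions, and
"Times-Roman" in the usage note is only an example name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: accept only non-empty names that can be written as a
 * literal PostScript name (/Name), i.e. without whitespace or delimiters. */
static int is_plain_name(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; ++s)
        if (strchr(" \t\r\n\f()<>[]{}/%", *s) != NULL)
            return 0;
    return 1;
}

/* Hypothetical helper: build a program that loads each font between
 * "true 0 startjob pop" and "false 0 startjob pop", as in the Helvetica
 * example. Returns 0 on success, -1 on a bad name or a too-small buffer. */
int build_preload_ps(char *buf, size_t cap,
                     const char *const *fonts, size_t nfonts)
{
    size_t used, i;
    int n;

    n = snprintf(buf, cap, "true 0 startjob pop\n");   /* exit to job server */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    used = (size_t)n;

    for (i = 0; i < nfonts; ++i) {
        if (!is_plain_name(fonts[i]))
            return -1;
        n = snprintf(buf + used, cap - used, "/%s findfont pop\n", fonts[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, cap - used, "false 0 startjob pop\n"); /* new job */
    if (n < 0 || (size_t)n >= cap - used)
        return -1;
    return 0;
}
```

Called once before the job loop, for example with the names "Helvetica" and
"Times-Roman", the buffer would be handed to gsapi_run_string() in the same way
as the single-font example.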

Comment 6 Ray Johnston 2010-01-04 09:37:25 UTC

Note that either .locksafe or .setsafe can be used in the previous example, but
BOTH allow the input file to start a new job, such as with "false 0 startjob pop",
and continue in NOSAFER mode. To run jobs more securely, one would have to use
the '.runandhide' operator as described in doc/Language.htm#Miscellaneous so
that the save object that restores to NOSAFER mode is inaccessible to the job.

It sort of depends on how much isolation is wanted from malicious PostScript 
input files. Obviously, PDF files can't start new jobs, so this isn't an issue.

Comment 7 Tobias hain 2010-01-04 14:27:04 UTC

Performance in this use case went up from 8fps -> 16.5fps after doing what was suggested above:

> same EPS, but with two fonts in Fontmap file
> gs api 1 thread : 8 fps

I would have to modify my code to use "gs cmd" instead of "gsapi" to utilize
multiple cores. I think this should meet my performance requirements. However,
just out of curiosity:

> If a Font is flushed from the FontDirectory by an 'end of job' restore,
> then the corresponding cache pair won't be used in the next job. Refer to
> the documentation on 'exitserver' and the job server loop for this, or let me
> know if you need "snapshot" code to load fonts persistently.

Do I get the 8 -> 16fps performance improvement because of preloaded fonts and
avoiding repeated initialization code? Or because it actually reuses rastered
glyphs between jobs? I would have expected an even higher performance
improvement from a glyph cache to be honest.

Sorry for not yet having uploaded an EPS sample: This will take me some time -
having to replace copyright fonts with free fonts and altering the EPS file
itself to something that can be shared.

Comment 8 Tobias hain 2010-01-05 00:30:32 UTC

Robustness vs. performance: certainly a tradeoff. I will implement both
strategies (isolated gs cmds and -dJOBSERVER). Live evaluation will show whether
-dJOBSERVER is sufficiently robust or our EPS files are sufficiently well formed.

Thanks again for pointing me to the "3.7.7 Job Execution Environment" in the
Postscript Specification:
http://www.adobe.com/products/postscript/pdfs/PLRM.pdf
I wasn't aware of that feature, and I think it makes my feature request
obsolete, so the bug may be marked as WONTFIX.

I also see that there is still stuff to implement for the Jobserver and lots of
other things:
http://www.ghostscript.com/doc/current/Projects.htm

Maybe a general "Performance HowTo" would have helped me a lot, just pointing to
the appropriate references. I notice that you repeatedly give similar advice
in bugzilla.

The bleeding edge source code is no longer available to the public I guess:
http://sourceforge.net/project/stats/detail.php?group_id=1897&ugn=ghostscript&type=cvs&mode=12months&year=2010

And forum activity is very poor:
http://sourceforge.net/project/stats/detail.php?group_id=1897&ugn=ghostscript&type=forum&mode=60day&forum_id=0
(low frequency forums/mailing lists have the disadvantage that users can't help
each other since there are just too few of them)

Thanks again so far for your valuable suggestions.

Comment 9 Tobias hain 2010-01-05 01:30:27 UTC

Sorry, I just noticed the updated source code location:
http://code.google.com/p/ghostscript/source/checkout

Comment 10 Ray Johnston 2010-01-05 07:03:42 UTC

The bleeding edge source code continues to be available via our source code
repository at svn.ghostscript.com (svn).

You can set up a local version using:

svn checkout http://svn.ghostscript.com/ghostscript/trunk/gs my_local_gs

where 'my_local_gs' is the directory for the top level gs

then:

svn update

will update your local gs sources to the latest and greatest.

The active discussions are mostly on IRC at irc.ghostscript.com #ghostscript
and on the gs-devel mailing list (see ghostscript.com for links).

I'll keep this open to remind me to collect some of the 'how to' in a doc.

Thanks for your willingness to change your modus operandi to get the best
performance. I'll also still look into the answer to the font cache issue.

Comment 11 Ray Johnston 2010-04-25 23:11:26 UTC

We probably won't be able to dig into the font cache performance issue anytime
soon, so I am closing this bug. A question to gs-devel or #ghostscript IRC
might help, but I suspect that performance analysis will need to be done by
the submitter or someone else that cares about this.

Sometimes performance can be enhanced by turning off garbage collection (with
-dNOGC) and disabling IdiomRecognition with: 
   -c "<< /IdiomRecognition false >> setuserparams"
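
For the gsapi calling sequence from comment 5, -dNOGC can simply be appended
to the argument vector given to gsapi_init_with_args(). The sketch below is an
illustration only: gs_build_args() is a hypothetical helper, not a gsapi
function, and the IdiomRecognition setting is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helper: fill argv with the options from comment 5 and,
 * optionally, -dNOGC. Returns the argument count, or -1 if argv is too
 * small. The first entry is the program-name slot (argv[0]), which
 * gsapi_init_with_args() ignores. */
int gs_build_args(const char **argv, int cap, int disable_gc)
{
    static const char *const base[] = {
        "gs",                 /* argv[0], ignored by Ghostscript */
        "-q",
        "-dEPSCrop",
        "-dNOPAUSE",
        "-sDEVICE=pngalpha",
        "-r300",
        "-dJOBSERVER"
    };
    int n = (int)(sizeof base / sizeof base[0]);
    int i;

    if (cap < n + (disable_gc ? 1 : 0))
        return -1;
    for (i = 0; i < n; ++i)
        argv[i] = base[i];
    if (disable_gc)
        argv[n++] = "-dNOGC";   /* turn off garbage collection */
    return n;
}
```

The returned count and the filled array would then be passed on, with a cast
to char ** because gsapi_init_with_args() takes a non-const argv. Whether
-dNOGC actually helps has to be measured per workload.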

One final note before closing this bug: multi-core rendering _may_ allow
some performance enhancement. We have found that the 'PNG' encoding is a
significant performance hit, so scattering pages of 'raw' raster (on a ram
disk) to each be converted to PNG may help utilize parallel CPUs. If this
helps, and multiple CPUs are desired to render each page faster, then the
-dNumRenderingThreads= parameter can be used (probably with the -dMaxBitmap=
and -dBufferSpace= parameters to force clist rendering into an appropriate
number of bands, so that each band goes to one CPU). It is important that the
PNG compression be run on more than one CPU, since this is often the
bottleneck; without that, multi-threaded rendering from the clist doesn't
really help.

Closing since we have improved things quite a bit and this really seems to
be a specific issue for this user.