Bug 690131 - Optimize consecutive setpagedevice
Summary: Optimize consecutive setpagedevice
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Graphics Library (show other bugs)
Version: master
Hardware: PC All
: P3 enhancement
Assignee: Nancy Durgin
URL:
Keywords: discuss
Depends on:
Blocks:
 
Reported: 2008-10-21 11:40 UTC by leonardo
Modified: 2021-10-31 10:06 UTC (History)
2 users (show)

See Also:
Customer:
Word Size: ---


Attachments
patch.txt (6.40 KB, text/plain)
2008-12-12 01:04 UTC, leonardo
Details
patch2.txt (14.33 KB, patch)
2008-12-18 22:43 UTC, leonardo
Details | Diff
gdevfpdp.c (30.40 KB, patch)
2008-12-18 22:46 UTC, leonardo
Details | Diff
gdevfpdp.h (749 bytes, patch)
2008-12-18 22:47 UTC, leonardo
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description leonardo 2008-10-21 11:40:58 UTC
Need to skip redundant erasepage. A suggestion is to change setpagedevice with 
installing a forwarding device that thunks all drawing methods and removes 
itself on any undegenerate painting. Ray says type1-ce1-setcurrentpoint is 
useful for debugging.
Comment 1 leonardo 2008-10-21 11:42:27 UTC
See also Bug688584.ps
Comment 2 leonardo 2008-11-17 07:29:25 UTC
Rather we can install a special device on erasepage, a proper uninstalling 
appears problematic because low level device methods (such as fill_rectangle) 
don't receive imager stgate. Postscript interpreter usually calls a high level 
method before calling a low level method, but we're not sure thyat we should 
assume it in general.

Another idea is a change to clist writer which checks whether the last command 
is filling same rectangle with same color and idempotential rop. Need to see 
whether we can easy retrieve the last command stored to a band (or a band 
group).
Comment 3 leonardo 2008-11-17 07:45:40 UTC
We could retrieve the last cmd_list command after adding a new field 
cmd_list::last_command_length. IMO such change is pretty safe mand doesn't 
spend much memory.
Comment 4 Ray Johnston 2008-11-17 08:55:43 UTC
I think the simplest and least prone to problems is the method sketched on an
IRC discussion -- erasepage would switch (some of) the device procs (those
that paint data) to an 'intercept' function that would switch the functions
back when non-background is painted. If erasepage is called again before the
procs are switched back, it would be ignored.

Installing a special forwarding device in the chain is problematic because we
have no way to remove forwarding devices from a chain (since we don't have
'reverse' pointers to all of the places that may have saved the pointer to the
forwarding device), so even if the forwarding device is extremely simple, it
still presents a (arguably minor) performance impact that the 'proc switching'
method above avoids.
Comment 5 Ray Johnston 2008-11-17 08:57:51 UTC
Note that the switching device procs would also work whether the underlying
device was in page buffer mode or clist mode. Monochrome printer RIPS will
typically use page buffer mode and it is desirable to prevent extra memory
fills in this mode as well as in clist mode.
Comment 6 leonardo 2008-11-17 10:38:13 UTC
Where to store the original procs for switching back ? Ray, do you mean to add 
a new field to gx_device for the procs ? Note we already have got 
static_procs, which looks being used for a different purpose.
Comment 7 leonardo 2008-11-17 10:57:15 UTC
Regarding Comment #4 paragraph 2 : IMO there are few places for the device 
pointer : imager state, another forwarding device, text enumerator and image 
enumerator. The last 2 ones are not important because *begin* procedures will 
remove the erasepage optimizer device. So only place is imager state, which 
may have multiple instances. One could remove the erasepage optimizer device 
from an instance of the imager state when a high level device method is called 
with that imager state. That's why I'm aware about other interpreters which 
possibly skip high level method calls. The graphics library could remove the 
erasepage optimizewr device when the library installs another forwarding 
device such as clipper.
Comment 8 leonardo 2008-11-20 06:40:20 UTC
BTW, do we still want a new device method for erasing a page ? It would be 
natural to define it for this problem and handle it especially in 
gx_device_printer. It is also useful for pdfwrite, ps2write. Other gx_device 
subclasses don't need the erasepage optimization, so only gx_device_printer 
will need a field for saving procs.
Comment 9 Ray Johnston 2008-11-20 09:07:59 UTC
Regarding comment #6, while we could 'hide' the pointer to a save area in one of
the 'secondary' procedure pointers such as 'image_data' which would only be
called after 'begin_image' (thus the pointers would already have been restored
by begin_image), I think this kind of hack is dangerous, so I favor having a
separate pointer added to hook a 'saved procs' area.

Also adding a 'bool' to mean 'page_is_marked' would be handy, which would be
set when the hooked procs 'unhook' themselves.

Regarding comment #8, I have no objection to adding an 'erasepage' device
proc that the new code would call only once at the start of the page. IIRC,
this would allow removing some kloodgier methods in pdfwrite and the display
device.
Comment 10 leonardo 2008-12-10 19:51:08 UTC
Rather the idea to delay erasepage (either with installing a device or with 
replacing the device procs vector) looks attractive, now I can see 3 obstacles 
for implementing it.

1. Need to save a device color in the device structure. It happens because 
various devices use different representations for the gray color 1. For 
example, if UseCIEColor is on, the gray color may be mapped to something 
different than CMYK=(0,0,0,1) and then halftoned. Currently we have no methods 
for saving and restoring device colors (rather parts of it are coded in clist 
and in save_dc implementations fro pdfwrite. A possible workaround is don't 
delay erasepage if the gray device color is not pure.

2. I'm not clear what is the logic of erasepage and fillpage with overprint or 
transparency. I see gs_fillpage is called from an instandard PS 
operator .fillpage, which is never defined. I see .fillpage is only called 
with false overprint from gs/Resource/Init, and likely we can restrict its 
semantics accordingly (not a backward compatible change). Also since erasepage 
isn't a part of PDF, IMO we can restrict it so that it must not be called 
inside a transparency grop, as well as .fillpage . Same constraint must apply 
to functions gs_fillpage, gs_erasepage in *all* languages.

3. While the new device method 'erasepage' woulkd be useful for (1) dalaying 
erasepage for gs_device_printer and (2) for pdfwrite, it needs to pass the 
device color at least (because 1 setgray can't impl;ement in a device method 
due to no high level color spaces in there). Then it's not 'erasepage' 
but 'fillpage'. But the last one is not useful for pdfwrite, because it needs 
same tricks as fill_rectangle.

Thus the usefullness of this approach appears to be lesser valuable than we 
thought in the beginning, so need to discuss what to do with it.
Comment 11 leonardo 2008-12-10 20:02:25 UTC
BTW our erasepage doesn't follow PLRM in the part of ignoring the clipping 
path. Loks like a separate bug.
Comment 12 leonardo 2008-12-11 17:37:08 UTC
There are 2 wayus to work around the problem (1) in Comment #10 :

1. Instead dealying erasepage, perform the first fillpage and ignore 
consecutive ones that go with same color and same page size. Use save_dc to 
now whether subsequent calls go with same color. It's not a perfect 
optimization when the page size changes. Also it doesn't skip erasepage of the 
last showpage.

2. Use more knowledge about the document structure. With DSC ignore eraepage 
between %%BeginPageSetup and %%EndPageSetup and perform erasepage on %%
EndPageSetup. With PDF ignore erasepage until page content stream is started, 
and execute one immediately before the stream is started. Postscript with no 
DSC is not optimized.
Comment 13 leonardo 2008-12-12 01:04:07 UTC
Created attachment 4655 [details]
patch.txt

A trial patch for delaying fillpage in PDF interpreter. It needs improvements
due to regressions. Still thinking whether to complete it.
Comment 14 leonardo 2008-12-18 22:43:07 UTC
Created attachment 4661 [details]
patch2.txt

This is trial patch for delaying fillpage with replacing the virtual device
method vector as suggested by Ray. It must not go to production due to
regressions. It's implementation ndiscovered a serious problem. I'd like to
hear suggestions how to resolve it. See log message below for details.

[Log message beg]
Fix (PDF interpreter) : Avoid extra erasepage, part 2.

DETAILS :

This is the second partial fix for the 
bug 690131 "Optimize consecutive setpagedevice".

The patch introduces a new implementation for 
the virtual device method 'fillpage'.
The idea of the new implementation is to delay the page filling until
an untrivial painting happens to the page. It would allow
to skip extra 'fillpage' when it go immediately after another fillpage.

For doing it, we replace the device virtual methods vector
with a vector of special functions,
in which 'fillpage' just saves the color and returns,
and other methods call the original 'fillpage' delayedly
and trigger out the delay state with restoring the original method.
The idea of replacing the virtual method vector belongs to Ray.
The new field save_procs_while_delaying_fillpage works for saving
the original vector. Due to some reasons explained below,
we implement it for printer devices only.

However this beautiful idea meets several obstacles,
so that the working logic isn't so simple, as explained below.

First, since the next painting after 'fillpage'
may be a low level method, we meet a problem with passing
a high level color to fillpage. Due to that high level
devices cannot use the delayed logic, and they still
need to skip extra fillpage by own means.

Second, when working with clist, we can't simply
replace the clist_fillpage method with a delayed implementation,
because the clist writer combines printer methods with clist methods.
Actually a clist writer inherits methods from 2 base classes.
Nevertheless the 'fillpage' method must be inherited
from both classes, and synthesized from them so as 
first the delayed logic works, and then the writing works.
For combining the 2 methods we added a new field
gx_device_printer::delayed_fillpage .

Third, we can't simply replace entire virtual vector,
because sometimes it is used to know what subclass
is a device belongs to. Since we have no a reasonable constraint
when it may happen, we don't replace the open_device method,
which in most cases is used for such recognition.

At last, we met one problems, which looks insoluble.
We call it a "dynamic subclassing conflict".
Sometimes the old code replaces virtual methods to a device,
and such replacement really cause conflicts with
the replacement by the delaying logic. Actually there are
3 variants of such conflict : (1) a real replacement,
(2) copying a method to another variable and later calling it
through that variable, and (3) comparing a virtual method
with a specific one (for example, with the default implementation)
to know about a specific feature of the device.

Here is an example of a problem caused by (2).
When applied the patch to revision 9294, set breakpoints as this : 

gxclrect.c ln 279
gdevddrw.c ln 958
gdevprn.c ln 1340
gdevfpdp.c ln 800 (a new module added)
gdevpbm.c ln 235

When running this command line :

gswin32.exe -IF:/AFPL/gs-hd/Resource/Init;f:\afpl\fonts -r300 -dNOPAUSE -dBATCH
-sDEVICE=ppmraw -dLastPage=2 -sOutputFile=p%%d.ppm
H:\AuxFiles\comparefiles\0.pdf

1. first it breaks when begin_typed_image is copied to save_begin_typed_image.
2. It happens 2 times and the last value is clist_begin_typed_image.
3. The next break happens in gdev_prn_fillpage, which replaces the virtual
vector 
with the delayed one. 
4. Then break in ppm_set_dev_procs
shows how begin_typed_image is replaced with pnm_begin_typed_image,
and the delayed vector is now wrong. If begin_typed_image
is called after that, the delayed fillpage would be missed.
But in practice we get a more dangerous thing.
5. The next break in gdev_prn_fillpage is not interesting,
because it simply ignores the extra fillpage. 
6. After that we get a break in end_delay (which is indirectly called from
gs_eofill), 
which restores the original vector including begin_typed_image.
Thus now the gx_device_printer::procs.begin_typed_image points to 
prn_begin_typed_image and gx_device_printer::save_begin_typed_image
points to the delayed method.
7. The next break in clist_fillpage shows that the delayed logic works fine.
8. After that we get break in end_delay, which attempts
to restore the device virtual method at second time.
It is abnormal, but we can work around with resetting open_device in
save_procs_while_delaying_fillpage. Note the abnormal call happens because
prn_begin_typed_image triggers to call the original
device method (which is expected to be a clist method)
via save_begin_typed_image, but it's value points to the delayed method.
Here we fall into an infinite recursion, and 
9. all subsequent breaks happen at same point.
10. all subsequent breaks happen at same point.
11. all subsequent breaks happen at same point.
and so on.

It this point I can't see how to resolve the dynamic subclassing conflict
staying within the idea of replacing virtual vector,
because old code (including customer devices)
use dynamic subclassing without restrictions.
Rather a special workaround (or even several ones)
can be suggested for begin* methods, I don't see
how to resolve it in general, because generally
any device method may be replaced or cloned to a private field.

Thus I stalled at this point.
Suggestions are welcome.

EXPECTED DIFFERENCES :

Regression with 0.pdf .
[Log message end]
Comment 15 leonardo 2008-12-18 22:46:40 UTC
Created attachment 4662 [details]
gdevfpdp.c

A new module missed in patch2.txt .
Comment 16 leonardo 2008-12-18 22:47:42 UTC
Created attachment 4663 [details]
gdevfpdp.h

Another module missed from patch2.txt .
Comment 17 Ray Johnston 2009-11-11 10:09:09 UTC
Performance impact is reduced. Reduce priority and status to 'enhancement'.
Comment 18 Nancy Durgin 2018-08-09 17:45:43 UTC
I implemented a subclass device (called 'epo' for erasepage optimization).  

This device gets installed only if the fillpage proc for the device is gx_default_fillpage.  That means it won't do the optimization for clist devices, pdfwrite and other oddball cases.

It also will bail out of the optimization if the color is not "pure".

The test file Bug690131.ps will test the non-pure-color case if run on a halftone device such as pbmraw.

The actual optimization defers all the erasepage calls (actually anything that goes via gs_fillpage, which seems to be just "erasepage" and ".fillpage") until a marking operation occurs.  It saves the color of the latest fillpage (assuming it is a pure color).

When a marking operation occurs we do a simple fill_rectangle using the saved color, and then uninstall the subclass device.

The reason for restricting the application of the optimization is because a more generic approach requires saving the entire gstate and having that be preserved across save/restore, which turns out to be a big mess to do correctly.

RESULTS:
I found ~40% speedup on pi3 for this run:
bin/gs  -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/tests_private/comparefiles/Bug692364.ps

Similar speedups for many pdf files, such as:
bin/gs  -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/tests_private/compare_files/ADOBE1-4.pdf

30% speedup in my linux VM for this run:
bin/gs  -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/PLRM2.pdf 
(a ~770 page PDF doc, the Postscript reference manual)

PDF docs seem to have 2 fillpages per page (plus a few extras at beginning and end of job), so this optimization basically eliminates half of them.