Summary: | Optimize consecutive setpagedevice | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | leonardo <leonardo> |
Component: | Graphics Library | Assignee: | Nancy Durgin <nancy.durgin> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | christinedelight.top85, nancy.durgin |
Priority: | P3 | Keywords: | discuss |
Version: | master | ||
Hardware: | PC | ||
OS: | All | ||
Customer: | Word Size: | --- | |
Attachments: |
patch.txt
patch2.txt gdevfpdp.c gdevfpdp.h |
Description
leonardo
2008-10-21 11:40:58 UTC
Rather we can install a special device on erasepage, a proper uninstalling appears problematic because low level device methods (such as fill_rectangle) don't receive imager stgate. Postscript interpreter usually calls a high level method before calling a low level method, but we're not sure thyat we should assume it in general. Another idea is a change to clist writer which checks whether the last command is filling same rectangle with same color and idempotential rop. Need to see whether we can easy retrieve the last command stored to a band (or a band group). We could retrieve the last cmd_list command after adding a new field cmd_list::last_command_length. IMO such change is pretty safe mand doesn't spend much memory. I think the simplest and least prone to problems is the method sketched on an IRC discussion -- erasepage would switch (some of) the device procs (those that paint data) to an 'intercept' function that would switch the functions back when non-background is painted. If erasepage is called again before the procs are switched back, it would be ignored. Installing a special forwarding device in the chain is problematic because we have no way to remove forwarding devices from a chain (since we don't have 'reverse' pointers to all of the places that may have saved the pointer to the forwarding device), so even if the forwarding device is extremely simple, it still presents a (arguably minor) performance impact that the 'proc switching' method above avoids. Note that the switching device procs would also work whether the underlying device was in page buffer mode or clist mode. Monochrome printer RIPS will typically use page buffer mode and it is desirable to prevent extra memory fills in this mode as well as in clist mode. Where to store the original procs for switching back ? Ray, do you mean to add a new field to gx_device for the procs ? Note we already have got static_procs, which looks being used for a different purpose. Regarding Comment #4 paragraph 2 : IMO there are few places for the device pointer : imager state, another forwarding device, text enumerator and image enumerator. The last 2 ones are not important because *begin* procedures will remove the erasepage optimizer device. So only place is imager state, which may have multiple instances. One could remove the erasepage optimizer device from an instance of the imager state when a high level device method is called with that imager state. That's why I'm aware about other interpreters which possibly skip high level method calls. The graphics library could remove the erasepage optimizewr device when the library installs another forwarding device such as clipper. BTW, do we still want a new device method for erasing a page ? It would be natural to define it for this problem and handle it especially in gx_device_printer. It is also useful for pdfwrite, ps2write. Other gx_device subclasses don't need the erasepage optimization, so only gx_device_printer will need a field for saving procs. Regarding comment #6, while we could 'hide' the pointer to a save area in one of the 'secondary' procedure pointers such as 'image_data' which would only be called after 'begin_image' (thus the pointers would already have been restored by begin_image), I think this kind of hack is dangerous, so I favor having a separate pointer added to hook a 'saved procs' area. Also adding a 'bool' to mean 'page_is_marked' would be handy, which would be set when the hooked procs 'unhook' themselves. Regarding comment #8, I have no objection to adding an 'erasepage' device proc that the new code would call only once at the start of the page. IIRC, this would allow removing some kloodgier methods in pdfwrite and the display device. Rather the idea to delay erasepage (either with installing a device or with replacing the device procs vector) looks attractive, now I can see 3 obstacles for implementing it. 1. Need to save a device color in the device structure. It happens because various devices use different representations for the gray color 1. For example, if UseCIEColor is on, the gray color may be mapped to something different than CMYK=(0,0,0,1) and then halftoned. Currently we have no methods for saving and restoring device colors (rather parts of it are coded in clist and in save_dc implementations fro pdfwrite. A possible workaround is don't delay erasepage if the gray device color is not pure. 2. I'm not clear what is the logic of erasepage and fillpage with overprint or transparency. I see gs_fillpage is called from an instandard PS operator .fillpage, which is never defined. I see .fillpage is only called with false overprint from gs/Resource/Init, and likely we can restrict its semantics accordingly (not a backward compatible change). Also since erasepage isn't a part of PDF, IMO we can restrict it so that it must not be called inside a transparency grop, as well as .fillpage . Same constraint must apply to functions gs_fillpage, gs_erasepage in *all* languages. 3. While the new device method 'erasepage' woulkd be useful for (1) dalaying erasepage for gs_device_printer and (2) for pdfwrite, it needs to pass the device color at least (because 1 setgray can't impl;ement in a device method due to no high level color spaces in there). Then it's not 'erasepage' but 'fillpage'. But the last one is not useful for pdfwrite, because it needs same tricks as fill_rectangle. Thus the usefullness of this approach appears to be lesser valuable than we thought in the beginning, so need to discuss what to do with it. BTW our erasepage doesn't follow PLRM in the part of ignoring the clipping path. Loks like a separate bug. There are 2 wayus to work around the problem (1) in Comment #10 : 1. Instead dealying erasepage, perform the first fillpage and ignore consecutive ones that go with same color and same page size. Use save_dc to now whether subsequent calls go with same color. It's not a perfect optimization when the page size changes. Also it doesn't skip erasepage of the last showpage. 2. Use more knowledge about the document structure. With DSC ignore eraepage between %%BeginPageSetup and %%EndPageSetup and perform erasepage on %% EndPageSetup. With PDF ignore erasepage until page content stream is started, and execute one immediately before the stream is started. Postscript with no DSC is not optimized. Created attachment 4655 [details]
patch.txt
A trial patch for delaying fillpage in PDF interpreter. It needs improvements
due to regressions. Still thinking whether to complete it.
Created attachment 4661 [details] patch2.txt This is trial patch for delaying fillpage with replacing the virtual device method vector as suggested by Ray. It must not go to production due to regressions. It's implementation ndiscovered a serious problem. I'd like to hear suggestions how to resolve it. See log message below for details. [Log message beg] Fix (PDF interpreter) : Avoid extra erasepage, part 2. DETAILS : This is the second partial fix for the bug 690131 "Optimize consecutive setpagedevice". The patch introduces a new implementation for the virtual device method 'fillpage'. The idea of the new implementation is to delay the page filling until an untrivial painting happens to the page. It would allow to skip extra 'fillpage' when it go immediately after another fillpage. For doing it, we replace the device virtual methods vector with a vector of special functions, in which 'fillpage' just saves the color and returns, and other methods call the original 'fillpage' delayedly and trigger out the delay state with restoring the original method. The idea of replacing the virtual method vector belongs to Ray. The new field save_procs_while_delaying_fillpage works for saving the original vector. Due to some reasons explained below, we implement it for printer devices only. However this beautiful idea meets several obstacles, so that the working logic isn't so simple, as explained below. First, since the next painting after 'fillpage' may be a low level method, we meet a problem with passing a high level color to fillpage. Due to that high level devices cannot use the delayed logic, and they still need to skip extra fillpage by own means. Second, when working with clist, we can't simply replace the clist_fillpage method with a delayed implementation, because the clist writer combines printer methods with clist methods. Actually a clist writer inherits methods from 2 base classes. Nevertheless the 'fillpage' method must be inherited from both classes, and synthesized from them so as first the delayed logic works, and then the writing works. For combining the 2 methods we added a new field gx_device_printer::delayed_fillpage . Third, we can't simply replace entire virtual vector, because sometimes it is used to know what subclass is a device belongs to. Since we have no a reasonable constraint when it may happen, we don't replace the open_device method, which in most cases is used for such recognition. At last, we met one problems, which looks insoluble. We call it a "dynamic subclassing conflict". Sometimes the old code replaces virtual methods to a device, and such replacement really cause conflicts with the replacement by the delaying logic. Actually there are 3 variants of such conflict : (1) a real replacement, (2) copying a method to another variable and later calling it through that variable, and (3) comparing a virtual method with a specific one (for example, with the default implementation) to know about a specific feature of the device. Here is an example of a problem caused by (2). When applied the patch to revision 9294, set breakpoints as this : gxclrect.c ln 279 gdevddrw.c ln 958 gdevprn.c ln 1340 gdevfpdp.c ln 800 (a new module added) gdevpbm.c ln 235 When running this command line : gswin32.exe -IF:/AFPL/gs-hd/Resource/Init;f:\afpl\fonts -r300 -dNOPAUSE -dBATCH -sDEVICE=ppmraw -dLastPage=2 -sOutputFile=p%%d.ppm H:\AuxFiles\comparefiles\0.pdf 1. first it breaks when begin_typed_image is copied to save_begin_typed_image. 2. It happens 2 times and the last value is clist_begin_typed_image. 3. The next break happens in gdev_prn_fillpage, which replaces the virtual vector with the delayed one. 4. Then break in ppm_set_dev_procs shows how begin_typed_image is replaced with pnm_begin_typed_image, and the delayed vector is now wrong. If begin_typed_image is called after that, the delayed fillpage would be missed. But in practice we get a more dangerous thing. 5. The next break in gdev_prn_fillpage is not interesting, because it simply ignores the extra fillpage. 6. After that we get a break in end_delay (which is indirectly called from gs_eofill), which restores the original vector including begin_typed_image. Thus now the gx_device_printer::procs.begin_typed_image points to prn_begin_typed_image and gx_device_printer::save_begin_typed_image points to the delayed method. 7. The next break in clist_fillpage shows that the delayed logic works fine. 8. After that we get break in end_delay, which attempts to restore the device virtual method at second time. It is abnormal, but we can work around with resetting open_device in save_procs_while_delaying_fillpage. Note the abnormal call happens because prn_begin_typed_image triggers to call the original device method (which is expected to be a clist method) via save_begin_typed_image, but it's value points to the delayed method. Here we fall into an infinite recursion, and 9. all subsequent breaks happen at same point. 10. all subsequent breaks happen at same point. 11. all subsequent breaks happen at same point. and so on. It this point I can't see how to resolve the dynamic subclassing conflict staying within the idea of replacing virtual vector, because old code (including customer devices) use dynamic subclassing without restrictions. Rather a special workaround (or even several ones) can be suggested for begin* methods, I don't see how to resolve it in general, because generally any device method may be replaced or cloned to a private field. Thus I stalled at this point. Suggestions are welcome. EXPECTED DIFFERENCES : Regression with 0.pdf . [Log message end] Created attachment 4662 [details]
gdevfpdp.c
A new module missed in patch2.txt .
Created attachment 4663 [details]
gdevfpdp.h
Another module missed from patch2.txt .
Performance impact is reduced. Reduce priority and status to 'enhancement'. I implemented a subclass device (called 'epo' for erasepage optimization). This device gets installed only if the fillpage proc for the device is gx_default_fillpage. That means it won't do the optimization for clist devices, pdfwrite and other oddball cases. It also will bail out of the optimization if the color is not "pure". The test file Bug690131.ps will test the non-pure-color case if run on a halftone device such as pbmraw. The actual optimization defers all the erasepage calls (actually anything that goes via gs_fillpage, which seems to be just "erasepage" and ".fillpage") until a marking operation occurs. It saves the color of the latest fillpage (assuming it is a pure color). When a marking operation occurs we do a simple fill_rectangle using the saved color, and then uninstall the subclass device. The reason for restricting the application of the optimization is because a more generic approach requires saving the entire gstate and having that be preserved across save/restore, which turns out to be a big mess to do correctly. RESULTS: I found ~40% speedup on pi3 for this run: bin/gs -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/tests_private/comparefiles/Bug692364.ps Similar speedups for many pdf files, such as: bin/gs -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/tests_private/compare_files/ADOBE1-4.pdf 30% speedup in my linux VM for this run: bin/gs -Z: -dMaxBitmap=1000m -sDEVICE=bitcmyk -dGrayValues=256 -r600 -o /dev/null ~/work/samples/PLRM2.pdf (a ~770 page PDF doc, the Postscript reference manual) PDF docs seem to have 2 fillpages per page (plus a few extras at beginning and end of job), so this optimization basically eliminates half of them. |