Bug 687561

Summary: Smaller PDFs when using execform
Product: Ghostscript
Reporter: SaGS <sags5495>
Component: PDF Writer
Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED FIXED
Severity: enhancement
CC: alex, christinedelight.top85, shailesh.mistry
Priority: P3
Keywords: bountiable
Version: master
Hardware: PC
OS: All
Customer:
Word Size: ---
Attachments: Short test file.
Suggested patch.
Patch modified for OrderResources.

Description SaGS 2004-07-05 10:29:38 UTC
With the current Ghostscript version (8.30), there is no special 
processing for execform. For each call, the marking operations in 
PaintProc are converted to PDF operators and included directly in 
the page stream. If the same form is painted multiple times (as is 
the case for logos, watermarks, etc.), the equivalent PDF operators 
are included each time, resulting in a larger PDF than needed. Not 
even an external tool can successfully detect and "collapse" the 
multiple occurrences of the same form.

The proposed patch introduces a new implementation of .execform1 
specifically for PDF conversion. This new implementation is 
installed by .setpdfwrite, so normally won't affect output if 
using devices other than pdfwrite. If, however, a user executes 
.setpdfwrite with a different device (or changes the device 
afterwards), this implementation works as a "bare-bones" one (no 
caching) and the PostScript file still renders fine. The only loss 
may be some speed, because the implementation that does "caching 
using patterns" is uninstalled (is gs_fform.ps used?).

The new .execform1 automatically calls pdfmark to convert the form 
into a separate Form XObject. Starting with the 2nd rendering of 
the form, only an /SP pdfmark is called, which improves processing 
speed and reduces output file size.

Some forms and some usage patterns produce better results than 
others, as described in the following paragraphs.

(A) Forms with an XUID
For these forms a single Form XObject is created, no matter how 
many times the form is loaded into VM, and regardless of whether it 
is placed in local or global VM. .execform1 derives the XObject name 
passed to the /BP pdfmark from the form's XUID; then, if the form is 
loaded into VM multiple times, it relies on the proposed patch for 
bug 687560 "Invalid PDF if /BP pdfmarks with non-unique /_objdef" 
to collapse the XObjects that correspond to the same form (the 
"same /_objdef == same XObject" policy) into a single one.

(B) Forms without an XUID
For all forms without an XUID, the /_objdef name is derived from a 
global counter. As long as the form dictionary does not lose 
its /Implementation key, .execform1 won't generate an extra 
Form XObject. 

(B1) Forms loaded into global VM won't lose this key unless the 
form is unloaded and reloaded, so the output PDF will normally 
contain a single copy of it.

(B2) Forms in local VM may lose the /Implementation key after a 
restore, and a new XObject will be generated the next time the 
form is painted. Note: in such a case, the read-only attribute of 
the form dictionary is removed too, so it can be changed 
(/PaintProc replaced, etc). Any "caching" info has to be discarded, 
and a trick like the redefinition of restore in gs_fform.ps is 
not desirable and may produce incorrect output.
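
The naming and caching policy in (A), (B1) and (B2) can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not code from the patch: the class FormCache, the name formats "Form_XUID_..." and "Form_N", and the use of plain dicts for form dictionaries are all assumptions made for illustration.

```python
import itertools


class FormCache:
    """Toy model of the described .execform1 policy (hypothetical, not Ghostscript code)."""

    def __init__(self):
        self._counter = itertools.count(1)  # global counter for forms without an XUID
        self.xobjects = {}                  # /_objdef name -> captured PaintProc output
        self.pdfmarks = []                  # log of pdfmark calls, in order

    def objdef_name(self, form):
        # (A) With an XUID the name is derived from it, so every copy of the
        # form loaded into VM maps to the same name.
        if "XUID" in form:
            return "Form_XUID_" + "_".join(str(n) for n in form["XUID"])
        # (B) Without an XUID the name comes from the global counter.
        return "Form_%d" % next(self._counter)

    def execform(self, form):
        name = form.get("Implementation")
        if name is None:
            # First paint of this dictionary (or /Implementation was lost):
            # define the XObject with BP ... EP.
            name = self.objdef_name(form)
            form["Implementation"] = name
            self.pdfmarks += [("BP", name), ("EP", name)]
            # Models "same /_objdef == same XObject": a repeated name does
            # not create a second XObject.
            self.xobjects.setdefault(name, form["PaintProc"])
        # Every paint, including the first, places the XObject with SP.
        self.pdfmarks.append(("SP", name))
        return name
```

In this model, painting one form dictionary twice logs BP, EP, SP, SP. Reloading a form with the same XUID logs a second BP/EP pair but still leaves a single XObject, while removing "Implementation" from a form without an XUID (as a restore might) produces a new one.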
Comment 1 SaGS 2004-07-05 10:41:15 UTC
Created attachment 791
Short test file.

It's the one from bug 687560 "Invalid PDF if /BP pdfmarks with 
non-unique /_objdef", rewritten to use PostScript forms instead 
of /BP-/EP-/SP pdfmarks.
Comment 2 SaGS 2004-07-05 10:41:59 UTC
Created attachment 792
Suggested patch.

For this patch to work fully, the patch for the aforementioned 
bug must be applied too. I didn't include any changes to the docs.
Comment 3 SaGS 2004-08-29 14:20:01 UTC
Created attachment 869
Patch modified for OrderResources.

At present, OrderResources == true does not correctly handle 
named objects, and these are written at the very end of the output 
PDF. This makes Form XObjects unavailable when the page 
referencing them is printed. For this reason, I modified 
execform1_xobj to deactivate all its "caching using Form XObjects" 
when OrderResources is true.
Comment 4 Igor Melichev 2005-03-30 13:09:51 UTC
*** Bug 688001 has been marked as a duplicate of this bug. ***
Comment 5 Igor Melichev 2005-03-30 13:15:17 UTC
Here are some considerations about forms.

1. 'execform' must tell the device interface when a form starts and 
ends.

2. Since the PostScript form procedure may depend on context, each execution 
should be accumulated as a stream resource. Equal resources should then be 
merged. (This technique has been implemented for charproc variations.)

3. When doing (2), the CTM should be factored out. Consequently the CTM must 
be passed in (1).

Please don't expect a quick resolution. We're busy with higher priority 
projects.
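
Points 2 and 3 can be sketched in Python as follows. This is a hedged illustration only: the function name merge_equal_streams and the representation of an execution as a (ctm, stream) pair are assumptions, and the sketch presumes each stream was captured with the CTM already factored out.

```python
def merge_equal_streams(executions):
    """Merge identical form streams (hypothetical helper, not Ghostscript code).

    executions: list of (ctm, stream) pairs, one per form execution, where
    stream is the bytes captured with the CTM factored out.
    Returns (resources, placements): the unique streams in first-seen order,
    and one (resource_index, ctm) pair per execution.
    """
    resources = []
    index_of = {}      # stream bytes -> index into resources
    placements = []
    for ctm, stream in executions:
        idx = index_of.get(stream)
        if idx is None:
            # First time this exact stream is seen: keep it as a new resource.
            idx = len(resources)
            index_of[stream] = idx
            resources.append(stream)
        # The CTM stays with the placement, not with the shared resource.
        placements.append((idx, ctm))
    return resources, placements
```

Because the CTM is kept with each placement rather than baked into the stream, the same form painted at two positions compares equal and is stored once.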
Comment 6 leonardo 2007-08-29 21:32:25 UTC
It needs new special PS operators for .startform and .endform, then a signal to 
pdev->procs.pattern_manage, then accumulation of a substream in pdfwrite. The 
last is the most difficult part, so passing it to Ken, who handles pdfwrite.
Comment 7 Shailesh Mistry 2011-07-12 19:20:28 UTC
Enhancement still missing in Ghostscript 9.03
Comment 8 Ken Sharp 2013-09-26 09:00:09 UTC
I believe commit c561232cf26e060b89fc4f3bd4bf5c679731d4db will resolve this. Interestingly, Adobe Acrobat Distiller cannot process this test file very efficiently. Each invocation of the form resource ends up as a separate form XObject in the PDF file.

pdfwrite only emits one form XObject, so we are more efficient....

Note that any form which relies on the CTM having particular settings will fail with this commit, because we reset the CTM in order to capture the form stream without the CTM baked in.

Two Quality Logic files break this. One uses setgstate inside the form to set the gstate to a gstate saved before the form was executed (this is of course completely barking mad). The other uses setflat and bases the value on the CTM at the time the form is executed. Because we reset the CTM for the course of the form, the setflat value is only appropriate for low resolution.

For these cases a new switch, -dUNROLLFORMS, will prevent pdfwrite from attempting to preserve the form resource. The result will be correct, at the cost of the output being larger, of course.
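
As a minimal sketch of how the switch might be passed when driving Ghostscript from a script, the Python helper below builds the argument list. The helper name pdfwrite_command and the executable name "gs" are assumptions (on Windows the executable is typically named differently); only -dUNROLLFORMS comes from this comment, while -sDEVICE=pdfwrite and -o are standard Ghostscript options.

```python
def pdfwrite_command(input_ps, output_pdf, unroll_forms=False, gs="gs"):
    """Build a Ghostscript argument list for pdfwrite (hypothetical helper)."""
    # -o sets the output file and implies batch, no-pause operation.
    cmd = [gs, "-sDEVICE=pdfwrite", "-o", output_pdf]
    if unroll_forms:
        # Emit each form execution inline instead of preserving the form
        # as a reusable resource.
        cmd.append("-dUNROLLFORMS")
    cmd.append(input_ps)
    return cmd
```

The returned list can be handed to subprocess.run; unroll_forms=True would be reserved for files whose forms depend on the CTM, such as the two described above.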