If passing the same object name (the value corresponding to the /_objdef key) to 2 or more /BP pdfmarks, the PDF file produced is invalid. For each /BP call, a PDF object number is alocated, and (at /SP time) that copy is referenced from a /Resources dictionary. But, except the last copy, those objects are not written to the output file and the corresponding xref entries are invalid (neither marked as free, nor pointing to valid objects). Reusing object names passed to pdfmark seems to be neither explicitely allowed nor explicitely disallowed, but both Acrobat Distiller and Jaws PDf Creator accept this. The attached patch fixes this as follows: - pdfmark_BP() explicitely verifies if the object name is a duplicate; if not, it works as before (and there will be no change in the generated PDF files compared to existing version). - if a name is reused, reads the old objects's # (plus some other info) for reusing it with the object that will be created. - the old copy of the object is freed during the call to pdf_enter_substream() (more precisely when the new (name, object) pair is added to pdev->local_named_objects), so even if the object # is reused, there won't be 2 PDF objects with the same # written to the output PDF. There's no change here relative to existing code. - the new object is created/written normally, the only difference being the object # which is not newly allocated. ------- With the proposed patch, the "same name" is treated as a hint the Form XObjects are the same and only one copy is written to the output PDF. This is an enhancement, allowing to considerably reduce the PDF file size when using logos, watermarks, and other repeated elements. When the obj name is the same but the marking operations are different it does not produce the result someone may expect, but I think this case should be considered an abuse. Such an optimization cannot be done by other means. Currently, GS does not detect duplicate objects. Even if it tried to compare them, duplicates that are translated/ rotated/ etc would not be detected, because the coordinates written in the PDF file are in PostScript device (not user) space, making the content streams appear to be different. Here is how this enhancement can be put to work: Developers generating their own PostScript code ------- This is the most obvious use: enclose the repeated code between a /BP pdfmark call and /EP+/SP calls, using the same name for all copies of the same logo/ watermark/ etc (and different names for different images, of course). A single question remains: why on earth would anyone duplicate the PostScript code that actually paints marks on the page, when starting with the 2nd occurence a single /SP call, which has to be included anyway, will suffice? (and, wouldn't this also make the PostScript file unnecessarily large?) Yes, the PostScript file is larger, but the SAME file is suitable both for printing and for converting to PDF. When printing, an /SP pdfmark only (for all occurences except the 1st) is not sufficient because the logo/ etc would appear only once. When converting to PDF, the pdfmarks and their special processing introduced with this patch are necessary in order to get a (sometimes much) smaller PDF. No need for special-casing printing versus conversion to PDF; and no unpleasant surprises if the user reuses the PosctScript file for a different purpose (print/ convert) that it was intended for. EPS files ------- A simple change to an EPS file allows it to automatically convert to a Form XObject - only one copy of the XObject will be generated for a PDF file, even if the EPS is included multiple times. The change is as follows: - at the beginning of the EPS, before any existing PostScript code, include something similar to the following: userdict /EPSX::status /pdfmark where { pop [ /_objdef {THISEPSOBJNAME} /BBox [LLX LLY URX URY] /BP pdfmark gstate }{ null } ifelse put THISEPSOBJNAME is a name for the created Form XObject, unique for this EPS. I find that using the EPS file's name is a good idea. LLX LLY URX URY are the coordinates of the EPS's bounding box, taken directly form the %%BoundingBox line. - at the end of the file, after any existing PostScript code, add something similar to the following: userdict /EPSX::status get dup null eq { pop }{ [/EP pdfmark gsave setgstate [{THISEPSOBJNAME} /SP pdfmark grestore } ifelse THISEPSOBJNAME is the same name as used at beginning of the file. No special tools or tricks are needed to place such an EPS file on the page. EPS files revisited ------- I'm 99.99% sure that detection of embeded EPS files and insertion of pdfmarks as shown above can be done automatically, using the /ParseDSCComment user parameter. (The missing 0.01% come from not having done it yet...) PostScript Forms ------- At last, but not the least, this enhancement can be extended to PostScript forms. See 687561 "Smaller PDFs when using execform" (which is not a bug, only an enhancement).
Created attachment 788 [details] Short file to reproduce the problem.
Created attachment 789 [details] Suggested patch. Relative to GS8.30 sources. Includes fixes for the typos mentioned in bug 687430 "Wrong transformation matrix with Form XObjects", comment #3.
Created attachment 790 [details] An usefull abuse. Auto-numbering pages, with a twist.
I don't approve the patch due to the trick with pdf_set_xref. From a I brief review I conclude that it will create dangled (unreferred) objects in the output PDF. Also I don't like to change pdf_enter_substream, because there exist a related priority problem to provide a consecutive file writing in all circumstances, but currently OrderResources=true requires a positionability. Adobe pdfmark Reference Manual allows multiple appearances of the object name in objdef, but they to be concerned as a reference to the existing object. Since the object stream is allowed to appear after the object is defined, we must delay the object writing untill that event. So objdef must just create a cos dict without writing to PDF, as we do for ExtGState, and delay the writing untill either the source document end, or the stream appears. If another stream appears for the same object, report an error. Dear SaGS, I'd appreciate if you make a better patch. Unfortunately now we have no time for this problem. Sorry for that, and thank you.
downgrading priority
FIRST, A FEW CLARIFICATIONS ABOUT THE PREVIOUS PATCH > "the trick with pdf_set_xref" I think there's a misunderstanding here, mainly because of an unfortunate choice for the name and the comment. pdf_set_xref() does nothing more than pdf_ref_obj(), which is normally called. The only difference between the 2 is how they get the object number: pdf_ref_obj() calls pdf_next_id() to allocate a new object number, while pdf_set_xref() receives the number as a parameter. It does not write to the output PDF, but to a temporary file holding the (temporary) xref. > "it will create dangled (unreferred) objects in the > output PDF" Certainly not. Internaly, there are multiple object created with the same number. However, eact time the number is reused, the old object is freed. As I already mentioned, this happens when the new copy is added to pdev->local_named_objects and nothing changed here relative to the version before the patch. In fact, deletion of the previous copy is the cause of the bug in the first place, because it happened in a harmfull way. Unreferrenced objects could appear if the "old copy" were already written to the output PDF. However, named objects are written only by /NamespacePop and at end of job, and in both cases the written objects are removed from pdev->local_named_objects so pdfmark_BP() will never find them as "old copies". > "as we do for ExtGState" If I understood correctly, that method is as follows: - start by creating a "ghost" object without a number - fill-in the object - when about to end the creation of the object, decide if it is to be preserved or not; this can be done, for example, by comparing it to existing objects in order to collapse identical objects. - if it has to be preserved, assing it an ID (= a number); otherwise, simply discard the "ghost". This works perfectly with UNNAMED objects. However, it dooes not work so well with NAMED objects. First, it cannot handle forward references and the generic objects created when these are encountered, because this method always creates NEW objects, never REPLACES nor "CONTINUES" existing ones. Secondly, objects create by /OBJ and /BP may be referenced WHILE they are created; not having a number during this time, this is made impossible. The worst case is with /OBJ, because creating those objects ends only at end of job; this method makes ALL references to them, which have to appear DURING the job, impossible. > "If another stream appears for the same object, > report an error." Do you suggest NOT to accept creating 2 objects with the same distill-time name? (1) Other PDF converters, including Acrobat Distiller, do accept this case. (2) There's a much bigger fish to catch here, see the second part of comment #1 and the companion bug 687561 "Smaller PDFs when using execform". I do not really expect someone to write GS-specific code in order to get optimized PDFs by exploiting the "same name = same object" policy introduced by this patch (although, in a controlled environment, this is possible). But GS can use this internally, as mentioned in "EPS files revisited" above and in bug 687561. (3) A practical point of view: getting NO PDF instead of a DAMAGED PDF does not help get the work done. ----------------------------------------------------------------- NEXT, WHAT'S DIFFERENT WITH THE NEW PATCH - It implements 3 different ways to deal with name collisions, denoted in code by PDF_NAMECOLLISION_REBUILD, PDF_NAMECOLLISION_KEEPOLD ("as we do for ExtGState"-alike) and PDF_NAMECOLLISION_BACKUP. Details on these later. Not all of them are currently used, but these are provided for future developements. - There was a bug in the old patch in the way the "resource" structure was (mis-)handled. It was possible to get /Resource dictionaries with duplicate keys, like in /Resource << /XObject << /R7 7 0 R /R7 7 0 R % note: same key (and ref to same object) >> >> - /BP and /SP are able to handle the generic objects created when forward references were encountered. (The change to /SP is not strictly connected to this bug.) - A previous patch (the one that introduced start_XObject()) made this bug appear with /PS pdfmarks too. The fix is the same as for /BP. - That same patch has a small error: pdfmark_PS() may call start_XObject() with objname == NULL, in which case "objname" must not be dereferenced (as in "objname->data/size") and no attempt must be made to add the object to pdev->local_named_objects (because it's NOT NAMED). - It's against the current (at the time of this writing) HEAD. ----------------------------------------------------------------- SOME DETAILS ABOUT THE NEW PATCH Most of the changes are in start_XObject(). Its code is structured in 3 parts: (1) name-conflict detection and specification of applicable methods in "oper" - the 1st 'switch", (2) restrictions coming from options/etc and managing priorities, (3) implementation of the various methods - the 2nd, big "switch" If specific GS options are not compatible with one or the other of the methods, or for changing priorities when more that 1 method could be used, it's easy to chage part (2). There are 3 main methods to deal with "true" name collisions, denoted by PDF_NAMECOLLISION_REBUILD, PDF_NAMECOLLISION_KEEPOLD, and PDF_NAMECOLLISION_BACKUP. There's a special method to efficiently handle generic objects created by forward references, denoted by PDF_NAMECOLLISION_FWDOBJ; this is not a real name collision, but appears as such because generic named objects are created and inserted into pdev->local_named_objects. PDF_NAMECOLLISION_NONE and PDF_NAMECOLLISION_NONAME are for "no name collision" (the 2nd being handles unnamed objects). PDF_NAMECOLLISION_ERRCODE, PDF_NAMECOLLISION_ERRTYPE, and PDF_NAMECOLLISION_ERRRANGE are for returning errors. There is some similar code appearing in 3 of the cases; I prefered not to factor it out, because doing so would make it hard to read (too many "if"s). Details on the various methods follow: PDF_NAMECOLLISION_REBUILD ------- User's perspective: All references to a particular object name are resolved to the LAST among all objects defined with that name. Only this last copy is written to the output PDF, resulting in an output file as small as it can be. Compatibility: Jaws PDF Creator 2.11 Implementation: To avoid extra changes to pdf_enter_substream(), this case is reduced to PDF_NAMECOLLISION_FWDOBJ - a generic object is created as if encountering a forward reference. The "resource" and the object number are reused. The old copy is deleted during the call to pdf_create_named() (more precisely when the new copy is inserted into pdev->local_named_objects), so it is not written to the output PDF. Necessary changes (outside gdevpdfm.c::start_XObject): - pdf_obj_ref_reused(), which is similar to pdf_obj_ref() except it gets the object number as a parameter, instead of getting it from pdf_next_id(); - all changes for PDF_NAMECOLLISION_FWDOBJ. PDF_NAMECOLLISION_KEEPOLD ------- User's perspective: All references to a particular object name are resolved to the FIRST one among all objects defined with that name. Only this first copy is written to the output PDF, resulting in an output file as small as it can be. Compatibility: Acrobat Distiller 4 (and 5 too, I think) Implementation: Uses pdf_enter_substream() with reserve_object_id == false to create a "ghost" object without an ID. At /EP time, this object is discarded. Because of the object not having an ID, this method has a weakness. Consider the following code: [/_objdef {ANAME} .. /BP pdfmark ... building first copy ... [/EP pdfmark [/_objdef {ANAME} .. /BP pdfmark ... building second copy ... (second) show [{ANAME} ... /PUT pdfmark [/EP pdfmark Content added during "... building second copy ..." using PostScript marking operators (like "show" in this example) goes into the 2nd copy and ends up being discarded; content added using pdfmarks (like a /Group entry to control transparency) goes to the 1st copy and is written to the output PDF, which is inconsistent at least. Since the 2nd copy does not have a number, it cannot be referenced so it MUST NOT be added to pdev->local_named_objects, and, as a consequence, it cannot be found by /PUT. This weakness is intrinsic to pdf_enter_substream() with reserve_object_id == false, and is the reason I didn't favor this method. Necessary changes (outside gdevpdfm.c::start_XObject): - pdfmark_EP() has to discard the "ghost" object. PDF_NAMECOLLISION_BACKUP ------- User's perspective: All objects, even those having the same name, are preserved and written to the PDF file, as if they had unique names. Each reference is resolved to the most recent copy of the object; "most recent" refers to the execution order, not source code order. I didn't favor this method because it doesn't help reduce the ouput file's size. Compatibility: Jaws PDF Creator 3.3 Implementation: "Backup" previous copy by dinamically renaming it, then create a new object normally. Necessary changes (outside gdevpdfm.c::start_XObject): - pdf_unique_objname() to generate unique names; - cos_dict_rename() to "rename" pdev->local_name_objects entries. PDF_NAMECOLLISION_FWDOBJ ------- False name collision due to a forward reference. Implementation: pdf_alloc_aside() "continues" with the generic object from where the call to pdf_refer_named() from pdfmark_next_object() left off; this object is passed from start_XObject(). Necessary changes (outside gdevpdfm.c::start_XObject): - The pcofwd parameter to pdf_enter_substream(), pdf_open_aside() and pdf_alloc_aside(), which points to the generic object created when the forward reference was encountered. Only pdf_alloc_aside() processes it, the others only pass it along. Note that pdf_alloc_aside() deals with generic object WITH or WITHOUT an associated "resource" structure; those WITH a "resource" are created by pdfmark_SP() when it has to "render" a Form XObject that has not yet been built (a forward reference). PDF_NAMECOLLISION_NONE ------- No name collision. Works as it did before the patch. PDF_NAMECOLLISION_NONAME ------- Not a named object, so no collision possible. Same as PDF_NAMECOLLISION_NONE, except the object is not registered into pdev->local_named_objects. PDF_NAMECOLLISION_ERRCODE ------- PDF_NAMECOLLISION_ERRTYPE PDF_NAMECOLLISION_ERRRANGE Error cases. Return "code", gs_error_typecheck and gs_error_rangecheck respectively. ----------------------------------------------------------------- FINALLY, apologies for spelling errors/etc. My English is not really good...
Created attachment 868 [details] Suggested patch, revised and extended.
Created attachment 1160 [details] "Bug687560.diff" - for archiving Only for archiving, see next comment for a reduced patch. May return to it later, for example when discussing imposition. The perceived functionality (see comment #6 above, the "User's perspective" sections) is same as the previous one, but: - implementation details changed, the main purpose being to avoid modifying pointers set by pdf_enter_substream() (the "surgery"); - names for the PDF_NAMECOLLISION_* constants changed as follows, hope this time they are clear enough: PDF_NAMECOLLISION_NONAME -> PDF_OBJECT_CREATE_UNNAMED PDF_NAMECOLLISION_NONE -> PDF_OBJECT_CREATE_FIRST PDF_NAMECOLLISION_ERRCODE -> PDF_OBJECT_ERRCODE PDF_NAMECOLLISION_ERRTYPE -> PDF_OBJECT_ERRTYPE PDF_NAMECOLLISION_ERRRANGE -> PDF_OBJECT_ERRRANGE PDF_NAMECOLLISION_REBUILD -> PDF_OBJECT_ASSIGN_NEW PDF_NAMECOLLISION_KEEPOLD -> PDF_OBJECT_DISCARD_NEW PDF_NAMECOLLISION_BACKUP -> PDF_OBJECT_CREATE_NEXT PDF_NAMECOLLISION_FWDOBJ -> PDF_OBJECT_FROM_FWDREF - it solves the anomaly with PDF_OBJECT_DISCARD_NEW (ex- PDF_NAMECOLLISION_KEEPOLD) described above (search for "weakness" in comment #6); this is the use of cos_object_t.is_writable, I'll post details only if necessary. - some other minor changes, especially reorganizing some of the code for more clarity.
Created attachment 1161 [details] "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" Proposed patch, including only the parts that are necessary for 101% Distiller compatibility. The 1% above full compatibility comes from reproducing the following Distiller anomalies: - the "weakness" mentioned in comment #6; - /SP does not accept forward references (= object names that did not yet appeared in a /BP). User's perspective ------- See comment #6, "User's prespective" for PDF_NAMECOLLISION_KEEPOLD. Backward compatibility ------- There is absolutely no change for cases when the current GS version produces valid output (PDF_OBJECT_CREATE_UNNAMED and PDF_OBJECT_CREATE_FIRST cases correspond to the old code). When an {OBJNAME} appears for the 2nd/etc time in a /BP (branch PDF_OBJECT_DISCARD_NEW), or appears in /BP 1st time but after being used as a "forward reference" (PDF_OBJECT_FROM_FWDREF), existing GhostScript version outputs an invalid PDF, so no one successfully used files that are in these situations. This is true independent of how pdfmarks get called: from a single PostScript file, from 2 or more PS files concatenated or otherwise combined, or from PDF files (because PDF file = a PS file [the PDF interpreter which is written in PostScript] which uses an external data file [the PDF]). These considerations are also valid for the "archived" patch attached to comment #8. Implementation ------- Dealing with unnamed objects (PDF_OBJECT_CREATE_UNNAMED branch): no change relative to current GhostScript version, except fixing an error (for details search for "start_XObject() with objname == NULL" in comment #6). First use of an {OBJNAME} in /BP (PDF_OBJECT_CREATE_FIRST branch): works EXACTLY as the current GhostScript version does (and as all other versions of this patch did). Error cases (PDF_OBJECT_ERR*): nothing to comment. PDF_OBJECT_ASSIGN_NEW (ex- PDF_NAMECOLLISION_REBUILD) and PDF_OBJECT_CREATE_NEXT (ex- PDF_NAMECOLLISION_BACKUP): these are alternatives to PDF_OBJECT_DISCARD_NEW, with different perceived behaviors, and are removed them from this reduced patch. Duplicate use of an {OBJNAME} in /BP (PDF_OBJECT_DISCARD_NEW): - for the perceived behavior, see "User's perspective" in the PDF_NAMECOLLISION_KEEPOLD section of comment #6; this is the way Distiller deals with duplicates object names in /BP; - the "weakness" mentioned there is present in Distiller and this patch reproduces it; - see comments in code for implementation details; - one note though: pres->object->id for the temporary stream is used as temporary storage to allow pdfmark_EP() to find the "main" copy of the object. This field is otherwise not used between /BP and /EP (the stream is not accessible except for writing PDF code generated by PostScript marking operators, in particular no references can be created to it); First /BP, but for an {OBJNAME} that previously appeared as a forward reference (PDF_OBJECT_FROM_FWDREF branch): implementation is similar to PDF_OBJECT_DISCARD_NEW. The difference is that pdfmark_EP() does not discard the temporary stream, but merges it's contents into the main object. pdfmark_EP() knows which of the 2 actions ("discard" or "merge") is to be performed based on the sign of the (temporary) value stored into pres->object->id.
Created attachment 1162 [details] "Bug687560 - Distiller 4,05.pdf" PDF produced by Acrobat Distiller 4,05c from the sample file attached to this bug report, comment #1. Note it contains a single Form XObject, the 1st one generated (one of the brown ones, not blue). Note: The file attached to comment #3 "An usefull abuse" would have shown better how various distillers deal with duplicate {OBJNAMES}, but some of these have problem with pdfmarks called from /EndPage...
Created attachment 1163 [details] "Bug687560 - PDF_OBJECT_DISCARD_NEW.pdf" PDF produced by patched GhostScript from the same file. Note: A bit larger than the one produced by Distiller because of an unrelated problem (Distiller recognizes a stroked charpath and uses PDF text operators to render it; GhostScript does not, and generates many PDF movetos/ linetos/ curvetos instead).
Thank you for submitting the improved patch. It looks much better, but still needs improvements. 1. Negative numbers were not used for object ids except -1. Since you never search by a negative id, please assign -1 with a comment. 2. PDF_OBJECT_ASSIGN_NEW appears in a comment without any definition or explanation. 3. The semantics of PDF_OBJECT_DISCARD_NEW is not natural. In most cases an appearence of a new object with same name to be interpreted as changing the referent, i.e. similarly to an assignment operator. I believe that we discussed it many times and agrred. I'm not clear why you miss this feature now. Either restore it or issue an error. 4. In the Comment #9 you refer to ""User's perspective" in the PDF_NAMECOLLISION_KEEPOLD section of comment #6;". I'm not sure that we understand this subject in same way. IMO merging _equal_ objects is a kind of optimization, which might be either delayed or implemented with stream comparizon after a stream is written, as we do for images. Lets delay it for now to simplify the patch. I mean please don't include any effort for it now. If it is important for you, we'll open another bug and work on it separately later.
> 1. Negative numbers were not used for object ids except -1. Since you > never search by a negative id, please assign -1 with a comment. It DOES search by the absolute value of cos_object_t.id, see the "cos_stream_t *pcsmain = ..." line in pdf_exit_XObject() ("LABS(..)"). When cos_object_t.id < 0, the temporary stream is discarded. However, it is still necessary to locate the "main" cos_object_t to reset its ".is_open" flag to "false". So, cannot use -1L. As to commenting this, I find there are already enough comments. The 2 comments following the "case PDF_OBJECT_FROM_FWDREF:" and "case PDF_OBJECT_DISCARD_NEW:" lines in start_XObject() explain the usage of this field. Also, the comment starting with "Invariant for all non-error exits from start_XObject()" at the end of this function explains (again) what's the use of the sign and of the absolute value. ------- > 2. PDF_OBJECT_ASSIGN_NEW appears in a comment without any definition > or explanation. You are right, the line "* - continue as for PDF_OBJECT_ASSIGN_NEW" in start_XObject() is a leftover from the "complete" patch (comment #8) and has to be removed. ------- > 3. The semantics of PDF_OBJECT_DISCARD_NEW is not natural. In most > cases an appearence of a new object with same name to be interpreted > as changing the referent, i.e. similarly to an assignment operator. > I believe that we discussed it many times and agrred. I'm not clear > why you miss this feature now. Either restore it or issue an error. With the risk of repeating myself, I'll state again: PDF_OBJECT_ASSIGN_NEW, PDF_OBJECT_DISCARD_NEW and PDF_OBJECT_CREATE_NEXT are 3 DIFFERENT WAYS of solving THE SAME problem. Currently, there's no absolute need to keep all of them, only one is necessary. Which one? Comment #6 points out with which PS->PDF converter implements each of them. More than once, you made contradictory requests concerning this point. On one hand, you want complete Acrobat Distiller compatibility; see: "1. Please prove that this behavior is same as Adobe Distiller." in http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004706.html and "Also the patch must be tested for full compatibility with Adobe." in http://www.ghostscript.com/pipermail/gs-code-review/2004-December/004720.html. On the other hand, you want a variable-like behaviour, as you describe at point 3 in http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004701.html. These two demands are not compatible. With the sample file attached to comment #1, Distiller produces a PDF with brown drawings on both pages, and the variable-like method one with blue drawings on both pages. The differences are even better seen in the 3 PDF attached to http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004705.html The source PS file is the one attached to comment #3. PDF_OBJECT_ASSIGN_NEW implements the variable-like behaviour, and dissapeared in favor of PDF_OBJECT_DISCARD_NEW because you insisted on complete Distiller compatibility. ** PLEASE make up your mind: Distiller-compat OR variable-like. ** I can "cut" from the "complete" patch that part that implements PDF_OBJECT_ASSIGN_NEW and remove PDF_OBJECT_DISCARD_NEW and PDF_OBJECT_CREATE_NEXT. I personally prefer PDF_OBJECT_ASSIGN_NEW, for many reasons. Yes, I agree it's much more intuitive from a programmer's point of view. And no, I don't think Adobe thought a lot on this problem before choosing the implementation in Distiller (this is only a personal opinion, of course). ------- > 4. In the Comment #9 you refer to ""User's perspective" in the > PDF_NAMECOLLISION_KEEPOLD section of comment #6;". I'm not sure that > we understand this subject in same way. IMO merging _equal_ objects > is a kind of optimization, which might be either delayed or > implemented with stream comparizon after a stream is written, as we > do for images. Lets delay it for now to simplify the patch. I mean > please don't include any effort for it now. > If it is important for you, we'll open another bug and work on it > separately later. There doesn't exist a strict subset of this patch that deals with the name collision in a Distiller-compatible way and does not provide the "optimization", so there's nothing to remove. Reason: at any point in time, we have A SINGLE Form XObject (so nothing to compare/ collapse/ etc). The optimization is merely a (desirable) side-affect of the way the code works. This is also true for PDF_OBJECT_ASSIGN_NEW. Maybe you are misleaded by the fact that sometimes pdf_exit_XObject() is supposed to "merge" something. That "merge" does not refer to collapsing 2 identical Form XObjects. When entering pdf_exit_XObject() with a negative value in pdev->accumulating_substream_resource->pres->object->id, a single Form XObject exists, but its component pieces are physically spread in 2 cos_object_t structures. The "merge" refers to gathering all these pieces into a single cos_object_t. These 2 cos_object_ts are: (1) the "main" one, linked from pdev->local_named_object, containing: - the correct object number; - the pdf_recource_t structure (including the "Rn" resource name); - eventual (key, value) pairs added with dict-/PUT pdfmarks issued since the /BP that started this XObject. (2) a temporary one, created by pdf_alloc_aside() called during pdf_enter_substream() and accesible only via pdev->accumulating_substream_resource, containing: - the stream-proper part (PDF ops generated from PS marking ops); - (key, value) pairs that deal with the stream part (filters...); - (key, value) pairs added by pdfmark_BP() itself (/BBox etc). ------- About the "as we do for images" part: - If that "collapsing" is done for UNNAMED images, then it has nothing to do with Form XObjects (which are NAMED); - If it involves named images too, then that's a bug. The reasons for both these statements are explaind here http://www.ghostscript.com/pipermail/gs-code-review/2004-December/004719.html ( It's that "lengthy article" you didn't have time for; start reading at "Comparing by contents in order to remove duplicates". Also consider this example: (image data) ... image % be it named or not showpage % end of page 1 [/_objdef {ImgNAMED} /NI pdfmark (SAME image data) ... image showpage % end of page 2 % later, add an alternate image for {ImgNAMED} [{ImgNAMED} << /Alternates ... >> /PUT pdfmark If you merge the 2 Image XObjects at the end of the 2nd invocation of the "image" operator, the alternate image will incorrectly affect the image on page 1 too. )
Well, we must be compatible with Distiller. Please cut out other branches. Discarding an object please generate a warning.
Created attachment 1185 [details] Modified "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" Here's the patch modified as requested. Changes: - Now printing a message when discarding an XObject definition. It's printed to stdout, not stderr, because it does not necessarily denote an error condition. Especially after the patch for Bug 687561 "Smaller PDFs when using execform", discarding a 2nd copy may actually be a good thing. - The request "Please cut out other branches" is a mystery for me. - PDF_OBJECT_CREATE_FIRST is the branch that creates the 1st (and the only, in most cases) XObject with a given name. Cannot remove. - PDF_OBJECT_CREATE_UNNAMED is the same as PDF_OBJECT_CREATE_FIRST except it deals with an unnamed object (start_XObject() is called from pdfmark_PS() too). It does not add the object to pdev->local_named_objects and does not dereference a NULL pointer as the current GS version does. Cannot remove it. - PDF_OBJECT_FROM_FWDREF handles the case when we already have a cos_object_t of type cos_type_generic created by a previous forward reference. Cannot remove either; with current GS version, this case produces an invalid PDF the same way as a duplicate objname does. However, I removed the error "case:"es and spread the "return"s around. I find having cases for error reporting clearer, but it seems a few more lines in the diff file are fatal. - Found and fixed a problem with the interaction between pdfmark_PS() ("P" the "S", not "S" then "P") and start_XObject(). The former assumed the cos_stream_t returned by the latter will be always around. But, that may be a temporary one, acting as a "waste basket", and is to be used only for writing the contents into. Now pdf_exit_XObject() returns a pointer to the correct cos_stream_t to be used for creating references to.
The paches attached to this bug report are affected by the changes for ps2wite made since these patches were posted (/.Global and pdev->accumulating_a_global_object). I did update my working copy; if interested in updated patches, place a comment here or contact me.
Yes, we're interested in an updated patch. > The request "Please cut out other branches" is a mystery for me. It means that branches, which are disabled with #if, should be removed.
> > The request "Please cut out other branches" is a mystery for me. > It means that branches, which are disabled with #if, should be removed. If you really mean "#if" (= conditional compilation): these NEVER EXISTED in this patch. If you mean "dead code"/ "branches not taken"/ "unused 'case:'es": at the time that request was made, these were ALREADY REMOVED (except a "default: /* should never happen... */ return_error(gs_error_unknownerror);", because there's a good habbit to have a "defalut:" in every "switch", plus some compilers complain when there isn't one). Before, such branches existed because there was no user-accesible method to choose between the 3 implemented alternatives. If a particular piece of the code seems to you to be unused, please point it out and I'll verify.
Created attachment 1672 [details] "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" updated for ps2write Patch from comment #15, modified to handle "/.Global" and "pdev->accumulating_substream-resources" that were recently added. Also fixed a comment that was still refering to an old identifier ("PDF_NAMECOLLISION_*" became "PDF_OBJECT_*"). Note: The sample file in comment #1 will display 2 empty pages with ps2write. See bug 688293 "opdfread: Wrong graphics state when painting a Form XObject" for details and a patch.
Patch to HEAD : http://ghostscript.com/pipermail/gs-cvs/2005-September/005712.html
I'm reopening this bug for the reasons explained, one binary month ago, in http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html .
One more patch to HEAD : http://ghostscript.com/pipermail/gs-cvs/2005-October/005769.html
Dear SaGS, You wrote : > I was explicitly requested to supply a patch that implements > a Distiller-compatible behavior. Only supported Artifex customers are eligible to request something from Artifex. In such neccessity they should contact Artifex Support in the appropriate form. Free users are allowed to open bugs, which Artifex considers as an additional information for improving the product quality. A decision about performing corresponding improvements may be taken if the improvement is important for Artifex customers and if Artifex has enough resourced for doing it. In this bug those constraints are not satisfied.
To clarify things: > ... I WAS explicitly requested ... (please note the passive case) That is the reviewer of the patch requested that for me to comply, it's not me who requested this to the reviewer or Artifex. I only consider fair that since the reviewer requested this, any patch accepted for this bug fulfills that request. Additional note: from your commit log, I suspect we don't talk about the same problem...
Dear SaGS, You wrote : > To clarify things: > > > ... I WAS explicitly requested ... > (please note the passive case) Oops, I was not attentive reading it. I'm sorry. Please ignore last comment.
Hi SaGS, I'm reviewing some of the bugs in the tracker, and am trying to figure out what to do with this one. There's a lot of complex discussion, and it's not immediately clear to me what the benefit is in continuing work on this. I believe the patch in #22 should fix bug (C) in your #21 link. Do you agree? What I'd like to see is a clear, succinct description of the remaining problem, and why it's important (is it something we're likely to see only from your application, or more generally?). Given that, I'm amenable to accepting a patch, as long as it isn't too long or scary, that you feel resolves the bug. Thanks --Raph
Dear Raph, > I believe the patch in #22 should fix bug (C) in your #21 link. > Do you agree? I agree the 1st hunk (affecting pdfmark_PS()) solves "(G) 2 inherited bugs". The 2nd hunk (affecting pdfmark_EP()) looks pretty useless to me (because of gdevpdfr.c::pdf_objname_is_valid()). > What I'd like to see is a clear, succinct description of the > remaining problem, and why it's important (is it something we're > likely to see only from your application, or more generally?). What remains (from an exclusively technical point of view): Mainly, the bug itself. The commited patch does not solve all of it; only a part, and even that for the wrong reasons. I would like to point out the bug is about outputting invalid PDFs (missing objects and xref entries that point in the middle of nowhere), not about file size reduction. The latter comes as a free bonus with both the "Distiller-compatible" and "variable-like" implementations, but is blocked by the commited patch for which immediately writting each and every xobject is essential. Who benefits from my patch'es side-effect: - Anyone "printing" repetitive elements, either by including EPS files multiple times or by using PostScript forms (watermarks, logos, variable data printing, graphical bullets, ...). - Anyone redistilling PDFs that already contain Form XObjects (GS currently flattens them, bloating the output file). Example: real-world file shrinking 20-40x. Consider http://bugs.ghostscript.com/show_bug.cgi?id=688001#c1 (note: I have absolutely no connection with Objectif Lune Inc., or with the user that posted the file on comp.lang.postscript). It contains 4 EPSes, which it converts to PostScript forms and then paints them over and over, one of these with 2 different CTMs. - As-is, current HEAD with pdfwrite converts it to approx 16.5M PDF; - After (CAREFULLY) bracketing each EPS'es contents with BP-EP pdfmarks: - With the commited patch: size is 18.7M (xobjects add some overhead, duplicates not detected). With -dForOPDFRead I got a gigantic 109M PDF! (I think this comes from -dForOPDFRead including raster images into content streams, combined with not detecting duplicate xobjects.) - With my patch: size goes down to 454k, or 4.5M if -dForOPDFRead; the latter is caused by repeatedly including the same Type3 font, there are still only 4 Form XObjects. - The same result can be obtained without touching the file, but providing a suitable implementation of GS's execform (bug 687561 "Smaller PDFs when using execform", although I changed that code since then). This benefits everyone, automagically. Note: The "3. Doing (3) [(2)?], the CTM to be factored out." in http://bugs.ghostscript.com/show_bug.cgi?id=687561#c5 is particularily infeasible. > Given that, I'm amenable to accepting a patch, as long as it > isn't too long or scary, that you feel resolves the bug. Please see below for an over-simplified patch. Hope it's not scary. What I wanted was to fix pdfmarks that relate to Form XObjects myself, since I find these are way underused and it seems nobody else has the time or desire to do this. Sincerily yours, sags
Created attachment 1725 [details] Over-simplified patch. The simplification comes from some reduced functionality. However, the bug is fixed completely, and result is compatible with Distiller. The patch affects only gdevpdfm.c, and the diff is against -r1.48, that is without the 2 commited patches attached to this bug report. Compared to the patch in comment #19: - removed support for forward references, raising a rangecheck instead; - accepting the inconsistency described at (F) in the message linked from comment #21 above, since it's not so important and I got a considerably reduction of the code size; - removed the rather extensive comments. How it works: - When starting an XObject, a check is made if the object name is new; - If new, XObject creation is equivalent to the version before the patch; same if no name (possible for PostScript XObjects); - If the name is a duplicate, then the stream is discarded. Think of using pdf_substitute_resource(), except - the decision on whether to keep the object or not is done by start_XObject() based on object name (and remembered as pres->object->id == -1), not by pdf_find_same_resource() searching pdev->resources[] (this bug is about same NAME, not same CONTENTS). - pdfmark_PS() locates the matching object again by name, does not rely on the assumed pdf_substitute_resource() to return it (note: pdfmark_EP() does not need to find this object).
Here is a quote from "pdfmark Reference Manual November 30,2004" page 14 : [beg quote] That is,the {objname }can be in the argument list of a pdfmark construct before it is defined. [end quote] The suggested patch 1725 doesn't comply.
Due to the quote from Comment #28 "removed support for forward references, raising a rangecheck instead;" and Adobe quote from comment #29, I conclude that the patch 1725 causes a regression and must not be committed. Since this bug requires too many efforts form Artifex engineering resources, and since it has no demand from Artifex customers, I suggest to pay half bonus for investigations done and close the bug with 'WONTFIX".
Dear Leonardo (== Igor?), (1) I am perfectly aware that forward references are allowed and always said that, for more than a year already. This bug has a long history, much of it on gs-code-review. (2) The removal of support for forward references is NOT FROM Ghostscript, but FROM THE PATCH. (3) The patch in comment #19 "'Bug687560 - PDF_OBJECT_DISCARD_NEW.diff' updated for ps2write", which I consider the one to be really Distiller-compatible, DOES support forward references. If you want forward refs, apply that patch. (4) Previously, Ghostscript produced invalid PDFs if forward refs to XObjects were present, so strictly speaking I don't see where the regression is. This bug REMAINS after the patch you commited. (5) More than once, Igor advocated AGAINST accepting forward refs, and asked to reject them (which I didn't really agree with...) (6) Looking at your patch, I could only conclude that you don't consider this functionality important at all, at least not so important as a smaller diff. So, I modified my patch by removing the code for it. You only have to make up your mind about forward refs.
Hi SaGS, Thanks for your patient hard work trying to get this problem resolved. I think there have been some miscommunications, which hopefully we can resolve and then get this closed. 1. The patch in #22 is not the only one that's been committed in this area. Are you sure that HEAD is still generating incorrect PDF? Does the file in comment #1 trigger this behavior? If not, can you post a new test case that does? 2. I'm still confused by the forward reference issue. I think we may have mistakenly believed it was OK to reject forward references in the past, but now I think everyone is in agreement that we must handle them. From your point (2) in comment #31, I _think_ you're saying that HEAD + your patch in #28 will still process inputs containing forward references as well as HEAD without the patch. Is this what you are asserting? Can you write a short test file that exercises this behavior? Obviously, I would be unhappy if such a file were to run correctly through HEAD but cause a rangecheck after applying #28. Thanks again.
Created attachment 1731 [details] Sample for other 2 incarnations of the bug. Dear Raph, > ... > 1. The patch in #22 is not the only one that's been committed in > this area. Are you sure that HEAD is still generating incorrect > PDF? Does the file in comment #1 trigger this behavior? If not, > can you post a new test case that does? The sample in comment #1 does not contain forward references, so it does not trigger this incarnation of the bug. I attach a ZIP file containing a simple PostScript file and the PDF produced from it by current HEAD. It illustrates the 2 incarnations of the bug I mentioned at point (A) in my recent gs-devel post (http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html) which remain present in current HEAD, including the one concerning forward references. In the output PDF, take a look at the 2 custom keys added to the {Catalog}. The objects they point to don't exist in the PDF (obj #4 for /PointerToLostFwdReferencedXObj and #9 for /PointerToLostArray). The xref entries for them are not marked as deleted and point somewhere at random (the one for #4 points, by hazard, to object #6 and the one for #9 points somewhere inside #6). > 2. ... > From your point (2) in comment #31, I _think_ you're saying that > HEAD + your patch in #28 will still process inputs containing > forward references as well as HEAD without the patch. Yes (but, to be exact, I'm refering to HEAD without commited patch in comment #20 and comment #22, but with my patch in comment #28, not TODAY's [2005/10/26] HEAD + comment #28). > ... > Can you write a short test file that exercises this behavior? For testing, please use the *.ps file in the archive, even better cut in 2 for the 2 parts. With my patch, part 1 (forward reference to xobj) terminates with a rangecheck. Part 2 (redefining a preexistent object that is not an xobject) terminates with a typecheck, because pdfmark refuses to change the PDF type of an object (was a PDF array, trying to redefine as a PDF stream).
The supplied PS file 1731 is incorrect. An Adobe log follows. Acrobat Distiller 6.0 Started: 2 íîÿáðÿ 2005 ã. at 20:42 Adobe PostScript software version: 3015.102 Start Time: 2 íîÿáðÿ 2005 ã. at 20:42 Source: Bug687560-#a.ps Destination: F:\AFPL\MLC2\PROB574-687560\Bug687560-#a.pdf Adobe PDF Settings: G:\Documents and Settings\All Users\Documents\Adobe PDF 6.0 \Settings\Standard.joboptions %%[ Error: typecheck; OffendingCommand: pdfmark; ErrorInfo: SP LostArray ]%% Stack: /EP -dict- %%[ Flushing: rest of job (to end-of-file) will be ignored ]%% %%[ Warning: PostScript error. No PDF file produced. ] %% Distill Time: 0 seconds (00:00:00) **** End of Job ****
(1) So, from Bug687560-#a.ps one gets: - An invalid PDF, with no indication that something went wrong, from current HEAD; - A "typecheck" from Distiller 6; - The same "typecheck" from GS with either my patch from comment #28 "Over-simplified patch." (version without fwd refs support) or the one from comment #19 ""Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" updated for ps2write" (version with fwd refs). For the former, one obviously needs to remove part 1 from the file, because it is about fwd refs that the patch does not support (would raise a "rangecheck"). I would conclude the test is meaningfull, not faulty. Also please note that the part where Distiller complains has nothing to do with fwd refs. Both my patches do some "sanity checks" (inspired from verifications done by pdfmark_SP(), plus some common sense), and it's such a check that leads to the "typecheck", not something related to fwd refs. (2) As is, the file responds very well (with a "yes") to the question whether parts of the bug remained in current HEAD or not, including the one about forward references. (3) To further examine each of the 2 incarnations of the bug that remained in HEAD, separate the 2 parts of the file as I already mentioned in the last paragraph of comment #33. The 2 parts are independent, but I put them in a single file because I hate having too many files laying around.
Please provide a correct Postscript. We do not care about PDFs generated from incorrect Postscript.
Created attachment 1746 [details] Bug687560-#a-part1.ps Part of the sample that concerns forward references to Form XObjects. Note: The other part does not contain anything contrary to PLRM or the pdfmark Reference Manual, it's just Distiller messing things, but that's not so important now.
Because forward references came again into discussion: it seems after the 3rd rewrite of the patch I didn't test these enough. The patch in comment #19 "'Bug687560 - PDF_OBJECT_DISCARD_NEW.diff' updated for ps2write" does not set correctly some flags when fwd refs are found, the most nasty consequence being a double-free during pdf_close(). I appologize for this mistake, now it's fixed (3 lines of code) and I tested the whole thing again. If you arrive at looking into this, I'll post the changes.
Dear SaGS, You may submit the improved patch for archiving purpose. In same time it would be better if you concentrate on creating correct Postscript samples, which demonstrate problems you mention. Thank you.
Created attachment 1761 [details] Test pack. Here is a pack of PS files for testing, also for proving the remarks I made in http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html. I also included logfiles and PDFs generated by: - Acrobat Distiller 5.0/ Windows (file names include "Distiller 5,00"), - Jaws PDF Creator 2.11 ("JawsPDF 2,11") - just to see "another opinion" on the subject, - Ghostscript recent HEAD ("Ghostscript HEAD"), and - Ghostscript with a corrected version of my patch from comment #19 ("PDF_OBJECT_DISCARD_NEW") instead of the patches in comment #20 and comment #22.; I'll attach this version in a following comment. I consider the Distiller version to take as a reference is 5, since Ghostscript presents itself as compatible with this one; pdfwrite produces PDFs version <= 1.4, also see "currentdistillerparams /CoreDistVersion get ==" (with pdfwrite as current device, of course). I would take into consideration the behaviour of other Distiller versions, as long as it represents a progression, not a regression. A few words about each test follow. First the ones for (A)..(F) in my gs_devel post. Then a file for testing all combinations of forward ref + definition + redefiniton + "normal use" for both Form XObjects and PS XObjects, to be sure the patch fixes what it's expected to fix and does not break anything. The problem about some flags mentioned in comment #38 is simply a coding error in my patch, now fixed, so there's no test for it. Bug687560-#a-part1.ps --- - for "(A) Output PDF still invalid" from my gs-devel post, firt dash (behaviour regarding a forward reference); - Distiller, Jaws PDF Creator and my patch: OK; - GS HEAD: invalid PDF: object #4 referenced but missing from the PDF, its xref entry points at random (here, to obj #6). Bug687560-#a-part2.ps --- - same "(A) Output PDF still invalid", 2nd dash (behaviour in the presence of a pre-existing object w/ same name); - Distiller and Jaws PDF Creator: OK; - GS HEAD: invalid PDF, same object #4 referenced but missing, its xref entry happens to point to obj #6. - My patch: I considered changing the PDF type of an object to be an abuse; some "sanity checks", based on tests currently done by pdfmark_SP(), reject this operation with a typecheck. - I don't have a strong preference about what should happen in this case. Bug687560-#b.ps --- - for "(B) Incompatible with Distiller"; creates 2 Form XObjects with same name and creates references to them, to check to which instance each reference is resolved to. - Distiller and my patch: all refs are resolved to the 1st copy, the PDF displays 2 green lines. - Jaws PDF Creator: different behaviour, refs are resolved to the copy created last - 2 red lines; "variable-like" behaviour that corresponds to PDF_OBJECT_ASSIGN_NEW from comment #8. - GS HEAD: different than Distiller, refs resolved to the most recently created copy; corresponds to PDF_OBJECT_CREATE_NEXT from comment #8. Bug687560-#c-part1.ps & Bug687560-#c-part2.ps --- - for "(C) Introduces a new bug: can never /PUT into Form XObjects"; - "part 1" tries to /PUT into the XObject during its creation (between /BP and /SP), "part 2" does this after /EP; - Distiller and my patch: both succeed. - Jaws PDF Creator: limited functionality, accepts "part 1" but not "part 2". - GS HEAD: both fail, but in diffret ways. "Part 1" gives an undefined, while "part 2" apparently succeeds but the added key is nowhere to be found in the PDF. (There's also a bad xref in "part 1".) Bug687560-#d.ps --- - for "(D) ... No, it never detects equal [Form X]objects". - the test creates 2 identical Form XObject, with identical CTMs. - GS HEAD: creates 2 identical objects (#7 and #8 in "Bug687560-#d - Ghostscript HEAD.pdf"), contrary to the commit message claiming in this case duplicates are detected and "collapsed". About "(E) pdf_find_same_resource(): not for named objects" --- - no sample file here, just look at gdevpdfj.c -r1.49, lin 492..493, function pdf_end_write_image(), which avoids passing a named object to pdf_substitute_resource() for a good reason. The comment inthere is true, but a bit misleading since the object is not written there anyway (bool write == false). - Note: I qualified this as a POTENTIAL bug, not one with immediate adverse consequences. I'm 99% sure where you want to arrive with this patch; while I consider it a valuable addition for another bug currently in the tracker, - it has nothing to do with the present bug, it does some harm without truly solving the current bug; - for the other bug, the patch is far from being complete; - if used for the other bug, this "not for named resources" problem, unless fixed, WILL have adverse effects. Bug687560-#f.ps --- - for "(F) Inconsistency if self-referencing Form XObjects" - I consider this detail absolutely minor, and some people might even consider the HEAD behaviour as to be expected. - No Distiller-produced file here, because it crashes with an access violation (DrWatson shows an indirect call through a null pointer), so a test file triggers a bug in Distiller. - GS HEAD exhibits different behaviours for the 2 self-references. - Jaws PDF and my patch: same result for both self-refs. Bug687560-tests.ps --- - More complete test file, exercices all combinations of forward ref and/or "normal use" and/or redefinition; the redefiniton can appear before and/or after the XObject being used. - Details provided in the comments. If more needed, ask. - Both Form and PS XObjects are tested, also a test for unnamed /PS pdfmark (since all of these share code that is modified by the patch). - For Distiller, the named PostScript pdfmark tests are skipped; Distiller has trouble with some of these, and I have neither the means, nor the time or desire to debug Distiller. - While I'm not interested in /PS pdfmarks, I included tests for them to be sure nothing gets broken [more than it already was]. - Distiller (/PS tests ommited), Jaws PDF and my patch: OK. - GS HEAD: invalid PDF, there are about 20 missing objects; these objects' xref entries point to, or inside, other objects. There are 3 different reasons for this, all having to do with the commited patch; 2 of them are inherited from the GS version before the patch, 1 is newly introduced (I haven't mentioned this one until now).
Created attachment 1762 [details] Replacement "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff". The one in comment #19 had problems with handling forward refs, as mentioned in comment #38. Also tweaked the condition for setting /.Global for unnamed PS XObjects: these don't need to be /.Global, unless the current substream is already /.Global. Relative to current HEAD minus patches in comment #20 and it's fix from comment #22; a diff relative to today's HEAD would be hard to read. The first hunk for gdevpdfm.c is not part of the current patch, but it appears because other patches affected this file since -r1.48, and my working copy takes these into account. Not "for archiving".
Patch to HEAD : http://ghostscript.com/pipermail/gs-cvs/2005-November/005798.html This patch fixes important problems : generating an incorrect PDF. Some problems about different behavior from Adobe Distiller left unfixed due to old basic constraints in Ghostscript. See the log message of the patch for more details. We believe that those differences are not important for our customers, and smart users can work around them in their applications. Closing the bug now.