Bug 687560 - Invalid PDF if /BP pdfmarks with non-unique /_objdef
Summary: Invalid PDF if /BP pdfmarks with non-unique /_objdef
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer (show other bugs)
Version: master
Hardware: PC All
: P3 normal
Assignee: leonardo
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2004-07-05 10:28 UTC by SaGS
Modified: 2008-12-19 08:31 UTC (History)
0 users

See Also:
Customer:
Word Size: ---


Attachments
Short file to reproduce the problem. (1.45 KB, application/postscript)
2004-07-05 10:38 UTC, SaGS
Details
Suggested patch. Relative to GS8.30 sources. (6.08 KB, patch)
2004-07-05 10:39 UTC, SaGS
Details | Diff
An usefull abuse. Auto-numbering pages, with a twist. (1.02 KB, application/postscript)
2004-07-05 10:40 UTC, SaGS
Details
Suggested patch, revised and extended. (22.85 KB, patch)
2004-08-29 14:18 UTC, SaGS
Details | Diff
"Bug687560.diff" - for archiving (22.71 KB, patch)
2005-01-19 13:17 UTC, SaGS
Details | Diff
"Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" (14.23 KB, patch)
2005-01-19 13:18 UTC, SaGS
Details | Diff
"Bug687560 - Distiller 4,05.pdf" (2.21 KB, application/pdf)
2005-01-19 13:19 UTC, SaGS
Details
"Bug687560 - PDF_OBJECT_DISCARD_NEW.pdf" (2.96 KB, application/pdf)
2005-01-19 13:20 UTC, SaGS
Details
Modified "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" (15.36 KB, patch)
2005-01-30 05:47 UTC, SaGS
Details | Diff
"Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" updated for ps2write (15.49 KB, patch)
2005-09-03 12:10 UTC, SaGS
Details | Diff
Over-simplified patch. (6.26 KB, patch)
2005-10-23 15:05 UTC, SaGS
Details | Diff
Sample for other 2 incarnations of the bug. (1.33 KB, application/x-zip-compressed)
2005-10-26 13:43 UTC, SaGS
Details
Bug687560-#a-part1.ps (509 bytes, application/postscript)
2005-11-03 00:44 UTC, SaGS
Details
Test pack. (65.63 KB, application/x-zip-compressed)
2005-11-11 14:57 UTC, SaGS
Details
Replacement "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff". (16.04 KB, patch)
2005-11-11 14:58 UTC, SaGS
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description SaGS 2004-07-05 10:28:49 UTC
If passing the same object name (the value corresponding to the 
/_objdef key) to 2 or more /BP pdfmarks, the PDF file produced 
is invalid. For each /BP call, a PDF object number is alocated, 
and (at /SP time) that copy is referenced from a /Resources 
dictionary. But, except the last copy, those objects are not 
written to the output file and the corresponding xref entries are 
invalid (neither marked as free, nor pointing to valid objects).

Reusing object names passed to pdfmark seems to be neither 
explicitely allowed nor explicitely disallowed, but both Acrobat 
Distiller and Jaws PDf Creator accept this.

The attached patch fixes this as follows:

- pdfmark_BP() explicitely verifies if the object name is a 
  duplicate; if not, it works as before (and there will be no 
  change in the generated PDF files compared to existing version).
- if a name is reused, reads the old objects's # (plus some other 
  info) for reusing it with the object that will be created.
- the old copy of the object is freed during the call to 
  pdf_enter_substream() (more precisely when the new (name, object) 
  pair is added to pdev->local_named_objects), so even if the 
  object # is reused, there won't be 2 PDF objects with the same # 
  written to the output PDF. There's no change here relative to 
  existing code.
- the new object is created/written normally, the only difference 
  being the object # which is not newly allocated.

-------

With the proposed patch, the "same name" is treated as a hint the 
Form XObjects are the same and only one copy is written to the 
output PDF. This is an enhancement, allowing to considerably reduce 
the PDF file size when using logos, watermarks, and other repeated 
elements. When the obj name is the same but the marking operations 
are different it does not produce the result someone may expect, 
but I think this case should be considered an abuse.

Such an optimization cannot be done by other means. Currently, GS 
does not detect duplicate objects. Even if it tried to compare them, 
duplicates that are translated/ rotated/ etc would not be detected, 
because the coordinates written in the PDF file are in PostScript 
device (not user) space, making the content streams appear to be 
different.

Here is how this enhancement can be put to work:

Developers generating their own PostScript code -------

This is the most obvious use: enclose the repeated code between a 
/BP pdfmark call and /EP+/SP calls, using the same name for all 
copies of the same logo/ watermark/ etc (and different names for 
different images, of course). A single question remains: why on 
earth would anyone duplicate the PostScript code that actually 
paints marks on the page, when starting with the 2nd occurence a 
single /SP call, which has to be included anyway, will suffice? 
(and, wouldn't this also make the PostScript file unnecessarily 
large?) Yes, the PostScript file is larger, but the SAME file is 
suitable both for printing and for converting to PDF. When 
printing, an /SP pdfmark only (for all occurences except the 1st) 
is not sufficient because the logo/ etc would appear only once. 
When converting to PDF, the pdfmarks and their special processing 
introduced with this patch are necessary in order to get a 
(sometimes much) smaller PDF. No need for special-casing printing 
versus conversion to PDF; and no unpleasant surprises if the user 
reuses the PosctScript file for a different purpose (print/ 
convert) that it was intended for. 

EPS files -------

A simple change to an EPS file allows it to automatically convert 
to a Form XObject - only one copy of the XObject will be generated 
for a PDF file, even if the EPS is included multiple times.

The change is as follows:

- at the beginning of the EPS, before any existing PostScript code, 
  include something similar to the following:

        userdict /EPSX::status
            /pdfmark where {
                pop
                [
                    /_objdef {THISEPSOBJNAME}
                    /BBox    [LLX LLY URX URY]
                /BP pdfmark
                gstate
            }{
                null
            } ifelse
        put

  THISEPSOBJNAME is a name for the created Form XObject, unique for 
  this EPS. I find that using the EPS file's name is a good idea.
  LLX LLY URX URY are the coordinates of the EPS's bounding box, 
  taken directly form the %%BoundingBox line.

- at the end of the file, after any existing PostScript code, add 
  something similar to the following:

        userdict /EPSX::status get dup null eq {
            pop
        }{
            [/EP pdfmark
            gsave
                setgstate
                [{THISEPSOBJNAME} /SP pdfmark
            grestore
        } ifelse

  THISEPSOBJNAME is the same name as used at beginning of the file.

No special tools or tricks are needed to place such an EPS file 
on the page.

EPS files revisited -------

I'm 99.99% sure that detection of embeded EPS files and insertion 
of pdfmarks as shown above can be done automatically, using the 
/ParseDSCComment user parameter. (The missing 0.01% come from not 
having done it yet...)

PostScript Forms -------

At last, but not the least, this enhancement can be extended to 
PostScript forms. See 687561 "Smaller PDFs when using execform" 
(which is not a bug, only an enhancement).
Comment 1 SaGS 2004-07-05 10:38:33 UTC
Created attachment 788 [details]
Short file to reproduce the problem.
Comment 2 SaGS 2004-07-05 10:39:19 UTC
Created attachment 789 [details]
Suggested patch. Relative to GS8.30 sources.

Includes fixes for the typos mentioned in bug 687430 "Wrong 
transformation matrix with Form XObjects", comment #3.
Comment 3 SaGS 2004-07-05 10:40:00 UTC
Created attachment 790 [details]
An usefull abuse. Auto-numbering pages, with a twist.
Comment 4 Igor Melichev 2004-07-13 04:50:35 UTC
I don't approve the patch due to the trick with pdf_set_xref. From a I brief 
review I conclude that it will create dangled (unreferred) objects in the 
output PDF. Also I don't like to change pdf_enter_substream, because there 
exist a related priority problem to provide a consecutive file writing in all 
circumstances, but currently OrderResources=true requires a positionability.

Adobe pdfmark Reference Manual allows multiple appearances of the object name 
in objdef, but they to be concerned as a reference to the existing object. 
Since the object stream is allowed to appear after the object is defined, we 
must delay the object writing untill that event. So objdef must just create a 
cos dict without writing to PDF, as we do for ExtGState, and delay the writing 
untill either the source document end, or the stream appears. If another stream 
appears for the same object, report an error.

Dear SaGS, I'd appreciate if you make a better patch. Unfortunately now we have 
no time for this problem. Sorry for that, and thank you.
Comment 5 Ralph Giles 2004-07-14 10:30:25 UTC
downgrading priority
Comment 6 SaGS 2004-08-29 14:16:57 UTC
FIRST, A FEW CLARIFICATIONS ABOUT THE PREVIOUS PATCH

> "the trick with pdf_set_xref"
I think there's a misunderstanding here, mainly because of an 
unfortunate choice for the name and the comment. pdf_set_xref() 
does nothing more than pdf_ref_obj(), which is normally called. 
The only difference between the 2 is how they get the object 
number: pdf_ref_obj() calls pdf_next_id() to allocate a new 
object number, while pdf_set_xref() receives the number as a 
parameter. It does not write to the output PDF, but to a 
temporary file holding the (temporary) xref.


> "it will create dangled (unreferred) objects in the 
> output PDF"

Certainly not. Internaly, there are multiple object created with 
the same number. However, eact time the number is reused, the old 
object is freed. As I already mentioned, this happens when the 
new copy is added to pdev->local_named_objects and nothing 
changed here relative to the version before the patch. In fact, 
deletion of the previous copy is the cause of the bug in the 
first place, because it happened in a harmfull way.

Unreferrenced objects could appear if the "old copy" were already 
written to the output PDF. However, named objects are written 
only by /NamespacePop and at end of job, and in both cases the 
written objects are removed from pdev->local_named_objects so 
pdfmark_BP() will never find them as "old copies".


> "as we do for ExtGState"

If I understood correctly, that method is as follows:
- start by creating a "ghost" object without a number
- fill-in the object
- when about to end the creation of the object, decide if it
  is to be preserved or not; this can be done, for example, by 
  comparing it to existing objects in order to collapse 
  identical objects.
- if it has to be preserved, assing it an ID (= a number); 
  otherwise, simply discard the "ghost".

This works perfectly with UNNAMED objects. However, it dooes not 
work so well with NAMED objects. First, it cannot handle forward 
references and the generic objects created when these are 
encountered, because this method always creates NEW objects, 
never REPLACES nor "CONTINUES" existing ones. Secondly, objects 
create by /OBJ and /BP may be referenced WHILE they are created; 
not having a number during this time, this is made impossible. 
The worst case is with /OBJ, because creating those objects ends 
only at end of job; this method makes ALL references to them, 
which have to appear DURING the job, impossible.


> "If another stream appears for the same object, 
> report an error."

Do you suggest NOT to accept creating 2 objects with the same 
distill-time name?
(1) Other PDF converters, including Acrobat Distiller, do accept 
    this case.
(2) There's a much bigger fish to catch here, see the second 
    part of comment #1 and the companion bug 687561 "Smaller 
    PDFs when using execform". I do not really expect someone 
    to write GS-specific code in order to get optimized PDFs by 
    exploiting the "same name = same object" policy introduced 
    by this patch (although, in a controlled environment, this 
    is possible). But GS can use this internally, as mentioned 
    in "EPS files revisited" above and in bug 687561.
(3) A practical point of view: getting NO PDF instead of a 
    DAMAGED PDF does not help get the work done.

-----------------------------------------------------------------
NEXT, WHAT'S DIFFERENT WITH THE NEW PATCH

- It implements 3 different ways to deal with name collisions, 
  denoted in code by PDF_NAMECOLLISION_REBUILD,
  PDF_NAMECOLLISION_KEEPOLD ("as we do for ExtGState"-alike) 
  and PDF_NAMECOLLISION_BACKUP. Details on these later. Not all 
  of them are currently used, but these are provided for future 
  developements.

- There was a bug in the old patch in the way the "resource" 
  structure was (mis-)handled. It was possible to get /Resource 
  dictionaries with duplicate keys, like in
      /Resource <<
          /XObject <<
              /R7 7 0 R
              /R7 7 0 R % note: same key (and ref to same object)
          >>
      >>

- /BP and /SP are able to handle the generic objects created 
  when forward references were encountered. (The change to /SP 
  is not strictly connected to this bug.)

- A previous patch (the one that introduced start_XObject()) 
  made this bug appear with /PS pdfmarks too. The fix is the 
  same as for /BP.

- That same patch has a small error: pdfmark_PS() may call 
  start_XObject() with objname == NULL, in which case "objname" 
  must not be dereferenced (as in "objname->data/size") and no 
  attempt must be made to add the object to 
  pdev->local_named_objects (because it's NOT NAMED).

- It's against the current (at the time of this writing) HEAD.

-----------------------------------------------------------------
SOME DETAILS ABOUT THE NEW PATCH

Most of the changes are in start_XObject(). Its code is 
structured in 3 parts:
(1) name-conflict detection and specification of applicable 
    methods in "oper" - the 1st 'switch",
(2) restrictions coming from options/etc and managing priorities,
(3) implementation of the various methods - the 2nd, big "switch"

If specific GS options are not compatible with one or the other 
of the methods, or for changing priorities when more that 1 
method could be used, it's easy to chage part (2).

There are 3 main methods to deal with "true" name collisions, 
denoted by PDF_NAMECOLLISION_REBUILD, PDF_NAMECOLLISION_KEEPOLD, 
and PDF_NAMECOLLISION_BACKUP. There's a special method to 
efficiently handle generic objects created by forward references, 
denoted by PDF_NAMECOLLISION_FWDOBJ; this is not a real name 
collision, but appears as such because generic named objects are 
created and inserted into pdev->local_named_objects.
PDF_NAMECOLLISION_NONE and PDF_NAMECOLLISION_NONAME are for "no 
name collision"  (the 2nd being handles unnamed objects).
PDF_NAMECOLLISION_ERRCODE, PDF_NAMECOLLISION_ERRTYPE, and 
PDF_NAMECOLLISION_ERRRANGE are for returning errors.

There is some similar code appearing in 3 of the cases; I 
prefered not to factor it out, because doing so would make it 
hard to read (too many "if"s).

Details on the various methods follow:


PDF_NAMECOLLISION_REBUILD -------

User's perspective:
  All references to a particular object name are resolved to the 
  LAST among all objects defined with that name. Only this last 
  copy is written to the output PDF, resulting in an output file 
  as small as it can be.

Compatibility: Jaws PDF Creator 2.11

Implementation:
  To avoid extra changes to pdf_enter_substream(), this case is 
  reduced to PDF_NAMECOLLISION_FWDOBJ - a generic object is 
  created as if encountering a forward reference. The "resource" 
  and the object number are reused. The old copy is deleted 
  during the call to pdf_create_named() (more precisely when the 
  new copy is inserted into pdev->local_named_objects), so it is 
  not written to the output PDF.

Necessary changes (outside gdevpdfm.c::start_XObject):
  - pdf_obj_ref_reused(), which is similar to pdf_obj_ref() 
    except it gets the object number as a parameter, instead 
    of getting it from pdf_next_id();
  - all changes for PDF_NAMECOLLISION_FWDOBJ.


PDF_NAMECOLLISION_KEEPOLD -------

User's perspective:
  All references to a particular object name are resolved to the 
  FIRST one among all objects defined with that name. Only this 
  first copy is written to the output PDF, resulting in an 
  output file as small as it can be.

Compatibility: Acrobat Distiller 4 (and 5 too, I think)

Implementation:
  Uses pdf_enter_substream() with reserve_object_id == false to 
  create a "ghost" object without an ID. At /EP time, this 
  object is discarded. Because of the object not having an ID, 
  this method has a weakness. Consider the following code:

      [/_objdef {ANAME} .. /BP pdfmark
          ... building first copy ...
      [/EP pdfmark
      [/_objdef {ANAME} .. /BP pdfmark
          ... building second copy ...
          (second) show
          [{ANAME} ... /PUT pdfmark
      [/EP pdfmark

  Content added during "... building second copy ..." using 
  PostScript marking operators (like "show" in this example) 
  goes into the 2nd copy and ends up being discarded; content 
  added using pdfmarks (like a /Group entry to control 
  transparency) goes to the 1st copy and is written to the 
  output PDF, which is inconsistent at least. Since the 2nd copy 
  does not have a number, it cannot be referenced so it MUST NOT 
  be added to pdev->local_named_objects, and, as a consequence, 
  it cannot be found by /PUT. This weakness is intrinsic to 
  pdf_enter_substream() with reserve_object_id == false, and is 
  the reason I didn't favor this method.

Necessary changes (outside gdevpdfm.c::start_XObject):
  - pdfmark_EP() has to discard the "ghost" object.


PDF_NAMECOLLISION_BACKUP -------

User's perspective:
  All objects, even those having the same name, are preserved 
  and written to the PDF file, as if they had unique names. Each 
  reference is resolved to the most recent copy of the object; 
  "most recent" refers to the execution order, not source code 
  order. I didn't favor this method because it doesn't help 
  reduce the ouput file's size.

Compatibility: Jaws PDF Creator 3.3

Implementation:
  "Backup" previous copy by dinamically renaming it, then create 
  a new object normally.

Necessary changes (outside gdevpdfm.c::start_XObject):
  - pdf_unique_objname() to generate unique names;
  - cos_dict_rename() to "rename" pdev->local_name_objects 
    entries.


PDF_NAMECOLLISION_FWDOBJ -------
False name collision due to a forward reference.

Implementation:
  pdf_alloc_aside() "continues" with the generic object from 
  where the call to pdf_refer_named() from pdfmark_next_object() 
  left off; this object is passed from start_XObject().

Necessary changes (outside gdevpdfm.c::start_XObject):
  - The pcofwd parameter to pdf_enter_substream(), 
    pdf_open_aside() and pdf_alloc_aside(), which points to the 
    generic object created when the forward reference was 
    encountered. Only pdf_alloc_aside() processes it, the others 
    only pass it along. Note that pdf_alloc_aside() deals with 
    generic object WITH or WITHOUT an associated "resource" 
    structure; those WITH a "resource" are created by 
    pdfmark_SP() when it has to "render" a Form XObject that has 
    not yet been built (a forward reference).


PDF_NAMECOLLISION_NONE -------

No name collision. Works as it did before the patch.


PDF_NAMECOLLISION_NONAME -------

Not a named object, so no collision possible. Same as 
PDF_NAMECOLLISION_NONE, except the object is not registered into 
pdev->local_named_objects.


PDF_NAMECOLLISION_ERRCODE -------
PDF_NAMECOLLISION_ERRTYPE
PDF_NAMECOLLISION_ERRRANGE

Error cases. Return "code", gs_error_typecheck and 
gs_error_rangecheck respectively.


-----------------------------------------------------------------
FINALLY, apologies for spelling errors/etc. My English is not 
really good...
Comment 7 SaGS 2004-08-29 14:18:29 UTC
Created attachment 868 [details]
Suggested patch, revised and extended.
Comment 8 SaGS 2005-01-19 13:17:22 UTC
Created attachment 1160 [details]
"Bug687560.diff" - for archiving

Only for archiving, see next comment for a reduced patch. May return to 
it later, for example when discussing imposition.

The perceived functionality (see comment #6 above, the "User's 
perspective" sections) is same as the previous one, but:

  - implementation details changed, the main purpose being to avoid 
    modifying pointers set by pdf_enter_substream() (the "surgery");

  - names for the PDF_NAMECOLLISION_* constants changed as follows, 
    hope this time they are clear enough:
	PDF_NAMECOLLISION_NONAME   -> PDF_OBJECT_CREATE_UNNAMED
	PDF_NAMECOLLISION_NONE	   -> PDF_OBJECT_CREATE_FIRST
	PDF_NAMECOLLISION_ERRCODE  -> PDF_OBJECT_ERRCODE
	PDF_NAMECOLLISION_ERRTYPE  -> PDF_OBJECT_ERRTYPE
	PDF_NAMECOLLISION_ERRRANGE -> PDF_OBJECT_ERRRANGE
	PDF_NAMECOLLISION_REBUILD  -> PDF_OBJECT_ASSIGN_NEW
	PDF_NAMECOLLISION_KEEPOLD  -> PDF_OBJECT_DISCARD_NEW
	PDF_NAMECOLLISION_BACKUP   -> PDF_OBJECT_CREATE_NEXT
	PDF_NAMECOLLISION_FWDOBJ   -> PDF_OBJECT_FROM_FWDREF

  - it solves the anomaly with PDF_OBJECT_DISCARD_NEW (ex- 
    PDF_NAMECOLLISION_KEEPOLD) described above (search for "weakness"
    in comment #6); this is the use of cos_object_t.is_writable, I'll 
    post details only if necessary.

  - some other minor changes, especially reorganizing some of the code 
    for more clarity.
Comment 9 SaGS 2005-01-19 13:18:15 UTC
Created attachment 1161 [details]
"Bug687560 - PDF_OBJECT_DISCARD_NEW.diff"

Proposed patch, including only the parts that are necessary for 101%
Distiller compatibility.

The 1% above full compatibility comes from reproducing the following 
Distiller anomalies:

  - the "weakness" mentioned in comment #6;
  - /SP does not accept forward references (= object names that did not 
    yet appeared in a /BP).

User's perspective -------
  See comment #6, "User's prespective" for PDF_NAMECOLLISION_KEEPOLD.

Backward compatibility -------

  There is absolutely no change for cases when the current GS version 
  produces valid output (PDF_OBJECT_CREATE_UNNAMED and 
  PDF_OBJECT_CREATE_FIRST cases correspond to the old code).

  When an {OBJNAME} appears for the 2nd/etc time in a /BP (branch 
  PDF_OBJECT_DISCARD_NEW), or appears in /BP 1st time but after being 
  used as a "forward reference" (PDF_OBJECT_FROM_FWDREF), existing 
  GhostScript version outputs an invalid PDF, so no one successfully 
  used files that are in these situations. This is true independent 
  of how pdfmarks get called: from a single PostScript file, from 
  2 or more PS files concatenated or otherwise combined, or from PDF 
  files (because PDF file = a PS file [the PDF interpreter which is 
  written in PostScript] which uses an external data file [the PDF]). 

  These considerations are also valid for the "archived" patch 
  attached to comment #8.

Implementation -------

  Dealing with unnamed objects (PDF_OBJECT_CREATE_UNNAMED branch): no 
  change relative to current GhostScript version, except fixing an 
  error (for details search for "start_XObject() with objname == NULL" 
  in comment #6).

  First use of an {OBJNAME} in /BP (PDF_OBJECT_CREATE_FIRST branch): 
  works EXACTLY as the current GhostScript version does (and as all 
  other versions of this patch did).

  Error cases (PDF_OBJECT_ERR*): nothing to comment.

  PDF_OBJECT_ASSIGN_NEW (ex- PDF_NAMECOLLISION_REBUILD) and
  PDF_OBJECT_CREATE_NEXT (ex- PDF_NAMECOLLISION_BACKUP): these are 
  alternatives to PDF_OBJECT_DISCARD_NEW, with different perceived 
  behaviors, and are removed them from this reduced patch.

  Duplicate use of an {OBJNAME} in /BP (PDF_OBJECT_DISCARD_NEW):
  - for the perceived behavior, see "User's perspective" in the 
    PDF_NAMECOLLISION_KEEPOLD section of comment #6; this is the 
    way Distiller deals with duplicates object names in /BP;
  - the "weakness" mentioned there is present in Distiller and 
    this patch reproduces it;
  - see comments in code for implementation details;
  - one note though: pres->object->id for the temporary stream is 
    used as temporary storage to allow pdfmark_EP() to find the 
    "main" copy of the object. This field is otherwise not used 
    between /BP and /EP (the stream is not accessible except for 
    writing PDF code generated by PostScript marking operators, in 
    particular no references can be created to it);

  First /BP, but for an {OBJNAME} that previously appeared as a 
  forward reference (PDF_OBJECT_FROM_FWDREF branch): implementation 
  is similar to PDF_OBJECT_DISCARD_NEW. The difference is that 
  pdfmark_EP() does not discard the temporary stream, but merges 
  it's contents into the main object. pdfmark_EP() knows which of the 
  2 actions ("discard" or "merge") is to be performed based on the 
  sign of the (temporary) value stored into pres->object->id.
Comment 10 SaGS 2005-01-19 13:19:41 UTC
Created attachment 1162 [details]
"Bug687560 - Distiller 4,05.pdf"

PDF produced by Acrobat Distiller 4,05c from the sample file attached 
to this bug report, comment #1. Note it contains a single Form XObject, 
the 1st one generated (one of the brown ones, not blue).

Note:
  The file attached to comment #3 "An usefull abuse" would have shown 
  better how various distillers deal with duplicate {OBJNAMES}, but some 
  of these have problem with pdfmarks called from /EndPage...
Comment 11 SaGS 2005-01-19 13:20:29 UTC
Created attachment 1163 [details]
"Bug687560 - PDF_OBJECT_DISCARD_NEW.pdf"

PDF produced by patched GhostScript from the same file.

Note:
  A bit larger than the one produced by Distiller because of an 
  unrelated problem (Distiller recognizes a stroked charpath and 
  uses PDF text operators to render it; GhostScript does not, and 
  generates many PDF movetos/ linetos/ curvetos instead).
Comment 12 Igor Melichev 2005-01-24 23:57:56 UTC
Thank you for submitting the improved patch.
It looks much better, but still needs improvements.

1. Negative numbers were not used for object ids except -1. Since you never 
search by a negative id, please assign -1 with a comment.

2. PDF_OBJECT_ASSIGN_NEW appears in a comment without any definition or 
explanation.

3. The semantics of PDF_OBJECT_DISCARD_NEW is not natural. In most cases an 
appearence of a new object with same name to be interpreted as changing the 
referent, i.e. similarly to an assignment operator. I believe that we discussed 
it many times and agrred. I'm not clear why you miss this feature now. Either 
restore it or issue an error.

4. In the Comment #9 you refer to ""User's perspective" in the     
PDF_NAMECOLLISION_KEEPOLD section of comment #6;". I'm not sure that we 
understand this subject in same way. IMO merging _equal_ objects is a kind of 
optimization, which might be either delayed or implemented with stream 
comparizon after a stream is written, as we do for images. Lets delay it for 
now to simplify the patch. I mean please don't include any effort for it now. 
If it is important for you, we'll open another bug and work on it separately 
later.
Comment 13 SaGS 2005-01-26 12:41:09 UTC
> 1. Negative numbers were not used for object ids except -1. Since you 
> never search by a negative id, please assign -1 with a comment.

It DOES search by the absolute value of cos_object_t.id, see the 
"cos_stream_t *pcsmain = ..." line in pdf_exit_XObject() ("LABS(..)").
When cos_object_t.id < 0, the temporary stream is discarded. However, 
it is still necessary to locate the "main" cos_object_t to reset its 
".is_open" flag to "false". So, cannot use -1L.

As to commenting this, I find there are already enough comments. The 
2 comments following the "case PDF_OBJECT_FROM_FWDREF:" and 
"case PDF_OBJECT_DISCARD_NEW:" lines in start_XObject() explain the 
usage of this field. Also, the comment starting with "Invariant for 
all non-error exits from start_XObject()" at the end of this function 
explains (again) what's the use of the sign and of the absolute value.

-------

> 2. PDF_OBJECT_ASSIGN_NEW appears in a comment without any definition 
> or explanation.

You are right, the line  "* - continue as for PDF_OBJECT_ASSIGN_NEW" 
in start_XObject() is a leftover from the "complete" patch (comment #8) 
and has to be removed.

-------

> 3. The semantics of PDF_OBJECT_DISCARD_NEW is not natural. In most 
> cases an appearence of a new object with same name to be interpreted 
> as changing the referent, i.e. similarly to an assignment operator. 
> I believe that we discussed it many times and agrred. I'm not clear 
> why you miss this feature now. Either restore it or issue an error.

With the risk of repeating myself, I'll state again: 
PDF_OBJECT_ASSIGN_NEW, PDF_OBJECT_DISCARD_NEW and PDF_OBJECT_CREATE_NEXT 
are 3 DIFFERENT WAYS of solving THE SAME problem. Currently, there's 
no absolute need to keep all of them, only one is necessary. Which one? 
Comment #6 points out with which PS->PDF converter implements each of 
them.

More than once, you made contradictory requests concerning this point.

On one hand, you want complete Acrobat Distiller compatibility; see: 
"1. Please prove that this behavior is same as Adobe Distiller." in
http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004706.html
and "Also the patch must be tested for full compatibility with Adobe." in
http://www.ghostscript.com/pipermail/gs-code-review/2004-December/004720.html.

On the other hand, you want a variable-like behaviour, as you describe 
at point 3 in 
http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004701.html.

These two demands are not compatible. With the sample file attached to 
comment #1, Distiller produces a PDF with brown drawings on both pages, 
and the variable-like method one with blue drawings on both pages. The 
differences are even better seen in the 3 PDF attached to 
http://www.ghostscript.com/pipermail/gs-code-review/2004-November/004705.html
The source PS file is the one attached to comment #3.

PDF_OBJECT_ASSIGN_NEW implements the variable-like behaviour, and 
dissapeared in favor of PDF_OBJECT_DISCARD_NEW because you insisted 
on complete Distiller compatibility.

** PLEASE make up your mind: Distiller-compat OR variable-like. **

I can "cut" from the "complete" patch that part that implements 
PDF_OBJECT_ASSIGN_NEW and remove PDF_OBJECT_DISCARD_NEW and 
PDF_OBJECT_CREATE_NEXT. I personally prefer PDF_OBJECT_ASSIGN_NEW, 
for many reasons. Yes, I agree it's much more intuitive from a 
programmer's point of view. And no, I don't think Adobe thought a lot 
on this problem before choosing the implementation in Distiller (this 
is only a personal opinion, of course).

-------

> 4. In the Comment #9 you refer to ""User's perspective" in the     
> PDF_NAMECOLLISION_KEEPOLD section of comment #6;". I'm not sure that 
> we understand this subject in same way. IMO merging _equal_ objects 
> is a kind of optimization, which might be either delayed or 
> implemented with stream comparizon after a stream is written, as we 
> do for images. Lets delay it for now to simplify the patch. I mean 
> please don't include any effort for it now. 
> If it is important for you, we'll open another bug and work on it 
> separately later.

There doesn't exist a strict subset of this patch that deals with the 
name collision in a Distiller-compatible way and does not provide the 
"optimization", so there's nothing to remove. Reason: at any point in
time, we have A SINGLE Form XObject (so nothing to compare/ collapse/ 
etc). The optimization is merely a (desirable) side-affect of the way 
the code works. This is also true for PDF_OBJECT_ASSIGN_NEW.

Maybe you are misleaded by the fact that sometimes pdf_exit_XObject() 
is supposed to "merge" something. That "merge" does not refer to 
collapsing 2 identical Form XObjects. When entering pdf_exit_XObject() 
with a negative value in 
pdev->accumulating_substream_resource->pres->object->id, a single Form 
XObject exists, but its component pieces are physically spread in 2 
cos_object_t structures. The "merge" refers to gathering all these 
pieces into a single cos_object_t.

These 2 cos_object_ts are:
(1) the "main" one, linked from pdev->local_named_object, containing:
    - the correct object number;
    - the pdf_recource_t structure (including the "Rn" resource name);
    - eventual (key, value) pairs added with dict-/PUT pdfmarks issued 
      since the /BP that started this XObject.
(2) a temporary one, created by pdf_alloc_aside() called during 
    pdf_enter_substream() and accesible only via 
    pdev->accumulating_substream_resource, containing:
    - the stream-proper part (PDF ops generated from PS marking ops);
    - (key, value) pairs that deal with the stream part (filters...);
    - (key, value) pairs added by pdfmark_BP() itself (/BBox etc).

-------

About the "as we do for images" part:
- If that "collapsing" is done for UNNAMED images, then it has nothing 
  to do with Form XObjects (which are NAMED);
- If it involves named images too, then that's a bug.

The reasons for both these statements are explaind here 
http://www.ghostscript.com/pipermail/gs-code-review/2004-December/004719.html
(
It's that "lengthy article" you didn't have time for; start reading at 
"Comparing by contents in order to remove duplicates".

Also consider this example:

        (image data) ... image % be it named or not
        showpage % end of page 1

        [/_objdef {ImgNAMED} /NI pdfmark
        (SAME image data) ... image
        showpage % end of page 2

        % later, add an alternate image for {ImgNAMED}
        [{ImgNAMED} << /Alternates ... >> /PUT pdfmark

If you merge the 2 Image XObjects at the end of the 2nd invocation of the 
"image" operator, the alternate image will incorrectly affect the image on 
page 1 too.
)
Comment 14 Igor Melichev 2005-01-27 02:51:07 UTC
Well, we must be compatible with Distiller. Please cut out other branches. 
Discarding an object please generate a warning.
Comment 15 SaGS 2005-01-30 05:47:54 UTC
Created attachment 1185 [details]
Modified "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff"

Here's the patch modified as requested.

Changes:

- Now printing a message when discarding an XObject definition. It's
  printed to stdout, not stderr, because it does not necessarily
  denote an error condition. Especially after the patch for
  Bug 687561 "Smaller PDFs when using execform", discarding a 2nd
  copy may actually be a good thing.

- The request "Please cut out other branches" is a mystery for me.

  - PDF_OBJECT_CREATE_FIRST is the branch that creates the 1st (and
    the only, in most cases) XObject with a given name. Cannot remove.

  - PDF_OBJECT_CREATE_UNNAMED is the same as PDF_OBJECT_CREATE_FIRST
    except it deals with an unnamed object (start_XObject() is called
    from pdfmark_PS() too). It does not add the object to
    pdev->local_named_objects and does not dereference a NULL pointer
    as the current GS version does. Cannot remove it.

  - PDF_OBJECT_FROM_FWDREF handles the case when we already have a
    cos_object_t of type cos_type_generic created by a previous
    forward reference. Cannot remove either; with current GS version,
    this case produces an invalid PDF the same way as a duplicate
    objname does.

  However, I removed the error "case:"es and spread the "return"s
  around. I find having cases for error reporting clearer, but it
  seems a few more lines in the diff file are fatal.

- Found and fixed a problem with the interaction between pdfmark_PS()
  ("P" the "S", not "S" then "P") and start_XObject(). The former
  assumed the cos_stream_t returned by the latter will be always
  around. But, that may be a temporary one, acting as a "waste basket",
  and is to be used only for writing the contents into. Now
  pdf_exit_XObject() returns a pointer to the correct cos_stream_t
  to be used for creating references to.
Comment 16 SaGS 2005-09-01 11:57:01 UTC
The paches attached to this bug report are affected by the changes 
for ps2wite made since these patches were posted (/.Global and 
pdev->accumulating_a_global_object). I did update my working copy; 
if interested in updated patches, place a comment here or contact me.
Comment 17 leonardo 2005-09-01 12:15:03 UTC
Yes, we're interested in an updated patch.

> The request "Please cut out other branches" is a mystery for me.

It means that branches, which are disabled with #if, should be removed.
Comment 18 SaGS 2005-09-03 12:07:56 UTC
> > The request "Please cut out other branches" is a mystery for me.
> It means that branches, which are disabled with #if, should be removed.

If you really mean "#if" (= conditional compilation): these NEVER 
EXISTED in this patch.

If you mean "dead code"/ "branches not taken"/ "unused 'case:'es": 
at the time that request was made, these were ALREADY REMOVED (except a 
"default: /* should never happen... */ return_error(gs_error_unknownerror);", 
because there's a good habbit to have a "defalut:" in every "switch", 
plus some compilers complain when there isn't one). Before, such branches 
existed because there was no user-accesible method to choose between 
the 3 implemented alternatives.

If a particular piece of the code seems to you to be unused, please 
point it out and I'll verify.
Comment 19 SaGS 2005-09-03 12:10:50 UTC
Created attachment 1672 [details]
"Bug687560 - PDF_OBJECT_DISCARD_NEW.diff" updated for ps2write

Patch from comment #15, modified to handle "/.Global" and 
"pdev->accumulating_substream-resources" that were recently added. 
Also fixed a comment that was still refering to an old identifier 
("PDF_NAMECOLLISION_*" became "PDF_OBJECT_*").

Note:
  The sample file in comment #1 will display 2 empty pages with ps2write.
  See bug 688293 "opdfread: Wrong graphics state when painting a Form XObject" 

  for details and a patch.
Comment 20 leonardo 2005-09-06 10:22:18 UTC
Patch to HEAD :
http://ghostscript.com/pipermail/gs-cvs/2005-September/005712.html
Comment 21 SaGS 2005-10-13 10:23:31 UTC
I'm reopening this bug for the reasons explained, one binary month ago, in
http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html .
Comment 22 leonardo 2005-10-17 12:27:05 UTC
One more patch to HEAD :
http://ghostscript.com/pipermail/gs-cvs/2005-October/005769.html
Comment 23 leonardo 2005-10-18 02:21:35 UTC
Dear SaGS,

You wrote :
> I was explicitly requested to supply a patch that implements 
> a Distiller-compatible behavior. 

Only supported Artifex customers are eligible to request something from 
Artifex. In such neccessity they should contact Artifex Support in the 
appropriate form. Free users are allowed to open bugs, which Artifex considers 
as an additional information for improving the product quality. A decision 
about performing corresponding improvements may be taken if the improvement is 
important for Artifex customers and if Artifex has enough resourced for doing 
it. In this bug those constraints are not satisfied.
Comment 24 SaGS 2005-10-19 02:01:30 UTC
To clarify things:

> ... I WAS explicitly requested ...
(please note the passive case)

That is the reviewer of the patch requested that for me to comply, it's not me 
who requested this to the reviewer or Artifex. I only consider fair that since 
the reviewer requested this, any patch accepted for this bug fulfills that 
request.

Additional note: from your commit log, I suspect we don't talk about the same 
problem...
Comment 25 leonardo 2005-10-19 09:43:25 UTC
Dear SaGS,

You wrote :

> To clarify things:
> 
> > ... I WAS explicitly requested ...
> (please note the passive case)

Oops, I was not attentive reading it. I'm sorry. Please ignore last comment.

Comment 26 Raph Levien 2005-10-19 10:58:33 UTC
Hi SaGS,

   I'm reviewing some of the bugs in the tracker, and am trying to figure out what to do with this one. 
There's a lot of complex discussion, and it's not immediately clear to me what the benefit is in 
continuing work on this.

   I believe the patch in #22 should fix bug (C) in your #21 link. Do you agree?

   What I'd like to see is a clear, succinct description of the remaining problem, and why it's important 
(is it something we're likely to see only from your application, or more generally?). Given that, I'm 
amenable to accepting a patch, as long as it isn't too long or scary, that you feel resolves the bug.

Thanks --Raph
Comment 27 SaGS 2005-10-23 15:04:36 UTC
Dear Raph,

>    I believe the patch in #22 should fix bug (C) in your #21 link.
>    Do you agree?

I agree the 1st hunk (affecting pdfmark_PS()) solves "(G) 2 inherited 
bugs". The 2nd hunk (affecting pdfmark_EP()) looks pretty useless to me 
(because of gdevpdfr.c::pdf_objname_is_valid()).

>    What I'd like to see is a clear, succinct description of the 
>    remaining problem, and why it's important (is it something we're 
>    likely to see only from your application, or more generally?).

What remains (from an exclusively technical point of view):
    Mainly, the bug itself. The commited patch does not solve all of 
    it; only a part, and even that for the wrong reasons.

    I would like to point out the bug is about outputting invalid 
    PDFs (missing objects and xref entries that point in the middle 
    of nowhere), not about file size reduction. The latter comes as a 
    free bonus with both the "Distiller-compatible" and "variable-like" 
    implementations, but is blocked by the commited patch for which 
    immediately writting each and every xobject is essential.

Who benefits from my patch'es side-effect:
  - Anyone "printing" repetitive elements, either by including EPS 
    files multiple times or by using PostScript forms (watermarks, 
    logos, variable data printing, graphical bullets, ...).
  - Anyone redistilling PDFs that already contain Form XObjects
    (GS currently flattens them, bloating the output file).

    Example: real-world file shrinking 20-40x.
    Consider http://bugs.ghostscript.com/show_bug.cgi?id=688001#c1
    (note: I have absolutely no connection with Objectif Lune Inc., 
    or with the user that posted the file on comp.lang.postscript).
    It contains 4 EPSes, which it converts to PostScript forms and 
    then paints them over and over, one of these with 2 different CTMs.

  - As-is, current HEAD with pdfwrite converts it to approx 16.5M PDF;

  - After (CAREFULLY) bracketing each EPS'es contents with BP-EP 
    pdfmarks:
    - With the commited patch: size is 18.7M (xobjects add some 
      overhead, duplicates not detected). With -dForOPDFRead I got 
      a gigantic 109M PDF! (I think this comes from -dForOPDFRead 
      including raster images into content streams, combined with not 
      detecting duplicate xobjects.)
    - With my patch: size goes down to 454k, or 4.5M if -dForOPDFRead; 
      the latter is caused by repeatedly including the same Type3 
      font, there are still only 4 Form XObjects.

  - The same result can be obtained without touching the file, but 
    providing a suitable implementation of GS's execform (bug 687561 
    "Smaller PDFs when using execform", although I changed that code 
    since then). This benefits everyone, automagically.
    Note: The "3. Doing (3) [(2)?], the CTM to be factored out." in 
    http://bugs.ghostscript.com/show_bug.cgi?id=687561#c5 is 
    particularily infeasible.

>    Given that, I'm amenable to accepting a patch, as long as it 
>    isn't too long or scary, that you feel resolves the bug.

Please see below for an over-simplified patch. Hope it's not scary.

What I wanted was to fix pdfmarks that relate to Form XObjects 
myself, since I find these are way underused and it seems nobody else 
has the time or desire to do this.

Sincerily yours,
sags
Comment 28 SaGS 2005-10-23 15:05:52 UTC
Created attachment 1725 [details]
Over-simplified patch.

The simplification comes from some reduced functionality. However, 
the bug is fixed completely, and result is compatible with Distiller.

The patch affects only gdevpdfm.c, and the diff is against -r1.48, 
that is without the 2 commited patches attached to this bug report.

Compared to the patch in comment #19:

- removed support for forward references, raising a rangecheck instead;
- accepting the inconsistency described at (F) in the message linked 
  from comment #21 above, since it's not so important and I got a 
  considerably reduction of the code size;
- removed the rather extensive comments.

How it works:

- When starting an XObject, a check is made if the object name is new;

- If new, XObject creation is equivalent to the version before the 
  patch; same if no name (possible for PostScript XObjects);

- If the name is a duplicate, then the stream is discarded.
  Think of using pdf_substitute_resource(), except
  - the decision on whether to keep the object or not is done by 
    start_XObject() based on object name (and remembered as 
    pres->object->id == -1), not by pdf_find_same_resource() searching 
    pdev->resources[] (this bug is about same NAME, not same CONTENTS).
  - pdfmark_PS() locates the matching object again by name, does not 
    rely on the assumed pdf_substitute_resource() to return it
    (note: pdfmark_EP() does not need to find this object).
Comment 29 leonardo 2005-10-24 00:42:02 UTC
Here is a quote from "pdfmark Reference Manual November 30,2004" page 14 :

[beg quote]
That is,the {objname }can be in the argument list of a pdfmark
construct before it is defined.
[end quote]

The suggested patch 1725 doesn't comply.
Comment 30 leonardo 2005-10-24 00:47:27 UTC
Due to the quote from Comment #28 

    "removed support for forward references, raising a rangecheck instead;"

and Adobe quote from comment #29, 
I conclude that the patch 1725 causes a regression and must not be committed.

Since this bug requires too many efforts form Artifex engineering resources, 
and since it has no demand from Artifex customers, I suggest to pay half bonus 
for investigations done and close the bug with 'WONTFIX".
Comment 31 SaGS 2005-10-24 05:43:52 UTC
Dear Leonardo (== Igor?),

(1) I am perfectly aware that forward references are allowed and always 
    said that, for more than a year already. This bug has a long history, 
    much of it on gs-code-review.
(2) The removal of support for forward references is NOT FROM Ghostscript,
    but FROM THE PATCH.
(3) The patch in comment #19 "'Bug687560 - PDF_OBJECT_DISCARD_NEW.diff' 
    updated for ps2write", which I consider the one to be really 
    Distiller-compatible, DOES support forward references. If you want 
    forward refs, apply that patch.
(4) Previously, Ghostscript produced invalid PDFs if forward refs to XObjects
    were present, so strictly speaking I don't see where the regression is. 
    This bug REMAINS after the patch you commited.
(5) More than once, Igor advocated AGAINST accepting forward refs, and asked 
    to reject them (which I didn't really agree with...)
(6) Looking at your patch, I could only conclude that you don't consider 
    this functionality important at all, at least not so important as a 
    smaller diff. So, I modified my patch by removing the code for it. 
    You only have to make up your mind about forward refs.
Comment 32 Raph Levien 2005-10-26 10:07:51 UTC
Hi SaGS,

Thanks for your patient hard work trying to get this problem resolved. I think there have been some 
miscommunications, which hopefully we can resolve and then get this closed.

1. The patch in #22 is not the only one that's been committed in this area. Are you sure that HEAD is 
still generating incorrect PDF? Does the file in comment #1 trigger this behavior? If not, can you post a 
new test case that does?

2. I'm still confused by the forward reference issue. I think we may have mistakenly believed it was OK 
to reject forward references in the past, but now I think everyone is in agreement that we must handle 
them.

From your point (2) in comment #31, I _think_ you're saying that HEAD + your patch in #28 will still 
process inputs containing forward references as well as HEAD without the patch. Is this what you are 
asserting? Can you write a short test file that exercises this behavior? Obviously, I would be unhappy if 
such a file were to run correctly through HEAD but cause a rangecheck after applying #28.

Thanks again.
Comment 33 SaGS 2005-10-26 13:43:38 UTC
Created attachment 1731 [details]
Sample for other 2 incarnations of the bug.

Dear Raph,

> ...
> 1. The patch in #22 is not the only one that's been committed in 
>    this area. Are you sure that HEAD is still generating incorrect 
>    PDF? Does the file in comment #1 trigger this behavior? If not, 
>    can you post a new test case that does?

The sample in comment #1 does not contain forward references, so it
does not trigger this incarnation of the bug.

I attach a ZIP file containing a simple PostScript file and the PDF 
produced from it by current HEAD. It illustrates the 2 incarnations 
of the bug I mentioned at point (A) in my recent gs-devel post
(http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html)
which remain present in current HEAD, including the one concerning 
forward references.

In the output PDF, take a look at the 2 custom keys added to the 
{Catalog}. The objects they point to don't exist in the PDF (obj #4 for
/PointerToLostFwdReferencedXObj and #9 for /PointerToLostArray).
The xref entries for them are not marked as deleted and point 
somewhere at random (the one for #4 points, by hazard, to object #6 
and the one for #9 points somewhere inside #6).

> 2. ...
>    From your point (2) in comment #31, I _think_ you're saying that 
>    HEAD + your patch in #28 will still process inputs containing 
>    forward references as well as HEAD without the patch.

Yes (but, to be exact, I'm refering to HEAD without commited patch in 
comment #20 and comment #22, but with my patch in comment #28, not 
TODAY's [2005/10/26] HEAD + comment #28).

>    ...
>    Can you write a short test file that exercises this behavior?

For testing, please use the *.ps file in the archive, even better cut 
in 2 for the 2 parts. With my patch, part 1 (forward reference to xobj) 
terminates with a rangecheck. Part 2 (redefining a preexistent 
object that is not an xobject) terminates with a typecheck, because 
pdfmark refuses to change the PDF type of an object (was a PDF array, 
trying to redefine as a PDF stream).
Comment 34 leonardo 2005-11-02 09:45:07 UTC
The supplied PS file 1731 is incorrect.
An Adobe log follows.


Acrobat Distiller 6.0
Started: 2 íîÿáðÿ 2005 ã. at 20:42

Adobe PostScript software version: 3015.102

Start Time: 2 íîÿáðÿ 2005 ã. at 20:42
Source: Bug687560-#a.ps
Destination: F:\AFPL\MLC2\PROB574-687560\Bug687560-#a.pdf
Adobe PDF Settings: G:\Documents and Settings\All Users\Documents\Adobe PDF 6.0
\Settings\Standard.joboptions
%%[ Error: typecheck; OffendingCommand: pdfmark; ErrorInfo: SP LostArray ]%%

Stack:
/EP
-dict-


%%[ Flushing: rest of job (to end-of-file) will be ignored ]%%
%%[ Warning: PostScript error. No PDF file produced. ] %%
Distill Time: 0 seconds (00:00:00)
**** End of Job ****

Comment 35 SaGS 2005-11-02 20:10:24 UTC
(1) So, from Bug687560-#a.ps one gets:

    - An invalid PDF, with no indication that something went wrong,
      from current HEAD;
    - A "typecheck" from Distiller 6;
    - The same "typecheck" from GS with either my patch from 
      comment #28 "Over-simplified patch." (version without fwd refs 
      support) or the one from comment #19 ""Bug687560 - 
      PDF_OBJECT_DISCARD_NEW.diff" updated for ps2write" (version 
      with fwd refs). For the former, one obviously needs to remove 
      part 1 from the file, because it is about fwd refs that the 
      patch does not support (would raise a "rangecheck").

    I would conclude the test is meaningfull, not faulty.

    Also please note that the part where Distiller complains has 
    nothing to do with fwd refs. Both my patches do some "sanity 
    checks" (inspired from verifications done by pdfmark_SP(),
    plus some common sense), and it's such a check that leads to 
    the "typecheck", not something related to fwd refs.

(2) As is, the file responds very well (with a "yes") to the 
    question whether parts of the bug remained in current HEAD 
    or not, including the one about forward references.

(3) To further examine each of the 2 incarnations of the bug 
    that remained in HEAD, separate the 2 parts of the file as 
    I already mentioned in the last paragraph of comment #33. 
    The 2 parts are independent, but I put them in a single file 
    because I hate having too many files laying around.
Comment 36 leonardo 2005-11-02 21:39:34 UTC
Please provide a correct Postscript.
We do not care about PDFs generated from incorrect Postscript.
Comment 37 SaGS 2005-11-03 00:44:13 UTC
Created attachment 1746 [details]
Bug687560-#a-part1.ps

Part of the sample that concerns forward references to Form XObjects.

Note: The other part does not contain anything contrary to PLRM or the pdfmark
Reference Manual, it's just Distiller messing things, but that's not so
important now.
Comment 38 SaGS 2005-11-06 14:05:42 UTC
Because forward references came again into discussion: it seems after 
the 3rd rewrite of the patch I didn't test these enough. The patch 
in comment #19 "'Bug687560 - PDF_OBJECT_DISCARD_NEW.diff' updated for 
ps2write" does not set correctly some flags when fwd refs are found, 
the most nasty consequence being a double-free during pdf_close(). 
I appologize for this mistake, now it's fixed (3 lines of code) and 
I tested the whole thing again. If you arrive at looking into this, 
I'll post the changes.
Comment 39 leonardo 2005-11-07 01:05:52 UTC
Dear SaGS,

You may submit the improved patch for archiving purpose. In same time it would 
be better if you concentrate on creating correct Postscript samples, which 
demonstrate problems you mention. Thank you.
Comment 40 SaGS 2005-11-11 14:57:29 UTC
Created attachment 1761 [details]
Test pack.

Here is a pack of PS files for testing, also for proving the remarks 
I made in 
http://ghostscript.com/pipermail/gs-devel/2005-September/003241.html.

I also included logfiles and PDFs generated by:

- Acrobat Distiller 5.0/ Windows (file names include "Distiller 5,00"), 
- Jaws PDF Creator 2.11 ("JawsPDF 2,11") - just to see "another 
  opinion" on the subject, 
- Ghostscript recent HEAD ("Ghostscript HEAD"), and 
- Ghostscript with a corrected version of my patch from comment #19
  ("PDF_OBJECT_DISCARD_NEW") instead of the patches in comment #20 and 
  comment #22.; I'll attach this version in a following comment.

I consider the Distiller version to take as a reference is 5, since 
Ghostscript presents itself as compatible with this one; pdfwrite 
produces PDFs version <= 1.4, also see 

    "currentdistillerparams /CoreDistVersion get =="

(with pdfwrite as current device, of course).

I would take into consideration the behaviour of other Distiller 
versions, as long as it represents a progression, not a regression.


A few words about each test follow. First the ones for (A)..(F) 
in my gs_devel post. Then a file for testing all combinations of 
forward ref + definition + redefiniton + "normal use" for both 
Form XObjects and PS XObjects, to be sure the patch fixes what it's 
expected to fix and does not break anything.

The problem about some flags mentioned in comment #38 is simply 
a coding error in my patch, now fixed, so there's no test for it.


Bug687560-#a-part1.ps ---

  - for "(A) Output PDF still invalid" from my gs-devel post, 
    firt dash (behaviour regarding a forward reference);
  - Distiller, Jaws PDF Creator and my patch: OK;
  - GS HEAD: invalid PDF: object #4 referenced but missing from 
    the PDF, its xref entry points at random (here, to obj #6).

Bug687560-#a-part2.ps ---

  - same "(A) Output PDF still invalid", 2nd dash (behaviour 
    in the presence of a pre-existing object w/ same name);
  - Distiller and Jaws PDF Creator: OK;
  - GS HEAD: invalid PDF, same object #4 referenced but missing, 
    its xref entry happens to point to obj #6.
  - My patch: I considered changing the PDF type of an object 
    to be an abuse; some "sanity checks", based on tests currently 
    done by pdfmark_SP(), reject this operation with a typecheck.
  - I don't have a strong preference about what should happen in 
    this case.

Bug687560-#b.ps ---

  - for "(B) Incompatible with Distiller"; creates 2 Form XObjects 
    with same name and creates references to them, to check to which 
    instance each reference is resolved to.
  - Distiller and my patch: all refs are resolved to the 1st copy, 
    the PDF displays 2 green lines.
  - Jaws PDF Creator: different behaviour, refs are resolved to 
    the copy created last - 2 red lines; "variable-like" behaviour 
    that corresponds to PDF_OBJECT_ASSIGN_NEW from comment #8.
  - GS HEAD: different than Distiller, refs resolved to the most 
    recently created copy; corresponds to PDF_OBJECT_CREATE_NEXT 
    from comment #8.

Bug687560-#c-part1.ps & Bug687560-#c-part2.ps ---

  - for "(C) Introduces a new bug: can never /PUT into Form XObjects";
  - "part 1" tries to /PUT into the XObject during its creation 
    (between /BP and /SP), "part 2" does this after /EP;
  - Distiller and my patch: both succeed.
  - Jaws PDF Creator: limited functionality, accepts "part 1" but not 
    "part 2".
  - GS HEAD: both fail, but in diffret ways. "Part 1" gives an 
    undefined, while "part 2" apparently succeeds but the added key is 
    nowhere to be found in the PDF. (There's also a bad xref in "part 
    1".)

Bug687560-#d.ps ---

  - for "(D) ... No, it never detects equal [Form X]objects".
  - the test creates 2 identical Form XObject, with identical CTMs.
  - GS HEAD: creates 2 identical objects (#7 and #8 in "Bug687560-#d - 
    Ghostscript HEAD.pdf"), contrary to the commit message claiming 
    in this case duplicates are detected and "collapsed".

About "(E) pdf_find_same_resource(): not for named objects" ---

  - no sample file here, just look at gdevpdfj.c -r1.49, lin 492..493, 
    function pdf_end_write_image(), which avoids passing a named object 
    to pdf_substitute_resource() for a good reason. The comment inthere
    is true, but a bit misleading since the object is not written there 
    anyway (bool write == false).
  - Note: I qualified this as a POTENTIAL bug, not one with immediate 
    adverse consequences. I'm 99% sure where you want to arrive with 
    this patch; while I consider it a valuable addition for another bug 
    currently in the tracker,
      - it has nothing to do with the present bug, it does some harm 
	without truly solving the current bug;
      - for the other bug, the patch is far from being complete;
      - if used for the other bug, this "not for named resources" 
	problem, unless fixed, WILL have adverse effects.

Bug687560-#f.ps ---

  - for "(F) Inconsistency if self-referencing Form XObjects"
  - I consider this detail absolutely minor, and some people might 
    even consider the HEAD behaviour as to be expected.
  - No Distiller-produced file here, because it crashes with an
    access violation (DrWatson shows an indirect call through a 
    null pointer), so a test file triggers a bug in Distiller.
  - GS HEAD exhibits different behaviours for the 2 self-references.
  - Jaws PDF and my patch: same result for both self-refs.


Bug687560-tests.ps ---

  - More complete test file, exercices all combinations of forward 
    ref and/or "normal use" and/or redefinition; the redefiniton 
    can appear before and/or after the XObject being used.
  - Details provided in the comments. If more needed, ask.
  - Both Form and PS XObjects are tested, also a test for unnamed 
    /PS pdfmark (since all of these share code that is modified by 
    the patch).
  - For Distiller, the named PostScript pdfmark tests are skipped; 
    Distiller has trouble with some of these, and I have neither 
    the means, nor the time or desire to debug Distiller.
  - While I'm not interested in /PS pdfmarks, I included tests for 
    them to be sure nothing gets broken [more than it already was].
  - Distiller (/PS tests ommited), Jaws PDF and my patch: OK.
  - GS HEAD: invalid PDF, there are about 20 missing objects; these 
    objects' xref entries point to, or inside, other objects. There 
    are 3 different reasons for this, all having to do with the 
    commited patch; 2 of them are inherited from the GS version before 
    the patch, 1 is newly introduced (I haven't mentioned this one 
    until now).
Comment 41 SaGS 2005-11-11 14:58:40 UTC
Created attachment 1762 [details]
Replacement "Bug687560 - PDF_OBJECT_DISCARD_NEW.diff".

The one in comment #19 had problems with handling forward refs, as 
mentioned in comment #38. Also tweaked the condition for setting 
/.Global for unnamed PS XObjects: these don't need to be /.Global, 
unless the current substream is already /.Global.

Relative to current HEAD minus patches in comment #20 and it's fix 
from comment #22; a diff relative to today's HEAD would be hard to 
read. The first hunk for gdevpdfm.c is not part of the current patch, 
but it appears because other patches affected this file since -r1.48, 
and my working copy takes these into account. Not "for archiving".
Comment 42 leonardo 2005-11-14 12:29:08 UTC
Patch to HEAD :
http://ghostscript.com/pipermail/gs-cvs/2005-November/005798.html

This patch fixes important problems : generating an incorrect PDF. Some 
problems about different behavior from Adobe Distiller left unfixed due to old 
basic constraints in Ghostscript. See the log message of the patch for more 
details. We believe that those differences are not important for our customers, 
and smart users can work around them in their applications. Closing the bug now.