Bug 695747

Summary: git produces broken pdf from pdfTeX output
Product: Ghostscript Reporter: Knut Petersen <knut_petersen>
Component: PDF WriterAssignee: Ken Sharp <ken.sharp>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P4    
Version: master   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: pdftex generated pdf - ok
broken gs generated pdf
gs repaired pdf
target file of first GoToR link
patched pdf_main.ps

Description Knut Petersen 2014-12-18 01:06:38 UTC
Created attachment 11385 [details]
pdftex generated pdf - ok

Steps to reproduce the problem:

Executing the commands

/usr/local/bin/gs -q -dNOPAUSE -sDEVICE=pdfwrite -r1200 -dBATCH -o test1.pdf usage.pdf; 

/usr/local/bin/gs -q -dNOPAUSE -sDEVICE=pdfwrite -r1200 -dBATCH -o test2.pdf test1.pdf

All tested programs accept usage.pdf and test2.pdf.

acroread(linux) and okular accept also test1.pdf without any warning or error.
pdffonts and evince do not accept test1.pdf without the following often repeated warning:

Syntax Warning: Illegal annotation destination

So gs (git master 07150250834717499b74c87d00a8dc52599d33ec) produces an somewhat broken pdf from a pdfTeX pdf and is able to repair it in a 2nd run.

cu,
 Knut
Comment 1 Knut Petersen 2014-12-18 01:07:46 UTC
Created attachment 11386 [details]
broken gs generated pdf
Comment 2 Knut Petersen 2014-12-18 01:08:38 UTC
Created attachment 11387 [details]
gs repaired pdf
Comment 3 Knut Petersen 2014-12-18 02:48:57 UTC
gs 9.07: ok
gs 9.15: broken
gs master: broken

I could bisect if the problem is not obvious to you.

cu,
 Knut
Comment 4 Knut Petersen 2014-12-18 05:08:01 UTC
I bisected.

====================

95eea50e13987bd06d5634ec7f4dc381a76ee629 is the first bad commit
commit 95eea50e13987bd06d5634ec7f4dc381a76ee629
Author: Ken Sharp <ken.sharp@artifex.com>
Date:   Thu May 29 15:33:12 2014 +0100

PDF interpreter - add support for GoTo and GoToR in Link annotations of PDF files                                                                           

====================

pdftk uncompressed pdftex generated usage.pdf contains GoToRs like:

829 0 obj 
<<
/Border [0 0 0]
/Subtype /Link
/A 
<<
/F (web.pdf)
/D (Tiny examples)
/S /GoToR
>>
/Type /Annot
/Rect [72 282.119 128.394 289.695]
>>
endobj 

====================

The same link looks a bit different after current git master gs processed it:

300 0 obj 
<<
/Border [0 0 0]
/Subtype /Link
/A 
<<
/F (web.pdf)
/Type /Action
/S /GoToR
>>
/Type /Annot
/Rect [72 282.119 128.394 289.695]
>>
endobj 

====================

cu,
 Knut
Comment 5 Knut Petersen 2014-12-18 05:14:43 UTC
See also bug 695232
Comment 6 Ken Sharp 2014-12-18 05:20:52 UTC
(In reply to Knut Petersen from comment #4)

> ====================
> 
> 95eea50e13987bd06d5634ec7f4dc381a76ee629 is the first bad commit
> commit 95eea50e13987bd06d5634ec7f4dc381a76ee629
> Author: Ken Sharp <ken.sharp@artifex.com>
> Date:   Thu May 29 15:33:12 2014 +0100
> 
> PDF interpreter - add support for GoTo and GoToR in Link annotations of PDF
> files                                                                       
> 
> 
> ====================

OK so the reason it doesn't cause a problem in older code is simply that the older code doesn't support it.....

I'll try and find some time to look at why the /D isn't emitted, its probably something in the PDF interpreter.
Comment 7 Knut Petersen 2014-12-18 05:37:30 UTC
(In reply to Ken Sharp from comment #6)
 
> OK so the reason it doesn't cause a problem in older code is simply that the
> older code doesn't support it.....

Yes ...
 
> I'll try and find some time to look at why the /D isn't emitted, its
> probably something in the PDF interpreter.

Thanks in advance

Knut
Comment 8 Ken Sharp 2014-12-20 03:01:05 UTC
I believe that commit 2c525aa076d7094fa487621602b8e326c58eeabe should resolve this, though like the previous commits on this subject, I have *very* few examples to test with (including this one, 3 in total), so I strongly recommend testing this carefully.

It 'seems' to be OK with the sample file though.

Note that for proper operation with ps2write, this also requires commit 
b5a4f24141ab47b5ba2b7b3dff938ba30a8a908a
Comment 9 Knut Petersen 2014-12-20 10:39:54 UTC
(In reply to Ken Sharp from comment #8)
> I believe that commit 2c525aa076d7094fa487621602b8e326c58eeabe should
> resolve this, though like the previous commits on this subject, I have
> *very* few examples to test with (including this one, 3 in total), so I
> strongly recommend testing this carefully.
> 
> It 'seems' to be OK with the sample file though.
> 
> Note that for proper operation with ps2write, this also requires commit 
> b5a4f24141ab47b5ba2b7b3dff938ba30a8a908a

Both evince and acroread open the correct file, but the /D target is not found. We really need a string like /D (Tiny examples) and not /Tiny#20examples

The original usage.pdf has the following targets:
=================================================

/D (Bar and bar number checks)
/D (Bar lines)
/D (Easier editing)
/D (Extracting fragments of music)
/D (Flexible vertical spacing within systems)
/D (MacOS X)
/D (Manuals)
/D (Octave checks)
/D (Other sources of information)
/D (Saving typing with variables and functions)
/D (Skipping corrected music)
/D (Staff)
/D (Style sheets)
/D (Text encoding)
/D (Tiny examples)
/D (Tutorial)
/D (Voice)
/D (Windows)


After processing with gs master 2c525aa the following targets are present
==========================================================================
/D /Bar#20and#20bar#20number#20checks
/D /Bar#20lines
/D /Easier#20editing
/D /Extracting#20fragments#20of#20music
/D /Flexible#20vertical#20spacing#20within#20systems
/D /MacOS#20X
/D /Manuals
/D /Octave#20checks
/D /Other#20sources#20of#20information
/D /Saving#20typing#20with#20variables#20and#20functions
/D /Skipping#20corrected#20music
/D /Staff
/D /Style#20sheets
/D /Text#20encoding
/D /Tiny#20examples
/D /Tutorial
/D /Voice
/D /Windows

cu,
 Knut
Comment 10 Ken Sharp 2014-12-20 11:39:13 UTC
(In reply to Knut Petersen from comment #9)

> Both evince and acroread open the correct file, but the /D target is not
> found. We really need a string like /D (Tiny examples) and not
> /Tiny#20examples

Hmm, I'm pretty sure the destination is allowed to be a string *or* a name, and those are properly escaped names. Obviously without the target files, I cannot test this.

I'll check the spec again for more details, but its possible we cannot emit these as strings anyway. In which case I guess I'll have to delete the feature.
Comment 11 Knut Petersen 2014-12-20 12:53:53 UTC
Created attachment 11394 [details]
target file of first GoToR link
Comment 12 Knut Petersen 2014-12-20 13:03:06 UTC
(In reply to Ken Sharp from comment #10)
> (In reply to Knut Petersen from comment #9)
> 
> > Both evince and acroread open the correct file, but the /D target is not
> > found. We really need a string like /D (Tiny examples) and not
> > /Tiny#20examples
> 
> Hmm, I'm pretty sure the destination is allowed to be a string *or* a name,
> and those are properly escaped names. Obviously without the target files, I
> cannot test this.
> 
> I'll check the spec again for more details, but its possible we cannot emit
> these as strings anyway. In which case I guess I'll have to delete the
> feature.

Hmm I uploaded the target file for the first GoToR link (the first link on page 1 of usage.pdf).

After a quick look at the source tree: Wouln't it be possible to do this tranlation in devices/vector/gdevpdfm.c?

I'll have a more intense look at this tomorrow.

cu,
 Knut
Comment 13 Knut Petersen 2014-12-20 13:13:32 UTC
The pdf 1.7 spec says that D should be either a name, a byte string or an array.
Comment 14 Ken Sharp 2014-12-21 01:23:47 UTC
(In reply to Knut Petersen from comment #12)
> After a quick look at the source tree: Wouln't it be possible to do this
> tranlation in devices/vector/gdevpdfm.c?

Maybe, its certainly where I was going to start looking. First I need to understand why we're converting to names, there could be a number of reasons (memory management being one of them).


(In reply to Knut Petersen from comment #13)
> The pdf 1.7 spec says that D should be either a name, a byte string or an
> array.

And they are names, so they *should* be valid. Not the first time Acrobat doesn't follow the spec.
Comment 15 Knut Petersen 2014-12-21 04:18:11 UTC
(In reply to Ken Sharp from comment #14)
> (In reply to Knut Petersen from comment #12)
> > After a quick look at the source tree: Wouln't it be possible to do this
> > tranlation in devices/vector/gdevpdfm.c?
> 
> Maybe, its certainly where I was going to start looking. First I need to
> understand why we're converting to names, there could be a number of reasons
> (memory management being one of them).
> 

I tried

--- a/gs/devices/vector/gdevpdfm.c
+++ b/gs/devices/vector/gdevpdfm.c
@@ -740,10 +740,6 @@ pdfmark_put_ao_pairs(gx_device_pdf * pdev, cos_dict_t *pcd,
                 value.size = scan - value.data;
                 if (pdf_key_eq(&key, "/Dest") || pdf_key_eq(&key, "/D")) {
                     param_string_from_string(key, "/D");
-                    if (value.data[0] == '(') {
-                        /****** HANDLE ESCAPES ******/
-                        pdfmark_coerce_dest(&value, dest);
-                    }
                 } else if (pdf_key_eq(&key, "/File"))
                     param_string_from_string(key, "/F");
                 else if (pdf_key_eq(&key, "/Subtype"))


but realized that this fixes GoToR links but breaks the GoTo code. As a musician I'm quite busy these days, there's no time left today ...

> 
> (In reply to Knut Petersen from comment #13)
> > The pdf 1.7 spec says that D should be either a name, a byte string or an
> > array.

> And they are names, so they *should* be valid. Not the first time Acrobat
> doesn't follow the spec.

They _are_valid. But when target.pdf defines a destination (Manual) a link to \Manual is pretty useless. /D should not be changed.

cu,
 Knut
Comment 16 Ken Sharp 2014-12-21 05:16:20 UTC
(In reply to Knut Petersen from comment #15)

> > And they are names, so they *should* be valid. Not the first time Acrobat
> > doesn't follow the spec.
> 
> They _are_valid. But when target.pdf defines a destination (Manual) a link
> to \Manual is pretty useless. /D should not be changed.

I don't follow your reasoning here, /Manual and (Manual) are equivalent, if both strings and names are valid.

It doesn't (or at least shouldn't) matter whether either the source or target PDF file define the named destination as a link or a name, since both are valid the consumer should be prepared to accept either and treat them as equivalent.

Otherwise there's a burden on the producer of the source PDF file to make the type of the named destination in the source PDF the same as the type in the target file, which it may not have access to at the time.
Comment 17 Ken Sharp 2014-12-22 02:48:33 UTC
The problem is nothing to do with names or strings.

The difficulty is the way that we handle Outlines and GoTo actions in the current document. Rather than adding to the named destinations we completely resolve the destination and write a fixed location into the PDF file.

Obviously we cannot do that for GoToR destinations, as we are not creating that file, so we must preserve the named destination as a name, not a fixed destination. This means we mist process GoTo and GoToR destinations differently. Annoyingly I knew that and had fixed it once before, but had forgotten it again.

Commit dc2c3634aeae6634b130e124b1f0997f17ee3141 separates the two cases again, and handles them differently, which should resolve the problem. I've checked the Outlines and their GoTo actions, as well as verifying that the GotoR action to web.pdf on page 1 works correctly. Note that it matters not at all whether the /D for this is a name or a string, both work for me using Acrobat. All the other cases I have examples for also work correctly.

NOTE: We do not preserve the names tree from the input in the output this way, and so any *other* document which references the produced PDF file, using a GoToR action, will fail as there is no longer a corresponding named destination. That's life, I don't intend to do any work on that.
Comment 18 Knut Petersen 2014-12-22 06:59:39 UTC
(In reply to Ken Sharp from comment #17)

> NOTE: We do not preserve the names tree from the input in the output this
> way, and so any *other* document which references the produced PDF file,
> using a GoToR action, will fail as there is no longer a corresponding named
> destination. That's life, I don't intend to do any work on that.

Thanks again for the time you already spent on this topic. But you missed that this solution breaks the internal GoTo links ... try e.g. those links you find in the index of usage.pdf

cu,
 Knut
Comment 19 Ken Sharp 2014-12-22 07:04:19 UTC
(In reply to Knut Petersen from comment #18)
> (In reply to Ken Sharp from comment #17)
> 
> > NOTE: We do not preserve the names tree from the input in the output this
> > way, and so any *other* document which references the produced PDF file,
> > using a GoToR action, will fail as there is no longer a corresponding named
> > destination. That's life, I don't intend to do any work on that.
> 
> Thanks again for the time you already spent on this topic. But you missed
> that this solution breaks the internal GoTo links ... try e.g. those links
> you find in the index of usage.pdf

This is the problem with highly complex files, its impossible to test everything. It would be better to have a simpler file (or better yet, several simpler files to test each aspect).
Comment 20 Ken Sharp 2014-12-22 07:56:15 UTC
Created attachment 11395 [details]
patched pdf_main.ps

(In reply to Knut Petersen from comment #18)

> Thanks again for the time you already spent on this topic. But you missed
> that this solution breaks the internal GoTo links ... try e.g. those links
> you find in the index of usage.pdf

More accurately, it didn't fix them, as these are not Outlines, they are annotations (less than obvious, I know).

TO save me checking out the examples I have to hand fruitlessly again, try the attached pdf_main.ps and let me know how you get on with that.
Comment 21 Knut Petersen 2014-12-22 11:13:28 UTC
(In reply to Ken Sharp from comment #20)
> Created attachment 11395 [details]
> patched pdf_main.ps
> 
> (In reply to Knut Petersen from comment #18)
> 
> > Thanks again for the time you already spent on this topic. But you missed
> > that this solution breaks the internal GoTo links ... try e.g. those links
> > you find in the index of usage.pdf
> 
> More accurately, it didn't fix them, as these are not Outlines, they are
> annotations (less than obvious, I know).
> 
> TO save me checking out the examples I have to hand fruitlessly again, try
> the attached pdf_main.ps and let me know how you get on with that.

internal links:
==============
ok

www links:
==========
ok

external links to files not processed by gs:
============================================
ok with (latest linux) acroread

not ok with evince. A GoToR with /F (notation.pdf) and /D /Skipping#20corrected#20music leads evince to try to open not only notation.pdf (probably looking for "Skipping"). It also tries to open the files "corrected" and "music". As I read the pdf specs this is problem of evince.

external links to files processed by gs:
========================================
broken as expected for both acroread and evince. The named reference is missing. Although you clearly wrote in comment 17 that you "don't intend to do any work on that" I really need that. And I do have the big set of interlinking pdfs to test that functionality thoroughly. You'd be my personal hero if you would change your mind. Isn't there a new feature still missing for the 9.16 release?

cu,
 Knut
Comment 22 Ken Sharp 2014-12-22 13:23:46 UTC
(In reply to Knut Petersen from comment #21)
> internal links:
> ==============
> ok
> 
> www links:
> ==========
> ok
> 
> external links to files not processed by gs:
> ============================================
> ok with (latest linux) acroread
> 
> not ok with evince. A GoToR with /F (notation.pdf) and /D
> /Skipping#20corrected#20music leads evince to try to open not only
> notation.pdf (probably looking for "Skipping"). It also tries to open the
> files "corrected" and "music". As I read the pdf specs this is problem of
> evince.

Seems like it, yes, I believe the name syntax should be OK. Looks like evince is unescaping the name and then processing it as three portions instead of a single instance.

OK so I'll commit the change tomorrow after I check out the other examples I have.


> external links to files processed by gs:
> ========================================
> broken as expected for both acroread and evince. The named reference is
> missing. Although you clearly wrote in comment 17 that you "don't intend to
> do any work on that" I really need that. And I do have the big set of
> interlinking pdfs to test that functionality thoroughly. You'd be my
> personal hero if you would change your mind. Isn't there a new feature still
> missing for the 9.16 release?

Many, so many.....

Its a massive change to alter the way this works. At that moment we resolve the destination in the PDF interpreter and then create a pdfmark which inserts an appropriate LNK with a specific destination. Altering that would mean changing the PDF interpreter to emit a Named Destination pdfmark (not sure of the syntax for this but it is already possible, I think) and then emit a LNK to the named destination.

Of course we would first have to identify whether a given destination is a named destination or a specific destination.

You can see from the progress of this bug what a nightmare it is working with the two ends divorced like this, there would be very likely to be a large bug tail afterwards. I'd probably also have to alter the current Named destination pdfwrite code again, to write a balanced tree instead of the single flat array we write at the moment (because more named destinations would lead to larger arrays and a performance problem). And that has been a source of problems too, so you're doubling the error surface.

At the same time I'm trying to finish up some other work regarding a fairly major change to the device interface (complicated by the neccesity of keeping the existing API unchanged). That's been going on for 2+ months now and I've still not finished it, at this rate it may not make it into the next release, which would be painful as there's a customer waiting for it :-(

As I've mentioned on many occasions, pdfwrite is intended to produce PDF< not to 'process' it. The fact that it works that way is a nice bonus rather than a design goal I'm afraid. So at the moment its not something I plan to tackle, at least in the immediate future.

Unless, of course, a paying customer asks for it.....
Comment 23 Knut Petersen 2014-12-22 23:38:28 UTC
(In reply to Ken Sharp from comment #22)

> As I've mentioned on many occasions, pdfwrite is intended to produce PDF<
> not to 'process' it. The fact that it works that way is a nice bonus rather
> than a design goal I'm afraid. So at the moment its not something I plan to
> tackle, at least in the immediate future.

Ok, many thanks for the work you already did.

Extending gs seems still seems to be the most promising way for me. The other candidate pdfsizeopt.py fails completely on some input files, and produces severely broken pdfs on other input files.

With my patch to lilypond and gs master a significant 136 MB reduction of the size used for our documentation is achieved. But GoToR links are broken then.

> Unless, of course, a paying customer asks for it.....

The Lilypond development process also knows about "sponsored features". How much money would be needed for a gs that does preserve both GoTo and GoToR links?

cu,
 Knut
Comment 24 Ken Sharp 2014-12-23 01:10:26 UTC
(In reply to Knut Petersen from comment #23)

> > Unless, of course, a paying customer asks for it.....
> 
> The Lilypond development process also knows about "sponsored features". How
> much money would be needed for a gs that does preserve both GoTo and GoToR
> links?

Apologies, I didn't mean that as a hint, I don't think Lilypond is likely to want to be an Artifex customer, and we don't usually do "sponsored features" (or at least I can't think of any occasions we have done)

I'll try and remember to open an enhancement request for this, so at least I won't forget it. Maybe one day I'll find some time to rework the link processing, or more likely the named destinations, which is what it really needs since its entirely possible to have a named destination that has no links in the source document.
Comment 25 Knut Petersen 2014-12-23 01:15:23 UTC
> I'll try and remember to open an enhancement request for this, so at least I
> won't forget it. Maybe one day I'll find some time to rework the link
> processing, or more likely the named destinations, which is what it really
> needs since its entirely possible to have a named destination that has no
> links in the source document.

Season's greetings and best wishes for the New Year!

Knut