Bug 699830 - Hyperlinks broken after conversion
Summary: Hyperlinks broken after conversion
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Text
Version: 9.25
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Duplicates: 699874 699896 701474
Depends on:
Blocks:
 
Reported: 2018-09-29 13:41 UTC by Tio
Modified: 2019-09-02 15:03 UTC (History)

See Also:
Customer:
Word Size: ---


Attachments
ghostscript hyperlinks not working (985.55 KB, application/pdf)
2018-09-29 15:13 UTC, Tio
ghostscript hyperlinks working (989.06 KB, application/pdf)
2018-09-29 15:14 UTC, Tio
Ok I get it now. I attached a PDF file called input.pdf (533.63 KB, application/pdf)
2018-09-29 15:28 UTC, Tio

Description Tio 2018-09-29 13:41:27 UTC
After conversion, links in the text are either missing or misplaced. To replicate, here is a demo PDF in which the links work: https://www.dropbox.com/s/jobbcldrsrhm2pw/The%20Origin%20of%20Most%20Problems.pdf?dl=0. However, the converted PDF file has broken links throughout the entire book: https://www.dropbox.com/s/g8sy98mqkk26jkk/The%20Origin%20of%20Most%20Problems%20SAM.pdf?dl=0. See page 123 for an example (the words Bitcoins and Monero). The links work in the original PDF, but not in the converted one.

After reverting to Ghostscript 9.23 I no longer see the error, so from our tests the error comes with newer versions of Ghostscript. PDF editors like Master PDF or PDFSam rely on newer Ghostscript versions and suffer from the same error.

Right now there is no fix other than reverting back to version 9.23.
Comment 1 Chris Liddell (chrisl) 2018-09-29 14:13:07 UTC
Please attach a file to reproduce the problem (links tend to disappear), and give a command line to reproduce it - without that, we cannot help.
Comment 2 Ken Sharp 2018-09-29 14:14:20 UTC
Please attach example files here; URLs often go stale before anyone has a chance to investigate the problem. Smaller files will be highly appreciated. Sending a file of several hundred pages simply means someone has to spend a lot of time reducing it to the point where it can be debugged.

You haven't supplied an example command line, you'll need to let us know how you are using Ghostscript.
Comment 3 Tio 2018-09-29 15:00:46 UTC
Ok so I don't know how to do this because I can only replicate for the entire PDF. If I extract one page with links then it may work with ghostscript. But the file is too big to attach. The link is from my dropbox. Let me know what you suggest I do and I will.

I use this command line: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dColorImageDownsampleType=/Bicubic  -dColorImageResolution=150 -sOutputFile=output.pdf input.pdf
Comment 4 Tio 2018-09-29 15:05:58 UTC
I tried with this: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dColorImageDownsampleType=/Bicubic  -dColorImageResolution=150 -sPageList=123 -sOutputFile=output.pdf input.pdf

If I extract one single page that has errors when converting the entire PDF, then it won't show those errors. So I can't extract one page to show you unfortunately. I think it has to do with big PDF files.
Comment 5 Ken Sharp 2018-09-29 15:13:21 UTC
(In reply to Tio from comment #4)
> I tried with this: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
> -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress
> -dColorImageDownsampleType=/Bicubic  -dColorImageResolution=150
> -sPageList=123 -sOutputFile=output.pdf input.pdf
> 
> If I extract one single page that has errors when converting the entire PDF,
> then it won't show those errors. So I can't extract one page to show you
> unfortunately. I think it has to do with big PDF files.

Well, this is a 700+ page file, 160MB in size more or less, and I can't even find any links in it (I got bored looking at page 35).

The very least you can do is point out a page number which shows a problem. If you absolutely can't find a smaller file I'll look at this one, as time permits. Don't count on a quick resolution.
Comment 6 Tio 2018-09-29 15:13:55 UTC
Created attachment 15704 [details]
ghostscript hyperlinks not working
Comment 7 Tio 2018-09-29 15:14:46 UTC
Created attachment 15705 [details]
ghostscript hyperlinks working

converted with GS 9.23
Comment 8 Tio 2018-09-29 15:14:58 UTC
Ok sorry about this. Of course it exported right since I had gs 9.23. So I am attaching 2 files: output gs923 (is the one that works converted with GS 9.23) and output gs9246 (the one that does not work and it is converted with GS 9.24-6).

Command I used: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dColorImageDownsampleType=/Bicubic  -dColorImageResolution=150 -sPageList=123 -sOutputFile=output.pdf input.pdf
Comment 9 Ken Sharp 2018-09-29 15:15:53 UTC
(In reply to Tio from comment #8)
> Ok sorry about this. Of course it exported right since I had gs 9.23. So I
> am attaching 2 files: output gs923 (is the one that works converted with GS
> 9.23) and output gs9246 (the one that does not work and it is converted with
> GS 9.24-6).

No please don't do that, it doesn't help at all.
Comment 10 Tio 2018-09-29 15:17:50 UTC
(In reply to Ken Sharp from comment #9)
> (In reply to Tio from comment #8)
> > Ok sorry about this. Of course it exported right since I had gs 9.23. So I
> > am attaching 2 files: output gs923 (is the one that works converted with GS
> > 9.23) and output gs9246 (the one that does not work and it is converted with
> > GS 9.24-6).
> 
> No please don't do that, it doesn't help at all.

I attached what you asked for: one single page with the error instead of the entire PDF file. I attached 2 actually. 1 PDF file (containing 1 page) that does not work and it was converted with GS 9.24-6, and another similar PDF (same, 1 page) that works as it was converted with GS 9.23. Isn't this what you need?
Comment 11 Ken Sharp 2018-09-29 15:21:44 UTC
(In reply to Tio from comment #10)
> (In reply to Ken Sharp from comment #9)
> > (In reply to Tio from comment #8)
> > > Ok sorry about this. Of course it exported right since I had gs 9.23. So I
> > > am attaching 2 files: output gs923 (is the one that works converted with GS
> > > 9.23) and output gs9246 (the one that does not work and it is converted with
> > > GS 9.24-6).
> > 
> > No please don't do that, it doesn't help at all.
> 
> I attached what you asked for: one single page with the error instead of the
> entire PDF file. I attached 2 actually. 1 PDF file (containing 1 page) that
> does not work and it was converted with GS 9.24-6, and another similar PDF
> (same, 1 page) that works as it was converted with GS 9.23. Isn't this what
> you need?

You said you were attaching two files created by Ghostscript. Those don't help, looking at the output isn't helpful.

If you've supplied a file which will cause a problem when run through Ghostscript, then that is helpful. I'm currently in a meeting and will look after it finishes.

Note that there was no Ghostscript release labelled 9.24-6, that must be something produced by the package maintainer for your version of Linux, not us. This does tell me that you aren't using *our* code though. You're using code modified by the package maintainer, which may make it impossible to reproduce your problem.
Comment 12 Tio 2018-09-29 15:28:09 UTC
Created attachment 15706 [details]
Ok I get it now. I attached a PDF file called input.pdf

This file is not converted through GS. You can use it to test it. For me I can replicate if I convert this file using this code: gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.7 -dPDFSETTINGS=/prepress -dColorImageDownsampleType=/Bicubic  -dColorImageResolution=150 -sOutputFile=output.pdf input.pdf

In regards to the version, it is Manjaro's latest version. But people complain about version 9.25 as well https://tex.stackexchange.com/questions/453016/ghostscript-destroys-hyperlinks

Cheers and I am here if you need anything else from me! Thanks
Comment 13 Ken Sharp 2018-09-29 15:39:13 UTC
This was a bug fix, the behaviour previously was incorrect, now it is as intended. The Annotations in your PDF file have no /Flags entry, in the absence of that, the default value (0) is used.

This means that bit 3 of the Flags value is 0, from the PDF Reference:

3
Print
(PDF 1.2) If set, print the annotation when the page is printed. If clear, never print the annotation, regardless of whether it is displayed on the screen. This can be useful, for example, for annotations representing interactive pushbuttons, which would serve no meaningful purpose on the printed page. (See implementation note 83 in Appendix H.)

The default behaviour of the Ghostscript PDF interpreter is to be a printer. Since these annotations do not have any effect on a printer, they are dropped.

If you add -dPrinted=false to the command line, then the PDF interpreter behaves as a display device instead. Because these annotations don't have NoView set, the PDF interpreter will process them, which means they will end up in the PDF file when using pdfwrite.
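As a rough sketch of the workaround, the following Python helper rebuilds the command line from comment 12 with -dPrinted=false added. The function name build_gs_command and its parameters are made up for illustration; only the gs switches come from this thread.

```python
import shlex


def build_gs_command(input_pdf, output_pdf, printed=False):
    """Return the gs argument list used in this report (hypothetical helper).

    With printed=False the -dPrinted=false switch is added, so the PDF
    interpreter behaves as a display device, as described above.
    """
    args = [
        "gs",
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.7",
        "-dPDFSETTINGS=/prepress",
        "-dColorImageDownsampleType=/Bicubic",
        "-dColorImageResolution=150",
    ]
    if not printed:
        # Only difference from the reporter's original command line.
        args.append("-dPrinted=false")
    args.append(f"-sOutputFile={output_pdf}")
    args.append(input_pdf)
    return args


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run()
    # to actually execute it (requires gs to be installed).
    print(shlex.join(build_gs_command("input.pdf", "output.pdf")))
```

The list form can be handed directly to subprocess.run(), which avoids shell quoting problems with file names that contain spaces.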
Comment 14 Tio 2018-09-29 15:56:22 UTC
I don't get it though. I simply make these books in LibreOffice Draw and export them as PDF, then optimize with Ghostscript. Some links work and some don't, and I can't figure out why. From my perspective there is nothing special about the links that work compared with those that do not. Can you make it more clear please?
Comment 15 Tio 2018-09-29 15:59:27 UTC
it works with  -dPrinted=false though - thanks for that. I am still baffled as to why some links work and some don't when I add them in the same program and in the same way.
Comment 16 Ken Sharp 2018-09-29 16:13:02 UTC
(In reply to Tio from comment #15)
> it works with  -dPrinted=false though - thanks for that. I am still baffled
> as to why some links work and some don't when I add them in the same program
> and in the same way.

If a /Link Annotation does not have the 'Print' bit of the annotation's /Flags value set, then the PDF interpreter will (by default) not process the annotation. If the PDF *interpreter* skips the annotation then the pdfwrite device doesn't ever see it.

If the Annotation (of whatever kind) does set the Print bit, then (again in default setup) the PDF interpreter will process the annotation and pass it to the pdfwrite device.

You can change the behaviour of the interpreter. If you set -dPrinted=false, then the interpreter no longer cares about the Print bit of the annotation flags. In this mode it instead checks the NoView bit; if that isn't set, then it processes the annotation.

In this case, *if* the NoView bit was set, then the annotation would be skipped.

So you need to know something about the way the annotation has been created, currently.

The reason this is important is that the control (-dPrinted) which was supposed to control whether or not the annotation is processed was being ignored. Obviously that's not the way it was supposed to work, and has been fixed. It's unfortunate that this causes you a problem, and I may well extend the operation of this control in future, but as of now this is behaving as intended.
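The rule described in this comment can be modelled in a few lines of Python. This is only a sketch of the behaviour as explained here, not the actual Ghostscript code. The function name is made up, and the NoView bit position (bit 6) is taken from the annotation flags table in the PDF Reference rather than from this thread. Bit positions in that table are 1-based, so bit 3 corresponds to the mask value 4.

```python
# Annotation flag masks; the PDF Reference numbers bits from 1 (low-order).
PRINT_BIT = 1 << (3 - 1)    # bit 3 -> mask 4
NOVIEW_BIT = 1 << (6 - 1)   # bit 6 -> mask 32 (from the PDF Reference)


def annotation_is_processed(flags=0, printed=True):
    """Model of whether the PDF interpreter passes an annotation on.

    flags   -- the annotation's flags value; 0 when the entry is absent
    printed -- True models the default, False models -dPrinted=false
    """
    if printed:
        # Printer mode: only annotations with the Print bit set survive.
        return bool(flags & PRINT_BIT)
    # Display mode: annotations survive unless NoView is set.
    return not (flags & NOVIEW_BIT)


if __name__ == "__main__":
    for flags in (0, PRINT_BIT, NOVIEW_BIT):
        print(flags,
              annotation_is_processed(flags, printed=True),
              annotation_is_processed(flags, printed=False))
```

Under this model a link with no flags entry (value 0) is dropped by default and kept with -dPrinted=false, while a link exported with the Print bit set is kept by default. That would explain why links that look identical in the authoring program can behave differently after conversion.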
Comment 17 Tio 2018-09-29 18:21:54 UTC
I wish I knew how to translate that into LO Draw so that I can add the links properly. Right now I have no idea since I add all the links the same way, yet some (as you said) lack something.
Comment 18 Ken Sharp 2018-10-04 11:08:22 UTC
*** Bug 699874 has been marked as a duplicate of this bug. ***
Comment 19 Ken Sharp 2018-10-04 12:50:36 UTC
*** Bug 699874 has been marked as a duplicate of this bug. ***
Comment 20 Ken Sharp 2018-10-05 08:35:45 UTC
*** Bug 699896 has been marked as a duplicate of this bug. ***
Comment 21 Ken Sharp 2019-09-02 15:03:26 UTC
*** Bug 701474 has been marked as a duplicate of this bug. ***