688829 – Merging PDF files using gs: outlines and links not updated

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	8.54
Hardware:	Other AIX

Importance:	P4 normal
Assignee:	Ken Sharp

URL:
Keywords:	bountiable

Duplicates (1):	691075
Depends on:
Blocks:	688542

Reported:	2006-08-04 06:13 UTC by Yves-Alain NICOLLET
Modified:	2013-08-07 05:21 UTC
CC List:	4 users

See Also:
Customer:
Word Size:	---

Attachments
patch which remembers a cumulative page offset and adds that to links and outlines. (1.76 KB, patch) 2008-09-13 01:44 UTC, Hin-Tak Leung
a simple pdfmark postscript file to add named destinations (486 bytes, text/plain) 2009-01-14 16:50 UTC, Hin-Tak Leung
zip file of win32 svn r10625 + patch, built with VC 9 under wine (8.73 MB, text/plain) 2010-01-20 18:56 UTC, Hin-Tak Leung
Updated patch for gs 9.04 (1.95 KB, patch) 2011-12-01 14:13 UTC, Brian McCarter

Description Yves-Alain NICOLLET 2006-08-04 06:13:35 UTC

Hi.
I want to produce books in PDF by merging a front-cover in PDF, a body
(with working bookmarks/outlines and links) in PDF and a back cover in
PDF. I use the command:
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=book.pdf -dBATCH front.pdf body.pdf back.pdf
It works fine to insert my covers before and after the body. But gs does
not take into account the objects contained in front.pdf to adjust the
outlines and links, which therefore point n pages (n being the number of
pages in front.pdf) before the actual target.
I don't know if it is a bug or if there is a missing parameter in my
command.
Can someone please explain?
Thanks.

Comment 1 Yves-Alain NICOLLET 2006-08-04 06:15:18 UTC

Typing mistake in summary: read links instead of linjs

Comment 2 Raph Levien 2006-08-09 11:07:39 UTC

This appears to be a real issue when processing pdf files after pages have already been output (note 
that the same issue occurs if a pdf follows a ps). In the support call, we came up with this suggestion of 
how to implement correct behavior, involving only modifications to the pdfmark commands synthesized 
by the pdf interpreter.

First, determine a page number offset valid for the entire pdf file. Obviously, this number will be 0 if 
there are no pages output before the processing of the file. The easiest way to determine this is to 
query the PageCount device parameter on /runpdfbegin. Then, in the processing of /linkdest, add this 
offset to any page number that is processed for a PDF link. Such page numbers may be explicitly given 
in the link dest, resolved through the /Dests resource, or be computed in /namedactions.

This bug marked as bountiable but assigned to Hin-Tak Leung because he's expressed interest in 
working on it.
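
The scheme described in this comment can be modelled outside Ghostscript. The following Python sketch is not the PDF interpreter code (which is written in PostScript); `InputPdf`, `merge_links` and `pages_already_output` are hypothetical names, and it assumes every link destination has already been resolved to a 1-based page number within its own file.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class InputPdf:
    """Hypothetical stand-in for one input file: a page count plus link targets."""
    name: str
    num_pages: int
    # 1-based destination pages, relative to this file
    link_targets: List[int] = field(default_factory=list)


def merge_links(inputs: List[InputPdf]) -> List[Tuple[str, int]]:
    """Return (source file, destination page in the merged output) pairs."""
    pages_already_output = 0  # plays the role of the device's PageCount
    merged = []
    for pdf in inputs:
        # Snapshot the offset once, when the file is opened (cf. /runpdfbegin).
        offset = pages_already_output
        for target in pdf.link_targets:
            # Shift every destination by the pages emitted before this file.
            merged.append((pdf.name, target + offset))
        pages_already_output += pdf.num_pages
    return merged


book = [
    InputPdf("front.pdf", 2),
    InputPdf("body.pdf", 10, link_targets=[1, 5]),
    InputPdf("back.pdf", 1),
]
print(merge_links(book))
```

With a two-page front cover, links in body.pdf that pointed at pages 1 and 5 end up pointing at pages 3 and 7 of the merged book, which is the adjustment the original report asks for.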

Comment 3 Hin-Tak Leung 2006-08-18 18:12:36 UTC

Started looking at it, and the bug is actually a bit worse 
than reported - passing a single pdf (I picked
one of the gcc manuals from gcc.gnu.org) through 
ghostscript's pdfwrite, links are preserved but outlines
are completely wrong. For concatenation, links belonging to
documents other than the first are wrong by a
page offset. Outlines are just wrong even for single input.

Will look further.

Comment 4 Hin-Tak Leung 2008-09-13 01:44:56 UTC

Created attachment 4393 [details]
patch which remembers a cumulative page offset and adds that to links and outlines.

This patch against two-week-old svn trunk basically does what Raph described
except the /Dests resource resolution.

I am posting this mostly to request coding-style comments - e.g. I need to
store a cumulative page offset; is globaldict the correct place to put this?

(Looking back at my own comment 3, it doesn't seem to make much sense - never
mind...) I'll look into the DEST resolution further; this patch does the
correct thing for outlines and namedactions, I think.

Comment 5 Hin-Tak Leung 2009-01-14 16:50:11 UTC

Created attachment 4721 [details]
a simple pdfmark postscript file to add named destinations

A simple pdfmark postscript file to add named destinations. Run with

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf

This creates a more interesting pdf with named destinations and some outlines
for testing.

Comment 6 Hin-Tak Leung 2009-01-14 16:58:05 UTC

Apparently my patch already resolves named destinations correctly. So it updates
outlines and links to explicit pages, and attachment 4721 creates pdfs with
named destinations for testing.

Known limitations:
1) Does not cope at all with mixtures of ps and pdf files (it needs a page count
from ps processing). This can be worked around by distilling individual ps files first.
2) I'll take stylistic criticisms (where to put the cumulative page count,
naming of variables, etc).

Also I am having problems finding suitable test files with
interesting/unusual outlines and links - most software writes explicit
page numbers in outlines and links, which the patch already updates; I had to
create my own named destination test case.

Comment 7 Hin-Tak Leung 2009-01-14 17:02:28 UTC

3) Not sure whether to change namedactions, and unsure what exactly
they are used for (the meaning of first/last page obviously changes for merged pdf's).

Passes back to default component owner for review?

Comment 8 Ken Sharp 2009-01-15 00:35:37 UTC

Hi Hin-Tak. First, thanks for the patch, on the whole it looks good to me. 

Although the target device is pdfwrite, in order to create a PDF file, the
pdfwrite device itself is not affected by your patch, which really makes changes
to the PDF interpreter. As a result I don't think I'm really the correct person
to review your work.

I've assigned it to Alex Cherepanov instead, as he is the owner of the PDF
interpreter.

Comment 9 Alex Cherepanov 2009-01-15 15:59:49 UTC

I don't particularly like the use of globaldict because it's
shared between multiple execution contexts - if we still support DPS.

The current patch will break when the user tries to generate multiple
independent PDF files in one GS run.

Perhaps, the patch can use /PageCount parameter of the current device.

Comment 10 Hin-Tak Leung 2009-01-15 16:37:03 UTC

I don't like globaldict either; it is just that the offset needs to be added up
across multiple input pdf files, at the interpreter level, so that number
needs to be stored somewhere.

/PageCount doesn't quite work - the offset needs to be a stored number that is
updated and jumped up per input pdf file, not per page; however one goes about
it, some new code needs to be hooked into runpdfbegin (or the ending equivalent).

I haven't thought about the splitting/page-extraction scenario. But if I understand
the code correctly, page-extract/split does not preserve outlines and links
either, I think? (And one does not and cannot reasonably expect outlines and
links to be preserved, since one is going from a large document to a smaller
one.) So the patch is no worse than the current situation? (I'll have to try
this myself to be sure.) I.e. in the case of page-extraction/splitting, I think
the effect of the patch is simply that outlines and links are broken in
*different* ways before and after the patch?

Comment 11 Ray Johnston 2009-01-15 23:50:28 UTC

Note that the current pdfwrite cannot be used to create multiple PDFs in a
single run. The pdfwrite device only writes the PDF (from temp files) when the
device is closed (prior to exit).

I tried this by starting with a different device, then doing 'save (pdfwrite)
selectdevice ...{ run some input files } ... restore' but this crashes since
pdfwrite doesn't properly clean up as it exits, leaving several dangling
pointers.

With respect to merging multiple PDF's into one, I think the patch is reasonable,
since DPS contexts are unlikely to be mixed with PDF interpretation, so 
globaldict isn't a very risky approach.

I can't override Alex since he's the "owner" of the PDF interpreter, but I
would request that this be reconsidered.

Comment 12 Hin-Tak Leung 2009-01-16 07:38:18 UTC

I think one can possibly do multiple output pdf's with -sOutputFile=out%d.pdf,
and if my memory serves, one can do -dFirstPage=x -dLastPage=y for page
extraction? (I haven't tried either or know the precise syntax for the latter,
but they should work).

But outline/link preservation going from larger to smaller pdf just is not
supposed to work, and users cannot and should not expect it to...

If there is an alternative for storing a number that is incremented per input
file at the opening/closing of input files, please suggest.

Comment 13 SaGS 2009-02-04 02:24:44 UTC

> If there is an alternative for storing a number that is incremented per input
> file at the opening/closing of input files, please suggest.

I don’t think you need to maintain (updating it for each file) such a number. 
Comment #2 suggests, as I understand it, to get the number of already printed 
pages from the page device (the /PageCount parameter) when you load the PDF.

- This is the offset to add to page numbers in the current PDF.
- It’s a constant value to be stored only while processing the ‘current’ PDF;
  so no globaldict needed. Store it alongside other status variables used by 
  the PDF interpreter.
- It's maintained automatically by the page device.
- It doesn’t matter what types of files were processed before; any mix of
  PS + PDF will work OK.
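
The difference between the two bookkeeping strategies discussed in this thread can be illustrated with a small Python model. It is only a sketch: both function names and the `jobs` tuples are hypothetical, and the "device page count" is simulated by a plain integer, not read from Ghostscript.

```python
def offsets_from_pdf_counter(jobs):
    """Offset per PDF when only PDF files bump a cumulative counter."""
    counter, offsets = 0, {}
    for name, kind, pages in jobs:
        if kind == "pdf":
            offsets[name] = counter
            counter += pages  # PostScript jobs never update this counter
    return offsets


def offsets_from_device_page_count(jobs):
    """Offset per PDF when the device's running page count is sampled at file open."""
    page_count, offsets = 0, {}
    for name, kind, pages in jobs:
        if kind == "pdf":
            offsets[name] = page_count
        page_count += pages  # the device counts every page, whatever produced it
    return offsets


# (file name, input type, number of pages)
jobs = [("cover.ps", "ps", 2), ("body.pdf", "pdf", 10), ("appendix.pdf", "pdf", 4)]
print(offsets_from_pdf_counter(jobs))
print(offsets_from_device_page_count(jobs))
```

In this model the PDF-only counter gives body.pdf an offset of 0 even though two PostScript pages precede it, while sampling the running page count gives 2, which matches the point that any mix of PS and PDF input is handled.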

As for -dFirstPage= and -dLastPage=, I think handling these means:

- if the original (before subtracting the stored PageCount) destination page 
  number falls outside this range, somehow omit the link;
- otherwise, subtract FirstPage-1 (IIRC it’s 1-based), in addition to 
  subtracting the stored PageCount, from the page number.
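
A possible reading of this range handling, as a Python sketch. `remap_destination` and its parameters are hypothetical names, page numbers are taken to be 1-based, and the sketch assumes the stored page count is applied as an additive offset, as in comment 2.

```python
from typing import Optional


def remap_destination(page: int, first_page: int, last_page: int,
                      pages_already_output: int) -> Optional[int]:
    """Map a destination page in the input file to a page in the output.

    Returns None when the destination lies outside the extracted range,
    meaning the link should be omitted.
    """
    if page < first_page or page > last_page:
        return None
    # Renumber relative to the first extracted page, then shift by the
    # pages that were already output before this file was opened.
    return page - (first_page - 1) + pages_already_output


# Extract pages 3..6 of a file after 4 pages have already been output.
for dest in (2, 3, 6, 7):
    print(dest, remap_destination(dest, 3, 6, 4))
```

Destinations 2 and 7 fall outside the extracted range and are dropped; destinations 3 and 6 become pages 5 and 8 of the output.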

Comment 14 Hin-Tak Leung 2010-01-20 05:32:30 UTC

*** Bug 691075 has been marked as a duplicate of this bug. ***

Comment 15 Olivier DEPALLE 2010-01-20 06:27:26 UTC

I'm using the same version of Ghostscript but my platform is Linux, and the
pdf_main.ps file is not the same, so the patch doesn't work.
Is there a way to find a patch for GS 8.54 for Linux?
Thanks.

Comment 16 Hin-Tak Leung 2010-01-20 06:50:03 UTC

> I'm using the same version of Ghostscript but my platform is Linux, and the
> pdf_main.ps file is not the same, so the patch doesn't work.
> Is there a way to find a patch for GS 8.54 for Linux? Thanks.

8.54 is a bit old (3-4 years?), and since the patch was rejected on
stylistic grounds, I'm not inclined to backport it; it probably isn't too
difficult though. Hmm, you need to make up your mind - your Bug 691075 was filed
against Win XP for gs 8.64.
 
I have been experimenting with running visual c++ under wine on linux lately -
would a patched windows binary help?

Comment 17 Olivier DEPALLE 2010-01-20 06:53:36 UTC

Yes, please.

Thanks.

Comment 18 Hin-Tak Leung 2010-01-20 18:56:40 UTC

Created attachment 5896 [details]
zip file of win32 svn r10625 + patch, built with VC 9 under wine

Here is a zip file with a windows build of r10625 with the patch (and a few other
unrelated local changes). I have tested it briefly to make sure it is not
broken in any obvious way. Given (1) the patch is not accepted as is, (2)
building with VC9 under wine isn't supported, this is provided as is. The only
response should be whether it does the job or not; any detailed comments should
be directed to me (the sf address) and *not* to bugzilla nor the rest of the
ghostscript people.

Comment 19 Brian McCarter 2011-12-01 14:13:35 UTC

Created attachment 8168 [details]
Updated patch for gs 9.04

This patch makes the same changes proposed by Hin-Tak Leung on 2008-09-13.  It is updated for gs 9.04.  Works for me on 64-bit Gentoo Linux.

Comment 20 Ken Sharp 2013-08-07 05:21:33 UTC

I believe commit 073f460af5bb37edb1849c5d6235048598100437 will resolve this issue. It uses the PageCount key from the currentpagedevice dictionary, as suggested by Alex and SaGS, and updates some pdfmark generation.

There isn't a really good example here to check with, but I'm going to close it anyway; I'll fix any problems with other pdfmark types as they are reported.