Hi. I want to produce books in PDF by merging a front cover in PDF, a body in PDF (with working bookmarks/outlines and links) and a back cover in PDF. I use the command:
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=book.pdf -dBATCH front.pdf body.pdf back.pdf
It works fine to insert my covers before and after the body. But gs does not take the contents of front.pdf into account when adjusting the outlines and links, which therefore point n pages (n being the number of pages in front.pdf) before the actual target. I don't know whether this is a bug or whether a parameter is missing from my command. Can someone please explain? Thanks.
Typing mistake in summary: read links instead of linjs
This appears to be a real issue when processing PDF files after pages have already been output (note that the same issue occurs if a PDF follows a PS file). In the support call, we came up with this suggestion of how to implement correct behavior, involving only modifications to the pdfmark commands synthesized by the PDF interpreter.
First, determine a page number offset valid for the entire PDF file. Obviously, this number will be 0 if no pages are output before the file is processed. The easiest way to determine it is to query the PageCount device parameter in /runpdfbegin.
Then, in the processing of /linkdest, add this offset to any page number that is processed for a PDF link. Such page numbers may be given explicitly in the link dest, resolved through the /Dests resource, or computed in /namedactions.
This bug is marked as bountiable but assigned to Hin-Tak Leung because he has expressed interest in working on it.
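The offset scheme suggested in this comment can be modelled outside Ghostscript. The following is only a minimal Python sketch of the arithmetic, not the interpreter's actual PostScript code: `InputFile`, `resolve_dest` and `merge_link_targets` are hypothetical names, and `pages_output` merely plays the role of the device's PageCount parameter.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# A destination is either an explicit 1-based page number or a name
# that has to be looked up in a /Dests-like table of the same file.
Dest = Union[int, str]


@dataclass
class InputFile:
    """Hypothetical stand-in for one input PDF."""
    page_count: int
    named_dests: Dict[str, int]  # name -> 1-based page within this file


def resolve_dest(dest: Dest, pdf: InputFile, page_offset: int) -> int:
    """Return the 1-based page number of `dest` in the merged output."""
    local_page = pdf.named_dests[dest] if isinstance(dest, str) else dest
    return local_page + page_offset


def merge_link_targets(files: List[InputFile],
                       links_per_file: List[List[Dest]]) -> List[List[int]]:
    """Resolve the links of every file against the merged page numbering."""
    pages_output = 0  # assumption: models the device's PageCount
    resolved = []
    for pdf, links in zip(files, links_per_file):
        page_offset = pages_output  # sampled once, when the file is opened
        resolved.append([resolve_dest(d, pdf, page_offset) for d in links])
        pages_output += pdf.page_count  # pages emitted while processing the file
    return resolved


front = InputFile(page_count=2, named_dests={})
body = InputFile(page_count=10, named_dests={"chapter2": 5})
back = InputFile(page_count=1, named_dests={})

targets = merge_link_targets([front, body, back], [[], [1, "chapter2"], []])
print(targets)  # [[], [3, 7], []]
```

With a two-page front cover, a link to page 1 of the body lands on page 3 of the merged file, and the named destination on page 5 lands on page 7. The offset is read once per input file, so it stays constant while that file's links are processed.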
Started looking at it, and the bug is actually a bit worse than reported - passing a single PDF (I picked one of the gcc manuals from gcc.gnu.org) through ghostscript's pdfwrite, links are preserved but outlines are completely wrong. For concatenation, links belonging to documents other than the first are wrong by a page offset. Outlines are just wrong even for a single input. Will look further.
Created attachment 4393
patch which remembers a cumulative page offset and adds that to links and outlines.
This patch against two-week-old svn trunk basically does what Raph described, except for the /Dests resource resolution. I am posting this mostly to request coding-style comments, e.g. I need to store a cumulative page offset; is globaldict the correct place to put this? (Looking back at my own comment 3, it doesn't seem to make much sense - never mind...) I'll look into the DEST resolution further; this patch does the correct thing for outlines and namedactions, I think.
Created attachment 4721
a simple pdfmark postscript file to add named destinations
A simple pdfmark PostScript file to add named destinations. Run with:
gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf
This creates a more interesting PDF with named destinations and some outlines for testing.
Apparently my patch already resolves named destinations correctly. So it updates outlines and links to explicit pages, and here is a test file for creating PDFs with named destinations.
Known limitations:
1) Does not cope at all with mixtures of PS and PDF files (it needs a page count from the PS processing). This can be worked around by distilling the individual PS files first.
2) I'll take stylistic criticisms (where to put the cumulative page count, naming of variables, etc).
Also, I am having problems finding suitable test files with interesting/unusual outlines and links. Most software writes explicit page numbers in outlines and links, which the patch already updates; I had to create my own named destination test case.
3) Not sure whether to change namedactions, and unsure what exactly they are used for (the meaning of first/last page obviously changes for merged PDFs). Passing back to the default component owner for review?
Hi Hin-Tak. First, thanks for the patch, on the whole it looks good to me. Although the target device is pdfwrite, in order to create a PDF file, the pdfwrite device itself is not affected by your patch, which really makes changes to the PDF interpreter. As a result I don't think I'm really the correct person to review your work. I've assigned it to Alex Cherepanov instead, as he is the owner of the PDF interpreter.
I don't particularly like the use of globaldict because it's shared between multiple execution contexts, if we still support DPS. The current patch will break when the user tries to generate multiple independent PDF files in one GS run. Perhaps the patch can use the /PageCount parameter of the current device.
I don't like globaldict either; it is just that the offset needs to be added up across multiple input PDF files, at the interpreter level, so that number needs to be stored somewhere. /PageCount doesn't quite work - the offset needs to be a stored number that is updated and jumps per input PDF file, not per page; however one goes about it, some new code needs to be hooked into runpdfbegin (or the ending equivalent).
I haven't thought about the splitting/page-extraction scenario. But if I understand the code correctly, page extraction/splitting does not preserve outlines and links either, I think? (And one does not and cannot reasonably expect outlines and links to be preserved, since one is going from a larger document to a smaller one.) So the patch is no worse than the current situation? (I'll have to try this myself to be sure.) I.e. in the case of page extraction/splitting, I think the effect of the patch is simply that outlines and links are broken in *different* ways before and after the patch?
Note that the current pdfwrite cannot be used to create multiple PDFs in a single run. The pdfwrite device only writes the PDF (from temp files) when it is closed (prior to exit). I tried this by starting with a different device, then doing 'save (pdfwrite) selectdevice ...{ run some input files } ... restore', but this crashes since pdfwrite doesn't properly clean up as it exits, leaving several dangling pointers.
With respect to merging multiple PDFs into one, I think the patch is reasonable: DPS contexts are unlikely to be mixed with PDF interpretation, so globaldict isn't a very risky approach. I can't override Alex since he's the "owner" of the PDF interpreter, but I would request that this be reconsidered.
I think one can possibly do multiple output PDFs with -sOutputFile=out%d.pdf, and if my memory serves, one can do -dFirstPage=x -dLastPage=y for page extraction? (I haven't tried either, nor do I know the precise syntax for the latter, but they should work.) But outline/link preservation going from a larger to a smaller PDF just is not supposed to work, and users cannot and should not expect it to...
If there is an alternative for storing a number that is incremented per input file at the opening/closing of input files, please suggest one.
> If there is an alternative for storing a number that is incremented per input
> file for at the opening/closing of input files, please suggest.
I don’t think you need to maintain such a number (updating it for each file). Comment #2 suggests, as I understand it, getting the number of already printed pages from the page device (the /PageCount parameter) when you load the PDF.
- This is the offset to add to page numbers in the current PDF.
- It’s a constant value to be stored only while processing the ‘current’ PDF, so no globaldict is needed. Store it alongside the other status variables used by the PDF interpreter.
- It's maintained automatically by the page device.
- It doesn’t matter what types of files were processed before; any mix of PS + PDF will work OK.
As for -dFirstPage= and -dLastPage=, I think handling these means:
- if the original destination page number (before adding the stored PageCount) falls outside this range, somehow omit the link;
- otherwise, subtract FirstPage-1 (IIRC it’s 1-based) from the page number, in addition to adding the stored PageCount.
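The -dFirstPage/-dLastPage handling proposed in this comment comes down to a range check followed by two adjustments. The Python sketch below models it under two assumptions: the stored PageCount is added as the offset (as the first list in the comment describes), and page numbers are 1-based. `adjust_dest_page` and its parameters are hypothetical names, not part of Ghostscript.

```python
from typing import Optional


def adjust_dest_page(local_page: int, page_offset: int,
                     first_page: int = 1,
                     last_page: Optional[int] = None) -> Optional[int]:
    """Map a 1-based page number of the current PDF to the output file.

    Returns None when the target lies outside the extracted range,
    which means the link should be omitted.
    """
    if local_page < first_page:
        return None
    if last_page is not None and local_page > last_page:
        return None
    # Drop the pages skipped by FirstPage, then add the pages already output.
    return local_page - (first_page - 1) + page_offset


print(adjust_dest_page(5, page_offset=2))                              # 7
print(adjust_dest_page(5, page_offset=2, first_page=3, last_page=8))   # 5
print(adjust_dest_page(9, page_offset=2, first_page=3, last_page=8))   # None
```

Returning None rather than clamping to the nearest page is one possible reading of "somehow omit the link"; the comment leaves that choice open.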
*** Bug 691075 has been marked as a duplicate of this bug. ***
I'm using the same version of Ghostscript but my platform is Linux, and the pdf_main.ps file is not the same, so the patch doesn't work. Is there a way to find a patch for GS 8.54 for Linux? Thanks.
> I'm using the same version of Ghostscript but my platform is Linux, and the
> pdf_main.ps file is not the same, so the patch doesn't work.
> Is there a way to find a patch for GS 8.54 for Linux? Thanks.
8.54 is a bit old (3-4 years?), and since the patch was rejected on stylistic grounds, I'm not inclined to backport it; it probably isn't too difficult though.
Hmm, you need to make up your mind - your Bug 691075 was filed against Win XP for gs 8.64. I have been experimenting with running Visual C++ under Wine on Linux lately - would a patched Windows binary help?
Yes, please. Thanks.
Created attachment 5896
zip file of win32 svn r10625 + patch, built with VC 9 under wine
Here is a zip file with a Windows build of r10625 with the patch (and a few other unrelated local changes). I have tested it briefly to make sure it is not broken in any obvious way. Given that (1) the patch is not accepted as is and (2) building with VC9 under wine isn't supported, this is provided as is. The only response should be whether it does the job or not; any detailed comments should be directed to me (the sf address) and *not* to bugzilla or the rest of the ghostscript people.
Created attachment 8168
Updated patch for gs 9.04
This patch makes the same changes proposed by Hin-Tak Leung on 2008-09-13. It is updated for gs 9.04. Works for me on 64-bit Gentoo Linux.
I believe commit 073f460af5bb37edb1849c5d6235048598100437 will resolve this issue. It uses the PageCount key from the currentpagedevice dictionary, as suggested by Alex and SaGS, and updates some pdfmark generation. There isn't a really good example here to check with, but I'm going to close it anyway; I'll fix any problems with other pdfmark types as they are reported.