Bug 691889

Summary: pdfwrite with "/PAGELABEL pdfmark" operator does not work with multiple pages
Product: Ghostscript Reporter: pipitas
Component: PDF Writer Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED INVALID    
Severity: normal CC: sags5495
Priority: P4    
Version: master   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: simple shell script to make Ghostscript create a 50 page PDF with modified /DOCINFO pdfmark

Description pipitas 2011-01-14 17:38:15 UTC
Created attachment 7130 [details]
simple shell script to make Ghostscript create a 50 page PDF with modified /DOCINFO pdfmark

I'm re-testing the pdfmark operator functionality of Ghostscript.

This is how I try to change the page labels of a 50 page PDF document. This document itself was created by a little shell script of mine using Ghostscript's pdfwrite device with the "/DOCINFO pdfmark" to change some meta data (which works perfectly). Anyway, here is how I tried to change the page labels:

  gs \
    -o modified-pagelabels-50pages.pdf \
    -sDEVICE=pdfwrite \
    -c "[ /Page 1 /Label (i)    /PAGELABEL pdfmark" \
    -c "[ /Page 2 /Label (ii)   /PAGELABEL pdfmark" \
    -c "[ /Page 3 /Label (III)  /PAGELABEL pdfmark" \
    -c "[ /Page 4 /Label (four) /PAGELABEL pdfmark" \
    -c "[ /Page 5 /Label (v)    /PAGELABEL pdfmark" \
    -c "[ /Page 6 /Label (FIVE) /PAGELABEL pdfmark" \
    -f 50pages.pdf

The bug I'm seeing is this:

 * Only the last called pdfmark operator is applied (in our example it is "FIVE").
 * But this label is attached to page 1 instead of being attached to page 5.
 * All page labels for pages 2...50 are now empty (before, the labels seen were "2"..."50").

I also tried this variation of the commandline:

  gs \
    -o modified-pagelabels-50pages.pdf \
    -sDEVICE=pdfwrite \
    -c "[ /Page 1 /Label (i)     " \
    -c "  /Page 2 /Label (ii)    " \
    -c "  /Page 3 /Label (III)   " \
    -c "  /Page 4 /Label (four)  " \
    -c "  /Page 5 /Label (v)     " \
    -c "  /Page 6 /Label (FIVE) /PAGELABEL pdfmark" \
    -f 50pages.pdf

The result of this one:

 * The first page was labelled "i".
 * All page labels for pages 2...50 are also empty (in the original PDF the labels seen are "2"..."50").

Now there are 3 possibilities:

 1. Maybe my way of trying to achieve page labelling is wrong because I didn't fully comprehend the specs.
 2. Maybe Ghostscript isn't supposed to support the full pdfmark specification for page labels.
 3. Maybe this is a genuine Ghostscript bug. (I tested with v8.71, v9.00 and v9.01svn on Linux).

Please let me know which it is. In case of "1", please let me know how to make it right. In case of "2", please point me to a resource that describes the scope of Ghostscript's pdfmark support.
Comment 1 pipitas 2011-01-14 20:07:36 UTC
Never mind case "2" from the description -- meanwhile I found this useful piece of info:

    http://bugs.ghostscript.com/show_bug.cgi?id=690043#c2
Comment 2 SaGS 2011-01-15 09:12:38 UTC
The /PAGELABEL pdfmark does not have any /Page key, so one can set the 
label for the ‘current’ page only (and, as a consequence, only for one 
page at a time). Since you call it at the very beginning, it’s expected 
to set a label for the 1st page and only for it.

Multiple /PAGELABELs for the same page: the pdfmark reference says the 
last one takes effect, so the result of your 1st commandline is OK. 
Note the /Page key is ignored.

Multiple /Label keys in a single /PAGELABEL pdfmark: there’s nothing in 
the pdfmark reference to allow this. So your 2nd commandline example is 
incorrect. The result could be an explicit PostScript error, but in 
general pdfmarks are forgiving. In this case, the implementation assigns 
a page label, one of those you specified, but there’s no rule as to 
which one to choose.

About page labels being 1..50 in the ‘old’ PDF: In short, they aren’t.
My Adobe Reader (on Windows) displays page numbers as ‘<pageindex> of 
<totalpages>’ (where <pageindex> is the page’s 1-based index in the file) 
if there are no page labels in the PDF, and ‘<label> (<pageindex> of 
<totalpages>)’ if there are. Note the use of ‘()’. I get ‘2 of 50’ with 
the old PDF, and ‘(2 of 50)’ with the new one. The PDF format 
does not provide any way to define labels for some pages and let the 
labels for the others ‘absolutely undefined’. Those ‘other’ labels 
must be set to something, ‘empty’ being the closest thing to ‘undefined’.

How to set page labels from PostScript? I can think of 2 methods:

(A) The 100% documented way:

    Issue a /PAGELABEL as part of each page.

(B) The less documented way:

    Use low-level pdfmarks (/OBJ & co) to construct the /PageLabels 
    number tree and connect it to the document’s {Catalog}. The advantage 
    is that you can define all page labels at once, and you can specify 
    how to automatically generate these labels. For example you can 
    request to ‘number pages 1..2 with the string "Page " followed by 
    roman numerals starting at "x"’ instead of ‘label 1st page with the 
    string "Page x", 2nd page with the string "Page xi"’.

    gswin32c -sDEVICE=pdfwrite -sOutputFile=50pages.pdf -dNOPAUSE
    
    GS>[/_objdef {pl} /type /dict /OBJ pdfmark
    GS>[{pl} <</Nums [0 <</P (Page ) /S /r /St 10>> 2 <<>>]>> /PUT pdfmark
    GS>[{Catalog} <</PageLabels {pl}>> /PUT pdfmark
    GS>50 { showpage } repeat
    GS>quit
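
Method (A) can be sketched along the same lines. This is only an illustration: the file name labelled.ps and the five label strings are made up. The script merely writes a PostScript file in which every page issues its own /PAGELABEL before its showpage, so that each pdfmark is seen while that page is the current one.

```shell
#!/bin/sh
# Write a small PostScript program in which every page sets its own label.
# labelled.ps and the label strings are hypothetical, chosen for this sketch.
{
  printf '%%!PS\n'
  for label in i ii III four v; do
    # One /PAGELABEL per page, issued before that page's showpage.
    printf '[ /Label (%s) /PAGELABEL pdfmark\n' "$label"
    printf 'showpage\n'
  done
} > labelled.ps

# To build the PDF (not run by this script):
#   gs -o labelled.pdf -sDEVICE=pdfwrite labelled.ps
```

Feeding labelled.ps to pdfwrite as shown in the trailing comment should then produce five blank pages, each carrying the label issued on it.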
Comment 3 Ken Sharp 2011-01-15 09:52:29 UTC
SaGS's analysis is correct, I believe. I seem to recall looking into this before. 

As to making this work: since the original file is a PDF file, you can run each page from the file individually. So you can set the PAGELABEL pdfmark for page 1, run page 1 from the original file, set the PAGELABEL for page 2, run page 2 from the original file and so on.

Because the label is (as SaGS said) applied to the current page, this should correctly set the labels for each page in the output PDF file.
(caveat: I haven't actually tried this)
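
A rough and equally untested sketch of this idea follows. It assumes that the procedures runpdfbegin, pdfpagecount, pdfgetpage, pdfshowpage and runpdfend from Ghostscript's PDF interpreter are available to user code; these are Ghostscript internals rather than standard PostScript and may differ between releases. The names relabel.ps and MyLabels are invented for the example. Pages beyond the custom list fall back to their page number as label, so they do not end up empty.

```shell
#!/bin/sh
# Generate a PostScript driver that replays 50pages.pdf one page at a time
# and issues a /PAGELABEL pdfmark just before each page is rendered.
# relabel.ps and MyLabels are hypothetical names used only in this sketch.
cat > relabel.ps <<'EOF'
%!PS
/MyLabels [ (i) (ii) (III) (four) (v) ] def
(50pages.pdf) (r) file runpdfbegin
1 1 pdfpagecount {
  % stack: pagenumber
  dup MyLabels length le
    { MyLabels 1 index 1 sub get }   % custom label for this page
    { dup 10 string cvs }            % otherwise use the page number itself
  ifelse
  % stack: pagenumber labelstring
  [ /Label 3 -1 roll /PAGELABEL pdfmark
  % stack: pagenumber
  pdfgetpage pdfshowpage
} for
runpdfend
EOF

# To build the PDF (not run by this script):
#   gs -o modified-pagelabels-50pages.pdf -sDEVICE=pdfwrite relabel.ps
```

On releases where SAFER is the default, the driver may additionally need --permit-file-read=50pages.pdf (or -dNOSAFER) on the gs command line so that it is allowed to open the input file.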

I'm less sure about constructing the PageLabels tree from scratch, but if it works, then it's a reasonable way to proceed.
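
For the number tree route, the labels the reporter was after might be expressed roughly as follows. This is a sketch only, reusing the three pdfmarks from comment 2. The file name pagelabels.ps is made up, and /S, /P and /St are the page label dictionary keys from the PDF reference. The keys in /Nums are zero-based page indices. Whether pdfwrite honours the tree when the pages come from an existing PDF has not been checked here.

```shell
#!/bin/sh
# Write the pdfmarks that build a /PageLabels number tree in one go.
# pagelabels.ps is a hypothetical file name used only in this sketch.
cat > pagelabels.ps <<'EOF'
%!PS
[ /_objdef {pl} /type /dict /OBJ pdfmark
[ {pl} << /Nums [
    0 << /S /r >>          % pages 1-2: i, ii
    2 << /S /R /St 3 >>    % page 3: III
    3 << /P (four) >>      % page 4: fixed text, no numeric part
    4 << /S /r /St 5 >>    % page 5: v
    5 << /S /D /St 6 >>    % page 6 onwards: 6, 7, 8, ...
] >> /PUT pdfmark
[ {Catalog} << /PageLabels {pl} >> /PUT pdfmark
EOF

# To apply it to the existing file (not run by this script):
#   gs -o modified-pagelabels-50pages.pdf -sDEVICE=pdfwrite pagelabels.ps -f 50pages.pdf
```

Compared with per-page /PAGELABEL calls, this keeps all labels in one place and lets the viewer generate the numbering for whole page ranges.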