If a PostScript file contains /#copies 2 def, the PostScript interpreter is expected to display or print it twice. This works perfectly when I send the file unfiltered to my HP LaserJet P3005 (PostScript printer). If I display the file with Ghostscript, or print it on a non-PostScript printer with Ghostscript, I get only one copy. See the Ubuntu bug report linked under "URL". OpenOffice.org uses "#copies" as its standard method to request multiple copies (most other programs send an IPP attribute to CUPS for the number of copies). As a result, printing multiple copies from OpenOffice.org works only on PostScript printers and fails for all non-PostScript printers on all Linux distros.
Created attachment 4863: OOo-2copies.ps. PostScript output generated by OpenOffice.org when printing. 2 copies were requested and therefore the file contains the line "/#copies 2 def".
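For illustration only, here is a minimal Python sketch of how one might spot a literal "/#copies N def" line in a job like the attached one. The function name `find_copies` and the sample data are hypothetical. This is a naive text scan, not a PostScript interpreter: a job that computes the value or sets NumCopies through setpagedevice would not be detected.

```python
import re

# Naive pattern: matches a literal "/#copies <integer> def" sequence only.
COPIES_RE = re.compile(rb"/#copies\s+(\d+)\s+def")

def find_copies(ps_data: bytes) -> int:
    """Return the last literal /#copies value found, or 1 if none is present."""
    matches = COPIES_RE.findall(ps_data)
    return int(matches[-1]) if matches else 1

# Hypothetical stand-in for the OpenOffice.org output described above.
sample = b"%!PS-Adobe-3.0\n/#copies 2 def\n(Hello) show\nshowpage\n"
print(find_copies(sample))
```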
This appears to be a driver problem. When I convert the attached file to TIFF it works as expected (each page is repeated twice). The command line I'm using: bin/gs -sDEVICE=tiff24nc -o test.tif ./OOo-2copies.ps
Then this must be checked for each output device of Ghostscript. I can at least say that this bug is valid for all X output devices and for the pdfwrite output device.
I'm changing the title and assigning this to Ken to fix the pdfwrite issue and I've opened a new bug for the X11 device. Please open additional bugs for any other devices that fail.
Making multiple copies in a PDF file doesn't make any sense (any more than it does for a screen display). Additionally, Acrobat Distiller ignores both the /#copies operator and the /NumCopies page device parameter. In my opinion our pdfwrite behaviour is correct.
We are entering a new dimension here. PDF is taking over the role of PostScript in the printing workflow and is becoming the standard print job format. See https://www.linuxfoundation.org/en/OpenPrinting/PDF_as_Standard_Print_Job_Format I have a CUPS environment (in Ubuntu Jaunty) where the standard print job format is PDF. In particular, this means that all jobs get converted to PDF and processed by the pdftopdf CUPS filter. Unfortunately some applications (like OpenOffice.org) send PostScript and select the number of copies by embedding "/#copies 2 def" in the PostScript. All jobs get converted to PDF; PostScript jobs are converted by Ghostscript with the "pdfwrite" output device. For this situation I need the PDF to contain the requested number of copies (or some parameter telling that this PDF has to be rendered X times). To avoid breaking other things, I suggest that you make this functionality optional, for example only active if "-dUSENUMCOPIES" is set on the Ghostscript command line.
The number of copies to print is job metadata. That's what /#copies (and NumCopies in the page device dictionary) represents in PostScript, and how pdfwrite should propagate it. As far as I can tell, PDF relegates that sort of metadata to the various Job Ticket format extensions, all of which are unfortunately very complicated. So ideally, pdfwrite would check the number of copies and generate a JDF for PDTF or whatever which reproduces it, and then support would be added for that feature to the PDF interpreter to set NumCopies again when rendering the PDF. A hack to duplicate the actual high-level output, as Till suggests, is worthwhile if we're not willing to do that, although I suspect it would be a similar amount of work for less maintainable code. Or if there's something simple and equivalent to NumCopies in PDF, let's do that instead! Till, you say this is an artefact of OpenOffice relying on this feature in its PostScript driver, presumably because it doesn't talk to CUPS directly. What's the normal way the number of copies is propagated, e.g. from the common printing dialog? Can you describe in more detail what happens when an application using cairo generates PDF directly, for example? Can gs, or the gs wrapper, somehow convert this to an IPP attribute when it hands the PDF back to CUPS? That might be easier than trying to implement the heavier embedded job ticket formats.
The normal way is that the number of copies accompanies the job as an IPP attribute and is not embedded in the PostScript. Apps link with libcups and use functions from this library to poll printer lists and PPDs and also to set the options and send the job. OpenOffice.org also links against the CUPS library and loads the list of available printers and the PPDs from CUPS, and it even sends the option settings as IPP attributes; only the number of copies is sent as embedded PostScript. All CUPS filters of a filter chain to process a print job are called with the same command line, where the fourth argument is the number of copies and the fifth argument a string of space-separated key=value pairs for the options. In the case of OpenOffice.org the fourth argument is always 1, as OpenOffice.org does not send the IPP attribute for the copies; it expects that the PostScript interpreter (regardless of whether it runs on the CUPS server or in the printer) generates the copies. It is not possible for a CUPS filter (in our case pstopdf) to modify the command line of the following filters. So pstopdf cannot search for /#copies and then set the fourth argument for the rest of the filters to the appropriate number. The only way a CUPS filter can react to something in the input data is to modify the output data. This is also the only way a CUPS filter can communicate with the subsequent filters. I hope we do not need to wait for the JTAPI (Job Ticket API) library implementation to be able to fix the problem with OpenOffice.org (probably OOo will send print jobs in PDF before then).
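To make the filter command line described above concrete, here is a minimal Python sketch of how a filter might read its arguments. The names `FilterArgs` and `parse_filter_argv` and the sample values are hypothetical, and the option parsing is simplified (it ignores quoted values that contain spaces, and treats a bare key as "true", which is an assumption of this sketch).

```python
from typing import Dict, List, NamedTuple, Optional

class FilterArgs(NamedTuple):
    job_id: str
    user: str
    title: str
    copies: int
    options: Dict[str, str]
    filename: Optional[str]

def parse_filter_argv(argv: List[str]) -> FilterArgs:
    """Parse argv as: name job-id user title copies options [filename]."""
    if len(argv) not in (6, 7):
        raise ValueError("expected 5 or 6 arguments after the program name")
    options: Dict[str, str] = {}
    for token in argv[5].split():
        # Simplification: a key without "=" is stored with the value "true".
        key, sep, value = token.partition("=")
        options[key] = value if sep else "true"
    filename = argv[6] if len(argv) == 7 else None  # None: read the job from stdin
    return FilterArgs(argv[1], argv[2], argv[3], int(argv[4]), options, filename)

# Hypothetical invocation for an OpenOffice.org job: copies is 1 even though
# the PostScript data itself asks for 2 copies via /#copies.
args = parse_filter_argv(
    ["pstopdf", "42", "till", "OOo-2copies", "1", "media=A4 sides=two-sided-long-edge"]
)
print(args.copies, args.options)
```

The sketch only reads the command line. As the comment above explains, a filter cannot change what the later filters receive in these arguments, so any copy count found in the data has to be carried in the output data.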
Ralph is correct: the number of copies is job metadata, and should be sent as Adobe PJTF, CIP4 JDF, CIP3 PPF or similar. Adding support for a switch which emitted multiple copies of each page would be non-trivial, as well as making the PDF file much larger (potentially very much larger). That's not to say it's impossible, merely difficult, because we would need to reserve more entries in the pages tree and the xref table, and emit each page content stream multiple times with different object numbers. I think we would be better off adding the ability to embed PJTF in the PDF file (Acrobat Distiller can do this), and putting the copies parameter in there. This is the nearest thing there is in a PDF file to NumCopies in PostScript. The PDF interpreter could then optionally read the PJTF and set #copies from it. However that also begs the question of what to do with all the other content of a PJTF, such as resolution. This really is a workflow problem, and I think it should be tackled by adopting workflow solutions rather than hacking the behaviour of Ghostscript and pdfwrite. Presumably the OpenOffice developers will face the same problem themselves if they emit PDF files directly: they will need to either embed multiple copies of the pages in the PDF file (inefficient), embed a PJTF in the PDF file, or set the IPP parameters to CUPS properly....
In general, I agree with Ken that this is really a workflow issue, not something that is ideally solved in Ghostscript because it provides support for an older (PS) printing workflow for something that is not supported with the PDF workflow. Regarding Ken's comment (in comment #9):
> Adding support for a switch which emitted multiple copies of each page would
> be non-trivial, as well as making the PDF file much larger (potentially very
> much larger).
The PDF would not be much larger. The contents for the copies would be shared, as would the image, font and other resources. Essentially what would be needed would be to duplicate the indirect reference in the 'Kids' array of Pages (and of course double the Count). I generated a 'tiger.pdf' using Ghostscript, inflated it with toolbin/pdfinflt.ps and then changed:
2 0 obj <</Kids [ 5 0 R ] /Count 1 /Type /Pages >> endobj
to:
2 0 obj <</Kids [ 5 0 R 5 0 R ] /Count 2 /Type /Pages >> endobj
and, sure enough, the tiger shows up on both pages. I've attached this file, with the caveat that it has a broken xref, but Ghostscript is able to repair it without a problem. I'm still not sure whether or not this is a good idea, but it doesn't seem that hard and sure doesn't increase the file size appreciably.
Created attachment 4879: tiger2.pdf. A doubled 'tiger' PDF
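The manual edit described above can be sketched in Python as follows. This is only an illustration of the text substitution on a single, flat, uncompressed Pages object; the function name `duplicate_kids` is hypothetical. Like the hand-edited attachment, it shifts byte offsets, so the xref of a real file would have to be rebuilt afterwards. Later comments in this thread report that Acrobat, xpdf and Poppler do not handle files built this way.

```python
import re

KIDS_RE = re.compile(rb"/Kids\s*\[(.*?)\]", re.S)   # the Kids array of one Pages object
COUNT_RE = re.compile(rb"/Count\s+(\d+)")           # the page count of that object
REF_RE = re.compile(rb"\d+\s+\d+\s+R")              # an indirect reference such as "5 0 R"

def duplicate_kids(pages_obj: bytes, copies: int) -> bytes:
    """Repeat every reference in /Kids `copies` times and scale /Count to match."""
    kids = KIDS_RE.search(pages_obj)
    if kids is None:
        raise ValueError("no /Kids array found")
    refs = REF_RE.findall(kids.group(1))
    repeated = [ref for ref in refs for _ in range(copies)]
    new_kids = b"/Kids [ " + b" ".join(repeated) + b" ]"
    out = pages_obj[:kids.start()] + new_kids + pages_obj[kids.end():]
    # Multiply the existing count; assumes a flat tree with one Pages node.
    return COUNT_RE.sub(
        lambda m: b"/Count " + str(int(m.group(1)) * copies).encode(), out, count=1
    )

original = b"2 0 obj <</Kids [ 5 0 R ] /Count 1 /Type /Pages >> endobj"
print(duplicate_kids(original, 2).decode())
```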
Ray, you beat me to reporting it. I'd just discovered at the weekend that Acrobat was happy with simply modifying the Pages tree; I'd expected it to be upset.... So it's feasible, but it's still ugly. I'll try and see how much work it'll be to implement when I finally get my current problem resolved. It may not be too bad if it only involves hacking the Pages tree. Not sure what to do about producing a balanced tree; that might be a little more effort.
This bug report reminded me of this old post from comp.text.pdf: http://groups.google.com/group/comp.text.pdf/browse_thread/thread/28db6c2ee5dd8d46#eaa2bdf31e867403 (Message-ID: <3dec7fcb.1431823813@reading.news.pipex.net>) I didn't check whether current Adobe Acrobat/Reader versions have this anomaly or not. Anyway, I think it's safer to create multiple PDF Page dictionaries that share the values (the /Contents, /Resources, etc.), instead of just referring to the same PDF Page dictionary multiple times from the Pages tree.
Well, I did check a couple of recent versions of Acrobat, and they do seem to handle this situation acceptably. Also the PDF file will only (I think?) be used inside CUPS, so as long as Ghostscript handles it correctly it's probably mostly OK anyway. I haven't thoroughly checked GS yet to find out what it does with this, but I think it's OK. In passing I checked with an ex-colleague who works on a different PS/PDF RIP, and he had coincidentally been working recently with a PDF file which exhibited exactly this setup. All the same, thanks for the pointer to the old Usenet postings. I'm reluctant to duplicate the page content streams because they can comprise quite large amounts of the PDF content. It also significantly increases the complexity of the code in pdfwrite that keeps track of all the objects.
Till, I need your input with regard to CUPS and how you see this being used. I've made a quick change for the purpose of investigation which simply duplicates the entries in the pages tree enough times to satisfy the #copies or NumCopies values in force at the time the page is completed. There is a new switch for controlling this behaviour which defaults to false. The resulting PDF file works well with Ghostscript, which in my tests so far correctly produces the expected number of pages on output of a PDF file produced with NumCopies > 1. However, no version of Acrobat that I've checked (I've tried 4 versions ranging from Acrobat 4.0 to 9.0) is completely happy with PDF files created like this. In general they display only the first copy of each page correctly, and produce varying errors and blank pages when trying to view the later duplicates. Clearly a file which doesn't display well in Acrobat is not a very useful PDF file, and I don't think we should produce such files except for the very specific reason of workflow problems. So we should only produce these files if they are not intended as the final output, but merely as an intermediate stage. So, is this acceptable for your purposes? That is, can you determine whether a PostScript file is intended to terminate at producing a PDF file (in which case do not preserve NumCopies) or is intended for some kind of further processing with GS? Is there any circumstance under which these PDF files could be sent to a different PDF consumer such as Acrobat or xpdf? NB: if you take the PDF file with the duplicated pages and run it back through GS using the pdfwrite device, it will happily duplicate the content streams, producing a PDF file which Acrobat is then happy with.
If this is not an acceptable solution, then we would need to duplicate the page content streams instead of the page tree entries, which will lead to bigger PDF files and take considerably more effort to code for. Let me know what you think please.
The switch to activate the new functionality will only be used in the pstopdf CUPS filter, so Ghostscript/pdfwrite called by other applications will not suffer any regressions. The further workflow is to pass the PDF through the pdftopdf filter, a Poppler-based page management filter (rearranging of pages for N-up, reverse order, selected pages, ...). After that it goes to the driver. For non-PostScript printers it is usually Ghostscript that renders the PDF, but it can also be Poppler, as there is for example a Poppler-based pdftoopvp CUPS filter under development. For PostScript printers the PDF gets converted back to PostScript by the pdftops filter, which is based on Poppler in CUPS 1.3.x and optionally based on Ghostscript in CUPS 1.4.x (in Ubuntu Jaunty it is based on Ghostscript). So you see that both Ghostscript and Poppler are used to render PDF, and which one is actually used depends on the printer/driver in use. So if both Ghostscript and Poppler work with the output, the whole workflow should work.
Created attachment 4882: page_copies.pdf. I'm afraid I don't know how to drive Poppler. I've attached a PDF file here; could you try it? It should produce 3 copies of each of two pages. Page one says 'Test', page 2 says 'Test1'. I did try the file with xpdf, which I think Poppler is based on, and it does not work: it gives four errors 'Loop in Pages tree' and one 'Page count in top level pages object is incorrect'. It displays each page once only. So it looks to me like this is not going to be a solution. I have briefly looked at what would be required for duplicating the page content streams and I think this will be several days' work, possibly more. It'll take me a day or so just to work out what needs to be done. If this is required I don't think it will happen soon.
> ... what would be required for duplicating the page content streams ...
My understanding of that news:// post I mentioned in comment #13 is that Reader has trouble when the same PDF Page object is referenced more than once from the Pages tree, not when multiple PDF Page objects share values like the /Contents stream.
- PDF Page = the dictionary described in table 3.27 ‘Entries in a page object’ (PDF1.7 page 145).
- I understand there's no problem with sharing indirect objects referenced from these PDF Pages. Of course direct objects cannot be shared, but the contents stream, being a PDF Stream, can never be a direct object.
So what I think you need is to output the same PDF Page dictionary multiple times, each time with a different object #, and reference all these copies from the Pages tree. There won't be much duplication, as most content (the /Contents and the resources themselves - fonts, XObjects, etc.) will be shared by all copies. Maybe I'll test more this weekend to be sure. I have about 30 versions of Reader for _WIN32, starting with 3.01. I don't have any non-Windows version.
The file attached to comment #17 really does not get rendered correctly with Poppler. The command line converter pdftops gives the same result as XPDF:
till@till-laptop:~/ghostscript/gpl/testfiles$ pdftops page_copies.pdf
Error: Loop in Pages tree
Error: Loop in Pages tree
Error: Loop in Pages tree
Error: Loop in Pages tree
Error: Page count in top-level pages object is incorrect
till@till-laptop:~/ghostscript/gpl/testfiles$
The resulting PostScript file displays each page only once.
Created attachment 4887: A 3-copies Tiger that seems to work. There are 3 PDF Page objects (#4, #9, #10), and these share the /Contents stream (#5) and resources (here a single one, PDF ExtGState #8). Direct objects like the /MediaBox PDF array, the /Resources PDF dictionary, and the PDF array that's the value for /ProcSet have to be duplicated (unless replaced with indirect objects), but these are not large.
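The structure described above (several Page dictionaries with their own object numbers, all pointing at one shared /Contents stream) can be sketched with a tiny hand-rolled PDF writer in Python. This is an illustrative sketch under stated assumptions, not pdfwrite's code and not the layout of the attached file: the object numbering, the Helvetica font and the function name `build_pdf` are all hypothetical. Unlike the hand-edited attachment, the sketch computes the xref offsets.

```python
def build_pdf(copies: int) -> bytes:
    """Build a one-page document repeated `copies` times via separate Page objects."""
    content = b"BT /F1 24 Tf 72 720 Td (Test) Tj ET"
    # Hypothetical numbering: 1 Catalog, 2 Pages, 3 shared stream, 4 font, 5.. Page dicts.
    page_nums = [5 + i for i in range(copies)]
    kids = b" ".join(b"%d 0 R" % n for n in page_nums)
    objects = {
        1: b"<< /Type /Catalog /Pages 2 0 R >>",
        2: b"<< /Type /Pages /Kids [ %s ] /Count %d >>" % (kids, copies),
        3: b"<< /Length %d >>\nstream\n%s\nendstream" % (len(content), content),
        4: b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    }
    for n in page_nums:
        # Each copy gets its own Page dictionary; direct objects (MediaBox,
        # Resources) are repeated, the content stream and font are shared.
        objects[n] = (b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
                      b"/Resources << /Font << /F1 4 0 R >> >> /Contents 3 0 R >>")
    out = bytearray(b"%PDF-1.4\n")
    offsets = {}
    for num in sorted(objects):
        offsets[num] = len(out)          # byte offset of "N 0 obj"
        out += b"%d 0 obj\n%s\nendobj\n" % (num, objects[num])
    xref_pos = len(out)
    size = len(objects) + 1              # object 0 is the free-list head
    out += b"xref\n0 %d\n0000000000 65535 f \n" % size
    for num in sorted(objects):
        out += b"%010d 00000 n \n" % offsets[num]
    out += (b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (size, xref_pos))
    return bytes(out)

pdf = build_pdf(3)
print(len(pdf), "bytes written")
```

Writing the result to a file and opening it in a viewer is the obvious way to check the idea; whether a given viewer accepts it is not something this sketch can guarantee.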
In principle the 3-copies tiger works, it displays three times in XPDF, but it gives the following console messages:
till@till-laptop:~/ghostscript/gpl/testfiles$ xpdf tiger-3-copies.pdf
Error: PDF file is damaged - attempting to reconstruct xref table...
XtUngrabButton(drawArea,3,0)
Warning: Attempt to remove nonexistent passive grab
till@till-laptop:~/ghostscript/gpl/testfiles$
Also gv shows it three times but with the following console message:
till@till-laptop:~/ghostscript/gpl/testfiles$ gv tiger-3-copies.pdf
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
till@till-laptop:~/ghostscript/gpl/testfiles$
gs gives:
till@till-laptop:~/ghostscript/gpl/testfiles$ gs tiger-3-copies.pdf
GPL Ghostscript 8.64 (2009-02-03)
Copyright (C) 2009 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged. This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
Processing pages 1 through 3.
Page 1
**** Warning: stream Length incorrect.
>>showpage, press <return> to continue<<
Page 2
**** Warning: stream Length incorrect.
>>showpage, press <return> to continue<<
Page 3
**** Warning: stream Length incorrect.
>>showpage, press <return> to continue<<
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> GPL Ghostscript SVN PRE-RELEASE 8.64 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.
GS>quit
till@till-laptop:~/ghostscript/gpl/testfiles$
Console output of pdftops:
till@till-laptop:~/ghostscript/gpl/testfiles$ pdftops tiger-3-copies.pdf
Error: PDF file is damaged - attempting to reconstruct xref table...
till@till-laptop:~/ghostscript/gpl/testfiles$
After that gv shows the resulting PostScript file without any further console output.
> Error: PDF file is damaged - attempting to reconstruct xref table... > ... etc That's normal, because I edited the file in a text editor, and haven't recomputed object offsets to fix the xref. (Yes, text editor; the file uses only ASCII characters, because the contents stream is ASCII85-encoded.)
OK, first up, the 'quick hack' isn't going to work since few PDF consumers like it. Thanks for checking for me, Till. I was pretty sure it wasn't going to work, but it was quick to code. SaGS' suggestion of manufacturing new page dictionaries is feasible, but probably about as much work as duplicating the content streams, though it has the obvious advantage of not massively increasing the file size. As I said, I'll look into what it will take to do this or, if there's no other solution, to duplicate the entire content stream. Since I think it's several days' work either way, it's not going to be soon, sorry.
Created attachment 4891: 690355.patch. OK, here is a preliminary patch to implement a new switch 'DoNumCopies'. This switch is only relevant to the pdfwrite device, and causes it to emit multiple copies of each page. It 'should' keep track of both /#copies and /NumCopies through the course of jobs, so you can have different numbers of copies of each page. I haven't finished testing it yet, but so far it seems OK. Documentation to follow when I check it in. You should *not* use this with any file containing pdfmarks which refer to pages (e.g. /Dest), as these definitely will not work as expected. Obviously if (like CUPS) you only intend to print/process the file this isn't a concern either. The output PDF file is slightly bigger, as the code currently duplicates the Resources dictionary for each copy of each page (the resources themselves are not duplicated though). I don't think this is a major concern so I'm unlikely to try and address it.
Ken, if you're going to do the plumbing to rewrite the page tree and/or content streams, it may be worth going a little bit further to support some of the reordering and imposition features? After all, half of this is just so they can call the Poppler-based impose filter.
Ralph, I'm not rewriting or re-ordering the content streams, all I'm doing is duplicating the individual page dictionaries and adding the extra dictionaries to the Pages tree. This is fairly straightforward, modulo some fiddling to reproduce all the required resource dictionaries. But there's no reordering going on, each page is added to the tree in order, then a number of duplicates added, then we move on to the next page. Re-ordering would be an additional task. Not impossible, but a fair degree harder because we don't know when we start how many pages there are (unless the input is PDF or DSC compliant PostScript). I think we should add that as a different enhancement if we want to do it.
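The page order described above (each page, then its duplicates, then the next page) amounts to an uncollated expansion. A minimal Python sketch, with the hypothetical name `expand_pages` and made-up page labels, and with a per-page copy count since the count in force can differ from page to page:

```python
from typing import Iterable, List, Tuple

def expand_pages(pages: Iterable[Tuple[str, int]]) -> List[str]:
    """Emit each page `copies` times in a row, keeping the original page order."""
    out: List[str] = []
    for page_id, copies in pages:
        # All copies of one page are added before moving on to the next page.
        out.extend([page_id] * copies)
    return out

print(expand_pages([("page1", 3), ("page2", 3)]))
# ['page1', 'page1', 'page1', 'page2', 'page2', 'page2']
```

A collated order (page1, page2, page1, page2, ...) would need the full page list up front, which is the same constraint the comment mentions for re-ordering.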
Enhancement added in revision 9615, patch here: http://ghostscript.com/pipermail/gs-cvs/2009-April/009190.html Please note that this differs from the patch in comment #24 above: the flag has changed name from 'DoNumPages' to 'DoNumCopies'. Till, it would be useful if you could test this with CUPS & Poppler. My testing has been limited to GS and Acrobat, and not very many files with multiple copies set. Because I'd changed code which affects general file writing, I spent most time checking that there were no regressions.
On Ralph's comment #25, it seems that the only reason that poppler is part of the workflow is to take the PDF (possibly created by gs from a PS file), and munge it to apply page ordering, N-up and the like. If we want to replace Poppler in the pipeline, the gs rendering which is the recipient of the munged file from Poppler, would perform the page ordering, N-up etc. Page re-ordering is simple (reversal of the 'dopdfpages' loop in pdf_main.ps). Unfortunately, there is a bug against gsnup that uses 'BeginPage', 'EndPage' to perform N-up that makes it not work with PDF (bug 688318).
Ray, CUPS always has a page management filter, which does the N-up, selected pages, multiple copies, software collate, ... Before the introduction of the PDF printing workflow this was pstops, completely written by the CUPS developers and not using any renderer libraries like libpoppler or libgs. For the PDF printing workflow we replace pstops by pdftopdf, which does the same on a PDF data stream. Currently it uses libpoppler because Poppler's API allows easy manipulation of pages. To get rid of Poppler, a Ghostscript-based program has to replace the current pdftopdf filter.
Thank you very much for the fix. It works great. I have tested it with a real CUPS workflow (see the Ubuntu bug report). For that I have taken the patched Ghostscript 8.64 and I have added "-dDoNumCopies" to the ps2pdf13 command line in the pstopdf CUPS filter. All this will appear in Ubuntu Jaunty.
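As a rough illustration of the setup described above, the following Python sketch assembles a Ghostscript pdfwrite command line with the switch from revision 9615. The function name `pdfwrite_command` and the file names are hypothetical, and this is not the actual pstopdf filter, which wraps ps2pdf13. The sketch only builds the argument list and does not run Ghostscript; whether later Ghostscript releases still accept the switch under this name is not established in this thread.

```python
from typing import List

def pdfwrite_command(ps_path: str, pdf_path: str,
                     keep_copies: bool = True, gs: str = "gs") -> List[str]:
    """Return an argument list for converting PostScript to PDF with pdfwrite."""
    cmd = [gs, "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]
    if keep_copies:
        # Switch named in this thread; it is off unless requested explicitly.
        cmd.append("-dDoNumCopies")
    cmd += ["-sOutputFile=" + pdf_path, ps_path]
    return cmd

print(" ".join(pdfwrite_command("OOo-2copies.ps", "OOo-2copies.pdf")))
```

The list can be passed to `subprocess.run` on a system where a suitably patched Ghostscript is installed.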