Bug 694360 - PDF Print jobs take several minutes to print
Summary: PDF Print jobs take several minutes to print
Status: RESOLVED WORKSFORME
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PS Writer
Version: 9.05
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
Reported: 2013-06-18 16:51 UTC by sixerjman
Modified: 2013-06-26 19:20 UTC



Attachments
664538.pdf (98.65 KB, application/pdf)
2013-06-19 20:43 UTC, James Cloos
Patch to disable transparency processing for Cairo-produced PDF files (1.27 KB, patch)
2013-06-20 11:27 UTC, Ken Sharp
cairo-0.12.15 generated pdf with a small ARGB image, an RGB image and black text. (79.22 KB, application/pdf)
2013-06-21 23:35 UTC, James Cloos

Description sixerjman 2013-06-18 16:51:06 UTC
CUPS is the printing system. After initiating a PDF print job from Evince or Adobe Acroread, the system freezes: the mouse and keyboard are unresponsive and the GNOME clock applet stops (time literally stands still). In my case there was severe disk thrashing, and many processes were being swapped out just so gs could start. While this was going on, the hung task timeout (120 seconds) was reached many times for the tasks being swapped out, and performance degraded severely.

Eventually the print job would finish, but it might take anywhere from 5 to 15 minutes to print a 3-page PDF document. This bug was originally reported against the Debian packages 'cups' and 'cups-filters' (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682426), but the consensus is that the 'gs' process is the constant in all the cases.

The suggested workaround involved setting the pdftops-renderer to 'pdftops' (from gs), but it isn't clear whether that solution solves the problem in every case.
Comment 1 Ken Sharp 2013-06-18 18:40:14 UTC
(In reply to comment #0)
> CUPS is the printing system.  After initiating a PDF print job from Evince
> or Adobe Acroread, 

This bug report is a little puzzling, the component is PDF writer, but as far as I can see you are starting from a PDF, not creating one....


> The suggested workaround involved setting the pdftops-renderer to 'pdftops'
> (from gs), but it isn't clear if that solution solves the problem in every
> case.

I don't see how that would ever solve the problem. If you are printing to a PostScript printer then pdftops would make sense, but not if you are printing to a non-PostScript printer. That would simply move this to a 2-step process where you first convert to PostScript, then render the PostScript. It's possible that might be faster but it seems unlikely. If you are printing to a PostScript printer, then presumably you are already using the PostScript output device, so it's hard to see how using it explicitly would be different.

Of course if your pipeline somehow involves rendering the file to a bitmap, then wrapping that up as PostScript and sending that to the printer, then yes, that would be slow, but I can't really believe that is happening.


To be honest, there's not much I can do with this as it stands. It's entirely possible that PDF rendering can be a memory-intensive process and take a long time, particularly if the original PDF file contains large areas of transparency and the system is limited in memory (or you are printing very large areas or at very high resolution).

If Ghostscript is using virtual memory instead of physical memory then it will slow down further, particularly if it starts swapping the GS process out. In this case you should be running Ghostscript with a lower memory limit and using a clist to avoid the requirement for a full page buffer (possibly multiple full page buffers, depending on the presence of transparency).


The first thing I would suggest is that you try the current version (9.07) as I seem to remember there being a bug fix which sounds reminiscent of this problem.

Failing that, we would need an example PDF file, and a *Ghostscript* command line that demonstrates taking a long time to process. You may need to talk to someone on the CUPS team to tell you how to get the Ghostscript command line. I seem to recall that Till Kamppeter posted some instructions on this some time ago, I'll try and find those in the morning.
Comment 2 James Cloos 2013-06-19 20:37:07 UTC
The component should be the ps2writer.

The debian bug report notes that pdf2ps replicates the bug.

On my box, pdf2ps using gs 9.07 takes about 20 seconds whereas poppler’s pdftops takes 1.5 seconds to convert the pdf to ps.

(Cf my upcoming attachment for the pdf from the debian report which I used to test that.)
Comment 3 James Cloos 2013-06-19 20:43:40 UTC
Created attachment 10002
664538.pdf

The debian bug report mentioned above has this pdf, which looks to have been generated by a web browser, as an example.

cli is simple:

pdf2ps 664538.pdf 664538.ps

pdf2ps takes more than 10 times as much cpu as poppler’s pdftops.

gs also takes much longer to render the resulting ps to screen than it does to render the original pdf.

(For that test I used:

  time gs -dBATCH -dNOPAUSE 664538.pdf
  time gs -dBATCH -dNOPAUSE 664538.ps
)
Comment 4 Ken Sharp 2013-06-20 07:40:59 UTC
Basically the problem is that this is a Cairo-produced PDF file, and as a result it includes considerable spurious transparency. That is, the file does not contain any actual transparent areas, but much of it is declared to be transparent, and is then drawn opaque.

Because PostScript has no transparency in its graphics model, we have to render the transparent areas to images, and then include those in the output. The default resolution of ps2write is 720 dpi, which is rather high for most printers, and results in slow rendering and large files. It also results in large amounts of memory being used to hold the transparency buffers.
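As a rough illustration of why the resolution matters so much, the sketch below estimates the size of one uncompressed full-page buffer. It assumes a US Letter page (612 x 792 points, the MediaBox quoted later in this report) and 4 bytes per pixel (8-bit RGB plus alpha). The buffers Ghostscript actually allocates are not described in this thread, so treat the figures as order-of-magnitude only.

```python
def page_buffer_bytes(dpi, width_pt=612, height_pt=792, bytes_per_pixel=4):
    """Estimate bytes for one uncompressed full-page raster.

    Assumptions (not taken from Ghostscript itself): US Letter page
    size and 4 bytes per pixel (8-bit RGB plus an alpha channel).
    """
    width_px = round(width_pt / 72 * dpi)    # 72 points per inch
    height_px = round(height_pt / 72 * dpi)
    return width_px * height_px * bytes_per_pixel


for dpi in (72, 300, 720):
    mib = page_buffer_bytes(dpi) / 2**20
    print(f"{dpi:>3} dpi: {mib:8.1f} MiB per full-page buffer")
```

The size grows with the square of the resolution, so under these assumptions dropping from 720 dpi to 300 dpi shrinks each buffer by a factor of (720/300)^2 = 5.76.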

For example, every page uses a full page transparency group, which means that we have to render the whole page to an image, despite the fact that there is no actual transparent content on the page. (It would be possible to prescan the page to see if transparency is actually used, but this would impact performance for all files, just to benefit files which say they use transparency but don't. Maybe we should do this for files with a Producer containing 'Cairo'.)

The last time we looked at this, the reason Poppler was quicker was because it treated the transparent content as opaque. This means you get very fast results with pages that are opaque, and incorrect rendering with pages that are transparent. You can achieve the same behaviour with Ghostscript by setting the -dNOTRANSPARENCY flag; you should see that the file then processes quickly.

Probably the best way to address this is to set the resolution being used by Ghostscript to something more reasonable (eg -r72 for 72 dpi rendering). The resolution of the printer is the maximum which should be used, and you may be able to get acceptable results using as little as 1/4 the resolution.
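The two options discussed above (a lower resolution, or -dNOTRANSPARENCY) can be sketched as a small helper that only builds a Ghostscript command line and does not run it. The function name and its parameters are hypothetical; the switches are standard Ghostscript ones, but the exact options that CUPS or the pdf2ps script pass are not shown in this report.

```python
import shlex


def ps2write_command(pdf_path, ps_path, dpi=None, no_transparency=False):
    """Build (but do not run) a Ghostscript ps2write command line.

    Hypothetical helper for illustration only.
    """
    argv = ["gs", "-dBATCH", "-dNOPAUSE", "-sDEVICE=ps2write"]
    if dpi is not None:
        argv.append(f"-r{dpi}")            # rendering resolution in dpi
    if no_transparency:
        argv.append("-dNOTRANSPARENCY")    # treat transparent content as opaque
    argv.append(f"-sOutputFile={ps_path}")
    argv.append(pdf_path)
    return argv


# Lower resolution, transparency still honoured:
print(shlex.join(ps2write_command("664538.pdf", "664538.ps", dpi=300)))
# Full resolution, transparency ignored (fast, but possibly wrong output):
print(shlex.join(ps2write_command("664538.pdf", "664538.ps", no_transparency=True)))
```

Returning an argument list rather than a string keeps the sketch safe to hand to `subprocess.run` without shell quoting issues.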

Also, I believe that the Cairo developers have made changes which reduce the amount of unnecessary full page transparency emitted by Cairo, which may reduce the amount of rendering taking place. (Going further than this and removing, for example, soft masks which are 100% opaque was deemed to be difficult by the developers.) Of course this doesn't help with existing PDF files.

Fundamentally there isn't a lot we can do about this. You can have correct output, or you can have fast output.

I'm leaving this open for now, I'd be grateful if you would try some of the above suggestions and report back. I'm not sure what we can do with the findings but we'll try and think about it.
Comment 5 Ken Sharp 2013-06-20 11:27:26 UTC
Created attachment 10004
Patch to disable transparency processing for Cairo-produced PDF files

This patch tests the Producer key of the Info dictionary and, if it is Cairo, turns off transparency processing and emits a warning.
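The attached patch itself is not reproduced in this thread. Purely as an illustration of the idea, the sketch below does a naive text-level check for a Cairo Producer string in raw PDF bytes. It assumes the Info dictionary is stored uncompressed and uses a literal string; the regular expression stops at the first closing parenthesis, so it does not handle nested or escaped parentheses properly.

```python
import re

# Matches a literal-string Producer entry such as: /Producer (cairo 1.12.14 ...)
# Naive: the capture stops at the first ')' it meets.
PRODUCER_RE = re.compile(rb"/Producer\s*\(([^)]*)\)")


def producer_is_cairo(pdf_bytes):
    """Return True if the first /Producer string found mentions cairo."""
    match = PRODUCER_RE.search(pdf_bytes)
    return bool(match) and b"cairo" in match.group(1).lower()


sample = b"<< /Creator (cairo 1.12.14 (http://cairographics.org))\n" \
         b"   /Producer (cairo 1.12.14 (http://cairographics.org)) >>"
print(producer_is_cairo(sample))
```

A real implementation would parse the trailer and Info dictionary rather than scanning bytes, which is presumably closer to what the patch does inside Ghostscript's PDF interpreter.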

Testing this resulted in about a dozen files in our test suite showing significant differences, so we won't be adopting this as part of the general release.

Fundamentally Ghostscript is doing what it's told to, and doing it correctly. The ways to solve this are:

1) Get Cairo to only produce transparency in PDF files as actually required
2) Set a lower resolution to ps2write
3) Turn off transparency processing (if you are sure this will be acceptable)

There is an open enhancement to use a display list when rendering large areas of transparency like this, which should reduce the memory required, and stop the swapping which seems to be taking place in the original report. This will be slower than a full page buffer on machines that have the memory but should prevent VM errors and massive swapping on lower spec computers.
Comment 6 Ray Johnston 2013-06-20 15:15:27 UTC
If the extreme impact on the original report was due to virtual memory
swapping, this could be prevented if we automatically used banding when
rendering a transparent image.

The clist logic also will 'skip' transparency for bands that don't actually
use it (I will look to see if it helps on this file).

The automatic use of a clist for banding with transparency is a high
priority enhancement: http://bugs.ghostscript.com/show_bug.cgi?id=689805
Comment 7 James Cloos 2013-06-20 19:37:12 UTC
As a side note, current cairo should not generate oversized trans groups.

I have verified, for example, that the SMask created for an ARGB png on a printed web page has exactly the same dimensions as the image itself.

I’ve also verified that gs-9.07 rasters the whole page in that case, and that replacing the SMask reference with spaces (to keep the offsets correct) causes ps2writer not to raster the page.
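The exact edit used for that test is not shown here. A minimal sketch of the same idea, assuming the image dictionary is stored uncompressed (not inside an object stream), is to overwrite each `/SMask N G R` entry with the same number of spaces so that the byte offsets in the xref table stay valid:

```python
import re

# An indirect soft-mask reference inside an image dictionary, e.g. "/SMask 12 0 R".
SMASK_REF_RE = re.compile(rb"/SMask\s+\d+\s+\d+\s+R")


def blank_smask_refs(pdf_bytes):
    """Overwrite each '/SMask N G R' entry with spaces of equal length."""
    return SMASK_REF_RE.sub(lambda m: b" " * len(m.group(0)), pdf_bytes)


original = b"<< /Type /XObject /Subtype /Image /SMask 12 0 R /Width 8 >>"
edited = blank_smask_refs(original)
print(edited)
```

Because the replacement has the same length as the match, the file size and every object offset are unchanged, which is the point of using spaces instead of deleting the entry.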

(Should I attach the examples of the above which I created here or on a new bug?)
Comment 8 Ken Sharp 2013-06-21 07:10:50 UTC
(In reply to comment #7)
> As a side note, current cairo should not generate oversized trans groups.

I believe I mentioned that somewhere in one of the comments.


> I’ve also verified that gs-9.07 rasters the whole page in that case, and
> that replacing the SMask reference with spaces (to keep the offsets correct)
> causes ps2writer not to raster the page.
> 
> (Should I attach the examples of the above which I created here or on a new
> bug?)

At the moment, neither. There's already an enhancement request open, we'll deal with the whole problem there.

Can you confirm whether reducing the resolution resolves the problem? Has this been relayed to the poster of the Debian bug? Are they happy?

None of this is really new so I'd like to close this bug.
Comment 9 James Cloos 2013-06-21 19:34:38 UTC
> I believe I mentioned that somewhere in one of the comments.

Yeah, sorry.  I had to leave the workstation in the middle of reading and by the time I got back I forgot that I hadn’t *finished* reading. ☹

> Can you confirm if reducing the resolution resolves the problem?

I can’t speak for the debian reporter on resolution, but it would not fix things from my POV.  The printer here does noticeably better with text than with rasterized text even at 720dpi, much less something awful like 300.

What I see is:

:; gtime pdf2ps -r300 664538.pdf 664538.gs-r300
3.84user 0.22system 0:04.08elapsed 99%CPU (0avgtext+0avgdata 329216maxresident)k
0inputs+2416outputs (0major+73191minor)pagefaults 0swaps

:; gtime pdf2ps -r720 664538.pdf 664538.gs-r720
18.93user 1.10system 0:20.08elapsed 99%CPU (0avgtext+0avgdata 1635008maxresident)k
0inputs+8184outputs (0major+398811minor)pagefaults 0swaps

As expected, RAM use is reduced (from about 1.6 GB to about 330 MB; GNU time reports maxresident in kilobytes).
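To compare the two runs more precisely, the summary lines above can be parsed with a short script. This is a sketch that assumes GNU time's default output format, in which the maxresident figure is in kilobytes; the function name is hypothetical.

```python
import re

TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user (?P<system>[\d.]+)system .*?"
    r"\+\d+avgdata (?P<maxrss_kb>\d+)maxresident\)k"
)


def parse_gnu_time(line):
    """Extract CPU seconds and peak resident memory from a GNU time summary line."""
    m = TIME_RE.search(line)
    if m is None:
        raise ValueError("not a GNU time summary line")
    return {
        "cpu_s": float(m["user"]) + float(m["system"]),
        "maxrss_mib": int(m["maxrss_kb"]) / 1024,
    }


r300 = parse_gnu_time("3.84user 0.22system 0:04.08elapsed 99%CPU "
                      "(0avgtext+0avgdata 329216maxresident)k")
r720 = parse_gnu_time("18.93user 1.10system 0:20.08elapsed 99%CPU "
                      "(0avgtext+0avgdata 1635008maxresident)k")
print(f"memory ratio: {r720['maxrss_mib'] / r300['maxrss_mib']:.2f}")
print(f"CPU ratio:    {r720['cpu_s'] / r300['cpu_s']:.2f}")
```

Both ratios come out a little under 5, somewhat below the factor of 5.76 that pure scaling with the square of the resolution would give, which suggests that part of the cost does not depend on the resolution.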

> Has this been relayed to the poster of the Debian bug ? Are they happy ?

I believe someone found him a workaround he was happy with, which avoided this entirely (if I’m not confusing debian bugs) before it got posted here.

But I just added a note requesting such a test.
Comment 10 Ken Sharp 2013-06-21 19:46:03 UTC
(In reply to comment #9)

> > Has this been relayed to the poster of the Debian bug ? Are they happy ?
> 
> I believe someone found him a workaround he was happy with, which avoided
> this entirely (if I’m not confusing debian bugs) before it got posted here.
> 
> But I just added a note requesting such a test.

OK well I'm going to close this for now. We do have an enhancement open for the memory usage and we'll tackle any problems with full page rasterisation in that bug. It would be helpful if you'd post a file (here is fine) which is produced by Cairo but avoids the full page transparency groups; I'll add it to the relevant bug report.
Comment 11 James Cloos 2013-06-21 23:35:39 UTC
Created attachment 10013
cairo-0.12.15 generated pdf with a small ARGB image, an RGB image and black text.

I grabbed this (random page) as a test because I’m so frequently asked to print documents from this site....

The logo is ARGB; the rest is not.

The pdf was generated by seamonkey using cairo master.

(No review on the contents.  Yet. ☺)
Comment 12 Ken Sharp 2013-06-24 12:39:23 UTC
The attached file (Chicken_Piccata.pdf) still contains a page with a full page transparency group:

10 0 obj
<< /Type /Page
   /Parent 1 0 R
   /MediaBox [ 0 0 612 792 ]
   /Contents 3 0 R
   /Group <<
      /Type /Group
      /S /Transparency
      /I true
      /CS /DeviceRGB
   >>
   /Resources 2 0 R
>>
endobj
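For anyone wanting to look for the same construct in other files, a text-level check along these lines may help. It is only a sketch: it assumes the page dictionary is uncompressed and that the /Group entry is an inline dictionary as above, and it says nothing about whether the page really paints transparent content, which is the check Ghostscript itself has to make.

```python
import re

# Inline page-level /Group dictionary whose subtype is /Transparency.
PAGE_GROUP_RE = re.compile(rb"/Group\s*<<[^>]*?/S\s*/Transparency\b")


def declares_transparency_group(page_dict_bytes):
    """Return True if the page dictionary text declares a transparency group."""
    return PAGE_GROUP_RE.search(page_dict_bytes) is not None


page = b"""<< /Type /Page
   /Parent 1 0 R
   /MediaBox [ 0 0 612 792 ]
   /Contents 3 0 R
   /Group <<
      /Type /Group
      /S /Transparency
      /I true
      /CS /DeviceRGB
   >>
   /Resources 2 0 R
>>"""
print(declares_transparency_group(page))
```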
Comment 13 Ken Sharp 2013-06-24 12:41:59 UTC
Actually that may not be true, I'm going to consult with our transparency expert, but he's out of the office at the moment.
Comment 14 Ken Sharp 2013-06-24 15:09:02 UTC
OK the file does put the whole page in a transparency group, so indeed the whole page is transparent. However, checking with a hand-edited file which *only* contains small portions of transparency we see the same effect.

As noted in the comment thread, there is an enhancement open to work on the whole problem of rendering transparency for ps2write/pdfwrite, so this should improve where possible at some point in the future. Note that since the Cairo file has a full page transparency group, it won't be possible to improve that.

Checking both files with Poppler we see that it behaves exactly the same, the entire page is rendered to a bitmap for both files.
Comment 15 James Cloos 2013-06-24 19:53:10 UTC
Yes.  I saw that full page group, but replacing the whole group dict with spaces and leaving in the SMask image and the reference thereto also generated a full page raster, whereas removing just the reference to the SMask prevented the raster.  So I jumped to the conclusion that the group wasn’t the issue.

Poppler’s PSOutputDev class (used by pdftops) explicitly forces a full page raster if it finds any transparency or pattern imagemasks at all on the page.

Just as you alluded for gs, it will take a bit of effort to convince PSOutputDev to limit rasters.

On the plus side, pdftocairo, using Poppler/Cairo, does well on this test.  But cairo thinks all the world is sRGB. ☹

I’m working now on support in the OpenPrinting pdftops filter to use pdftocairo when the PDF is limited to sRGB and/or sGray, with fallback to one’s preference of gs or pdftops for anything else.  But avoiding the need for such workarounds is welcome.
Comment 16 Ken Sharp 2013-06-25 07:30:33 UTC
(In reply to comment #15)
> Yes.  I saw that full page group, but replacing the whole group dict with
> spaces and leaving in the SMask image and the reference thereto also
> generated a full page raster, whereas removing just the reference to the
> SMask prevented the raster.  So I jumped to the conclusion that the group
> wasn’t the issue.

It is, and it isn't. Ghostscript does some checking to see if there is 'real' transparency in a PDF file, precisely because some producers stick in spurious transparency.

In the case of this file it *does* use transparency *and* it declares a full page group, so we will always (I think) have to produce a full page raster. 

As you rightly note, even if you remove the group, the presence of real transparency still prompts us to produce a full page raster, just like Poppler. Removing the transparency but leaving the spurious group doesn't, simply because we look for that...


> Just as you alluded for gs, it will take a bit of effort to convince
> PSOutputDev to limit rasters.

It's on the 'todo' list, but it won't be any time soon I suspect. The first task is to use a display list for pdfwrite/ps2write so that we don't have to have multiple full page buffers at the output colour depth and resolution. Those buffers lead to VM errors and massive swapping (which is what I believe was the original reporter's problem).

After that, we can look to see what (if anything) we can do about only rendering the transparent portions of a PDF file.
Comment 17 James Cloos 2013-06-26 19:20:41 UTC
I read through the pdf ref again (PDFReference17.pdf in particular; PDF32000_2008.pdf’s ugly typography and font choices strain my eyes too much for on-screen reading. ☹)

My take is that the page group is not incorrect, is implied when not specified and primarily matters when the page is embedded in another pdf.

So it is correct that adding or removing a page group does not affect the ‘to raster or not to raster’ question.