Bug 690319

Summary:	pdfwrite with multi-stripe images takes f.o.r.e.v.e.r
Product:	Ghostscript	Reporter:	Mark DeVries <mdevries>
Component:	PDF Writer	Assignee:	Ken Sharp <ken.sharp>
Status:	NOTIFIED DUPLICATE
Severity:	enhancement
Priority:	P2
Version:	8.63
Hardware:	PC
OS:	Windows XP
Customer:	1130	Word Size:	---
Attachments:	onerowperstrip.ps pdfwrite-profile.txt

Description Mark DeVries 2009-03-05 13:14:02 UTC

I have a PS file containing EPS images converted from TIFF by tiff2ps. The
images are Level 2.  Our real-life customer's .ps file is 27 MB.  Converting to
PDF with gs takes about 4 minutes on my Windows PC; Acrobat Distiller 3.0 does
the same task in about 15 seconds.  The troublesome part seems to be 230
graphics that are typically 2400 rows deep, coded as 75 strips 32 rows deep.  In
other words there are 75 repetitions of the image procedure per graphic.  There
are also 947 other graphics that are drawn as 7 strips 16 rows deep, but they
are small and don't eat much time.
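
For a sense of scale, the strip counts quoted above can be turned into a total number of image-procedure invocations. This is only arithmetic on the figures in this report; the name `image_calls` is made up for illustration.

```python
# Counts taken from the description above: (number of graphics, strips per graphic).
graphics = [
    (230, 75),  # large graphics: 75 strips of 32 rows each
    (947, 7),   # small graphics: 7 strips of 16 rows each
]


def image_calls(groups):
    """Total invocations of the image procedure across all graphics."""
    return sum(count * strips for count, strips in groups)


if __name__ == "__main__":
    print(image_calls(graphics))
```

The large graphics account for 17250 of the 23879 invocations, which fits the observation that they are the troublesome part.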

I also converted a single TIFF that's 8462 rows deep, drawn at one row per
strip, and created a PDF.  This pathological case took 1:45 with gs on my PC,
and about :05 with Acrobat Distiller.

My command line is: gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
-sOutputFile=name.pdf -c .setpdfwrite -f name.ps

We've seen similar timings for gs on Red Hat Linux.  (The big file took 42
minutes on Solaris, but that's a *really* old system!)

Comment 1 Mark DeVries 2009-03-05 13:20:47 UTC

Created attachment 4827 [details]
onerowperstrip.ps

This file has one image from tiff2ps with 8642 iterations through the imagemask
procedure, each one row deep.  Converting it to PDF on Windows XP takes 1:45
with gs and :05 with Acrobat Distiller 3.0.

Comment 2 Mark DeVries 2009-03-05 13:29:33 UTC

Created attachment 4828 [details]
pdfwrite-profile.txt

This is a profile of gs distilling a very large file to PDF on Red Hat Linux.
gs was processing our user's file, which is about 27 MB and includes, among
other things, 230 tiff2ps Level-2 images, which typically draw 2400 rows in
32-row strips.  This process takes about 4 minutes on Linux and Windows XP.
You'll see that "bytes_compare" gets 1.9 billion calls.

This is *not* a profile for the onerowperstrip.ps file I attached.  We do not
yet have user permission to post the real-world file that was used for this
profile.

Comment 3 Ken Sharp 2009-03-06 00:47:09 UTC

This looks very much like an extreme case of the 'image merging' feature of
Acrobat. 

Each row of the image is actually represented in PostScript as an individual
image, resulting in 8462 calls to the image operator. 

Ghostscript reflects this by producing a PDF file with 8642 image XObjects. So
we need 8642 image dictionaries, 8642 entries in the Page XObject declaration
and so on. This means we spend an awful lot of time outputting these
dictionaries, and also checking them.
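
A rough sketch of why the checking can dominate, assuming (and this is only an assumption, not a description of the pdfwrite code) that each new object is compared against every object already written. Under that assumption the number of comparisons grows with the square of the number of images. The function name is hypothetical.

```python
def pairwise_comparisons(n_objects):
    """Comparisons made if object k is checked against the k-1 objects before it.

    Assumption for illustration only: a naive check of every new object
    against all existing ones. The real pdfwrite logic may differ.
    """
    return n_objects * (n_objects - 1) // 2


if __name__ == "__main__":
    for n in (1, 75, 8642):
        print(n, pairwise_comparisons(n))
```

A single merged image needs no such comparisons, while 8642 separate images would need tens of millions under this model.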

Acrobat detects the fact that the images are contiguous, and merges them all
into one single image before output. This saves considerable space and checking
of existing objects.
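
The merging idea can be sketched as follows. This is a minimal illustration under simplifying assumptions (strips arrive top to bottom, and two strips are mergeable when they share the same left edge and width and the second starts exactly where the first ends). It is not Acrobat's algorithm and not Ghostscript code; `Strip` and `merge_contiguous` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Strip:
    x: int                 # left edge of the strip
    y: int                 # first row covered by the strip
    width: int             # width in pixels
    rows: list = field(default_factory=list)  # one bytes object per row


def merge_contiguous(strips):
    """Merge strips that sit directly below the previous one into one image."""
    merged = []
    for s in strips:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.x == s.x
            and last.width == s.width
            and last.y + len(last.rows) == s.y  # s starts where last ends
        ):
            last.rows.extend(s.rows)
        else:
            # Copy the row list so the input strips are left untouched.
            merged.append(Strip(s.x, s.y, s.width, list(s.rows)))
    return merged


if __name__ == "__main__":
    one_row_strips = [Strip(0, y, 4, [bytes(4)]) for y in range(8)]
    result = merge_contiguous(one_row_strips)
    print(len(result), len(result[0].rows))
```

With one-row strips like those in onerowperstrip.ps, every strip after the first extends the same image, so only one image object would be left to write.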

See issues #688448 and #689923 for further details. This is not actually a bug;
it's a feature request to do the same consolidation as Acrobat.

Frankly, this is a pretty poor way to write images; it's slow on any
implementation (watch GS draw this, for example). It would be much better to
write all the lines of the image as a single image. Mark, I expect you've tried
this, but if you load the image into something like GIMP or Photoshop, and
resave without the multi-stripping, I imagine this works well enough?

As I believe this is an enhancement, not a bug, and a duplicate, I'm marking it
as such; we already have a P2 enhancement request for this.


*** This bug has been marked as a duplicate of 688448 ***

Comment 4 Alex Cherepanov 2009-03-06 11:57:48 UTC

IMHO the problem is best fixed at the source. tiff2ps is a small open-source
utility that can be easily modified to generate better PostScript.

Comment 5 Ken Sharp 2009-03-06 12:03:49 UTC

Hi Alex, I've exchanged a couple of mails privately with the customer, and it
looks like they will probably have a workaround. I suggested using tiffcp to
convert the multi-strip TIFF to a single strip, which works for me.
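
One way the tiffcp suggestion might look in practice, as a sketch only. It relies on tiffcp's `-r` option (rows per strip) and sets it to the image height, so that all rows fit in one strip. The file names and the helper names are made up for illustration, and the exact command used with the customer is not recorded in this report.

```python
import subprocess


def single_strip_command(src, dst, image_rows):
    """Build a tiffcp command that writes `image_rows` rows per strip.

    With rows-per-strip equal to the image height, the output holds the
    whole image in a single strip.
    """
    return ["tiffcp", "-r", str(image_rows), src, dst]


def convert(src, dst, image_rows):
    """Run tiffcp; raises CalledProcessError if the conversion fails."""
    subprocess.run(single_strip_command(src, dst, image_rows), check=True)


if __name__ == "__main__":
    # Hypothetical file names; 2400 is the typical image height quoted above.
    print(" ".join(single_strip_command("multistrip.tif", "onestrip.tif", 2400)))
```

The converted TIFF can then be passed through tiff2ps as before, which should emit one image invocation per graphic instead of one per strip.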

Comment 6 Richard Nolde 2009-05-11 15:27:49 UTC

I've done some work on the tiff2ps source code and it might be worth your time
to download a current version from the LIBTIFF CVS tree, either 3.9.0 or 4.0 (if
you need BIGTIFF support). I have not modified the PostScript generation code
except to correct some incompatible options and to allow for rotations of 90,
180, and 270 degrees, so I cannot comment on the efficiency of it.  However, the
default is to produce PostScript Level 1, which is generally a much larger file
than if you specify level 2 or level 3, and the additional IO may be part of the
problem. Try the -map3 option for Level 3 PostScript using the ImageMask
operator when the image only contains 1 bit per sample.

If you cannot get the current source from CVS, let me know.

Comment 7 Marcos H. Woehrmann 2011-09-18 21:46:03 UTC

Changing customer bugs that were resolved more than a year ago to closed.