Bug 690742 - PDF Writer Creating multiple XObjects for Repeated Images
Status: RESOLVED WONTFIX
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.64
Hardware: PC FreeBSD
Importance: P4 normal
Assignee: Ken Sharp
 
Reported: 2009-08-29 10:42 UTC by Bob McClure
Modified: 2010-08-13 09:50 UTC (History)

Description Bob McClure 2009-08-29 10:42:03 UTC
When converting PostScript with repeated image calls, extremely large PDF files
are generated and conversion times are extreme. After dumping the output PDF, it
appears that ps2pdf is creating multiple XObjects per input EPS file.

Converting a 20,000 page file with 15 unique images took 15 hours to complete,
with an extremely large output PDF file. After removing all the image calls from
the PostScript, the conversion took 12 minutes.

For testing I am disabling compression to dump the PDF.

Stripping the PS to demonstrate the issue: (TESTFOO.EPS = 240,684 bytes)

ps2pdf -dCompressPages=false -dMaxInlineImageSize=0 testb.ps testb.pdf

(output size = 19,497 bytes) 
%!PS
217.25  141   moveto
(TESTFOO.EPS) run

Adding another image call:

(output size = 36,971 bytes)
%!PS
217.25  141   moveto
(TESTFOO.EPS) run
317.25  141   moveto
(TESTFOO.EPS) run

Adding a third call to the EPS image raises the file size to 54,445 bytes.
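
As a rough check on the three sizes reported above, the growth per extra call can be worked out directly. This is only a sketch: the linear model and the names `per_call`, `fixed_overhead` and `predicted_size` are assumptions made for illustration, fitted to three data points, and are not something ps2pdf reports.

```python
# Output sizes in bytes, as reported above, for 1, 2 and 3 calls to TESTFOO.EPS.
sizes = {1: 19_497, 2: 36_971, 3: 54_445}

# Growth in output size for each additional call.
increments = [sizes[n + 1] - sizes[n] for n in (1, 2)]

# Assumption: every call costs the same, and the rest is fixed file overhead.
per_call = increments[0]
fixed_overhead = sizes[1] - per_call


def predicted_size(calls: int) -> int:
    """Linear extrapolation from the three samples; not a measured result."""
    return fixed_overhead + per_call * calls


print(increments)         # [17474, 17474]
print(fixed_overhead)     # 2023
print(predicted_size(3))  # 54445
```

Each additional call adds the same 17,474 bytes, which is consistent with a new copy of the image being written every time rather than the first copy being reused.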
Comment 1 Ken Sharp 2010-08-13 09:50:50 UTC
(In reply to comment #0)
> When converting PostScript with repeated image calls, extremely large PDF files
> are generated and conversion times are extreme. After dumping the output PDF, it
> appears that ps2pdf is creating multiple XObjects per input EPS file.

Realistically, it's extremely hard for pdfwrite to know that two images presented separately are the same image. We would need to maintain a hash of every image presented so far, temporarily store each new image, hash it and compare the hash with every previous image in order to determine this.

Frankly that would slow down 'normal' processing quite considerably, especially for people handling large images. While it would benefit your workflow, it would penalise everyone.
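
For illustration only, the hash-and-compare scheme described above could be modelled roughly as below. This is a hypothetical Python sketch, not pdfwrite code: the `ImageCache` class, its `add` method and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from typing import Dict, Tuple


class ImageCache:
    """Hypothetical model: maps a digest of image data to an object number."""

    def __init__(self) -> None:
        self._by_digest: Dict[bytes, int] = {}
        self._next_id = 1

    def add(self, data: bytes) -> Tuple[int, bool]:
        """Return (object number, reused) for a fully buffered image."""
        # The complete image must be in hand before it can be hashed.
        digest = hashlib.sha256(data).digest()
        existing = self._by_digest.get(digest)
        if existing is not None:
            return existing, True  # same bytes seen before: reuse the object
        obj_id = self._next_id
        self._next_id += 1
        self._by_digest[digest] = obj_id
        return obj_id, False  # new image: it would have to be written out


cache = ImageCache()
image_a = b"\x00\x01\x02" * 1000  # stand-in for the samples of one image
image_b = b"\xff\xfe\xfd" * 1000  # stand-in for a different image
results = [cache.add(image_a), cache.add(image_b), cache.add(image_a)]
print(results)  # [(1, False), (2, False), (1, True)]
```

Even in this toy form, every image has to be held in full and hashed before anything can be emitted, whether or not it turns out to be a repeat. That is the extra cost for 'normal' jobs mentioned above.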

Ideally the way to handle this is to construct the images as Form Resources in PostScript. You then use each Form as required. Because each form is unique there is no need to compare the images, and so a Distiller application can reuse the forms.

Unfortunately, at present pdfwrite does not properly honour PostScript forms, so this approach would not help you yet. We will be adding this functionality in the future (see #687561 and #691202).

I believe you will find that Adobe Acrobat Distiller behaves in the same fashion with the workflow you describe, though in this case using forms *would* reduce the output size and processing time considerably, as Distiller is able to recognise forms.
 
However, we won't address the case where the same image is simply thrown through multiple times.