Bug 688448 - PDF created by GS pdfwrite larger than Acrobat 6 and 7
Summary: PDF created by GS pdfwrite larger than Acrobat 6 and 7
Status: NOTIFIED LATER
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: All All
Importance: P3 enhancement
Assignee: Ken Sharp
URL:
Keywords:
Duplicates: 689923 690319
Depends on:
Blocks:
 
Reported: 2005-12-13 10:00 UTC by Ray Johnston
Modified: 2012-04-16 19:13 UTC
CC List: 4 users

See Also:
Customer: 170
Word Size: ---


Attachments
3page_ad7.pdf (1.45 MB, application/pdf)
2005-12-13 10:12 UTC, Ray Johnston

Description Ray Johnston 2005-12-13 10:00:30 UTC
The attached PS input file (16 MB) produces a 3 MB PDF with GS, whereas the PDF
from Acrobat 6 or 7 is only 1.5 MB. This is a 3-page sample of a much larger
document that the customer has.

-dMaxInlineImageSize=0 makes the PDF from Ghostscript slightly larger than with
the default settings and increases the number of XObject images.

The customer expected that the images making up the "background" of the
slides would be shared resources when -dMaxInlineImageSize was specified, but
they are not. I don't know whether this should happen or not.

Also, Acrobat 7 seems to merge many of the 1200-wide images into a single
image, so that the first page has only 35 XObject images in total. Acrobat's
1200-wide images have heights of 9, 78, 10, 5 and 794, whereas the GS PDF has
826 images of height 1.
Comment 1 Ray Johnston 2005-12-13 10:03:46 UTC
Created attachment 1861
3pages.ps.zip

input PS file
Comment 2 Ray Johnston 2005-12-13 10:12:01 UTC
Created attachment 1862
3page_ad7.pdf

PDF created by Adobe Acrobat 7 from the 3page.ps file
Comment 3 leonardo 2005-12-14 01:00:42 UTC
A fixed-point overflow happens in gx_curve_log2_samples while computing 'd'.
The source data, x0 = -2147483648 (0x80000000), is bad.
Comment 4 leonardo 2005-12-14 01:01:32 UTC
Please ignore the last comment - it was put into a wrong bug.
Comment 5 Stefan Kemper 2005-12-14 11:10:43 UTC
We are the same size as Acrobat 5.
Comment 6 leonardo 2007-02-07 10:52:24 UTC
Bumping the priority for a customer bug.
Comment 7 leonardo 2007-08-29 19:53:40 UTC
Passing to Ken, since he is handling pdfwrite from now on.
Comment 8 Ken Sharp 2007-09-07 01:52:41 UTC
Hmm. I could do with some input from more experienced GS developers on this one.

1) I see no reason why MaxInlineImageSize=0 would result in images being
'shared'. This is only possible when the images are identical. There is no easy
way (in PostScript) to know that, unless the images come from (for example) a
Form or some other identifiably identical source (we could compare all images
sample by sample, but that would be sloooow...). However, in this case, I *do*
see the images being 'shared'. For example, in the content stream of the first page:

W* n
q 0 6000 -5 -0 235 510.921 cm
/R8 Do
Q
q 0 6000 -5 -0 240 510.921 cm
/R8 Do
Q
q 0 6000 -5 -0 245 510.921 cm
/R8 Do
Q
q 0 6000 -5 -0 250 510.921 cm
/R8 Do
Q
q 0 6000 -5 -0 255 510.921 cm
/R9 Do
Q

Note the reuse of the first image, R8.
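Item 1 notes that comparing every image sample by sample would be slow. A common way to make identical-image detection cheaper is to hash each image's parameters and sample data once, look the digest up in a table, and do a full byte comparison only when digests match. The following is a minimal Python sketch under that assumption; `ImageDeduper` and `resource_for` are hypothetical names, the "R1"-style resource names merely mimic the content stream above, and this is not how pdfwrite is actually implemented.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ImageDeduper:
    """Hypothetical cache that maps image content to an XObject resource name."""

    # digest -> list of (params, data, name); a list guards against hash collisions
    _by_digest: dict = field(default_factory=dict)
    _next_id: int = 1

    def resource_for(self, width, height, bpc, colorspace, data):
        """Return (name, is_new): reuse a name if an identical image was seen."""
        params = (width, height, bpc, colorspace)
        h = hashlib.sha256(repr(params).encode())
        h.update(data)  # one pass over the samples instead of pairwise compares
        bucket = self._by_digest.setdefault(h.digest(), [])
        for seen_params, seen_data, name in bucket:
            # Full comparison only on a digest hit, to rule out collisions.
            if seen_params == params and seen_data == data:
                return name, False
        name = f"R{self._next_id}"
        self._next_id += 1
        bucket.append((params, data, name))
        return name, True


dedup = ImageDeduper()
row = bytes([255, 0, 0]) * 1200  # one 1200-pixel RGB scan line
print(dedup.resource_for(1200, 1, 8, "DeviceRGB", row))  # ('R1', True)
print(dedup.resource_for(1200, 1, 8, "DeviceRGB", row))  # ('R1', False)
```

Each image is hashed once when it arrives, so the cost grows with the total amount of image data rather than with the number of image pairs.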

2) Merging images. This is something I was involved with in my previous
incarnation. I don't recommend it. It significantly harms performance, for what
are usually tiny gains on any sensibly constructed file. This file doesn't look
bad enough to benefit. In fact, even the customer who forced its adoption in my
previous life didn't save space, and their job had each scan line as a separate
image.
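For reference, the kind of merging described in item 2 can be sketched as follows. This assumes every strip is a one-row image with the same color space and bit depth, and that strips are stacked one row apart along a single axis (in the page-1 content stream above, each strip is shifted 5 units along x). `Strip`, `MergedImage` and `merge_strips` are hypothetical names, not pdfwrite code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Strip:
    """Hypothetical one-row image: `row` is its position along the stacking axis."""
    width: int
    row: int
    data: bytes  # width * 3 bytes of 8-bit RGB samples


@dataclass
class MergedImage:
    width: int
    first_row: int
    height: int
    data: bytearray


def merge_strips(strips: List[Strip]) -> List[MergedImage]:
    """Merge runs of contiguous, equal-width strips into taller images."""
    merged: List[MergedImage] = []
    for s in strips:
        last = merged[-1] if merged else None
        if (last is not None and last.width == s.width
                and s.row == last.first_row + last.height):
            last.data += s.data  # contiguous: append this row's samples
            last.height += 1
        else:
            # Gap or width change: start a new image.
            merged.append(MergedImage(s.width, s.row, 1, bytearray(s.data)))
    return merged


rows = [Strip(1200, r, bytes(3600)) for r in (0, 1, 2, 7)]
print([(m.first_row, m.height) for m in merge_strips(rows)])  # [(0, 3), (7, 1)]
```

Even this simple version has to hold a run in memory until a non-contiguous strip arrives. A real implementation would also have to compare full transformation matrices, clipping and color spaces before merging, which is plausibly part of the cost item 2 warns about.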

3) So, why is the job larger? Simple: all the images output by GS are RGB, and
therefore use 3 bytes per pixel. All (or at least a lot) of the images in the
Acrobat output are /Indexed /RGB, using one byte per pixel. The job is mostly
composed of images, so shrinking each image's sample data to about a third
makes the output much smaller.
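The arithmetic behind item 3 can be sketched for uncompressed 8-bit data, ignoring Flate or other compression (which changes the real numbers): a DeviceRGB image costs 3 bytes per pixel, while an 8-bit /Indexed /DeviceRGB image costs 1 byte per pixel plus a palette of 3 bytes per color. The function names are illustrative only.

```python
def rgb_bytes(width: int, height: int) -> int:
    """Uncompressed sample data for an 8-bit DeviceRGB image: 3 bytes per pixel."""
    return width * height * 3


def indexed_bytes(width: int, height: int, colors: int) -> int:
    """Uncompressed 8-bit /Indexed /DeviceRGB image: 1 byte per pixel plus palette."""
    if not 1 <= colors <= 256:
        raise ValueError("an 8-bit index can address at most 256 colors")
    return width * height + colors * 3  # palette holds 3 bytes per color


print(rgb_bytes(1200, 794), indexed_bytes(1200, 794, 256))  # 2858400 953568
print(rgb_bytes(1200, 1), indexed_bytes(1200, 1, 256))      # 3600 1968
```

For a tall 1200 x 794 image the saving approaches two thirds, but for a single 1200-pixel row a full 256-color palette takes a noticeable share of it, so many separate one-row images gain less unless the palette is stored once as a shared color space object.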

It looks to me like there is no 'auto-indexification' of images in pdfwrite.
This *is* a useful feature; it does make processing a little slower, but many
jobs (especially from MS Office applications) benefit from much smaller output.
This file was a PowerPoint presentation ;-)
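The "auto-indexification" mentioned above could work roughly like this: scan an 8-bit RGB image, and if it uses at most 256 distinct colors, emit a palette and one index byte per pixel; otherwise keep it as RGB. This is a minimal sketch with a hypothetical function name, not pdfwrite code. In PDF terms the palette would become the lookup string of a `[/Indexed /DeviceRGB hival lookup]` color space, with hival equal to the number of colors minus one.

```python
from typing import Dict, Optional, Tuple


def try_indexify(rgb: bytes, max_colors: int = 256) -> Optional[Tuple[bytes, bytes]]:
    """Return (palette, indices) for 8-bit RGB data, or None if too many colors.

    palette holds 3 bytes per distinct color in first-seen order; indices holds
    one byte per pixel. None means the image should stay DeviceRGB.
    """
    if len(rgb) % 3:
        raise ValueError("RGB data length must be a multiple of 3")
    if not 1 <= max_colors <= 256:
        raise ValueError("one index byte can address at most 256 colors")
    lookup: Dict[bytes, int] = {}
    palette = bytearray()
    indices = bytearray()
    for i in range(0, len(rgb), 3):
        color = rgb[i:i + 3]
        idx = lookup.get(color)
        if idx is None:
            if len(lookup) == max_colors:
                return None  # too many colors: give up and keep RGB
            idx = len(lookup)
            lookup[color] = idx
            palette += color
        indices.append(idx)
    return bytes(palette), bytes(indices)


pixels = bytes([255, 0, 0, 0, 255, 0, 255, 0, 0])  # red, green, red
print(try_indexify(pixels))  # (b'\xff\x00\x00\x00\xff\x00', b'\x00\x01\x00')
```

The per-pixel dictionary lookup, and the fact that every image must be scanned before it can be written, illustrate the "a little slower" trade-off; the early return at least stops scanning photographic images as soon as they exceed the color limit.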

Comment 9 Henry Stiles 2009-01-13 10:08:14 UTC
*** Bug 689923 has been marked as a duplicate of this bug. ***
Comment 10 Ken Sharp 2009-03-06 00:47:10 UTC
*** Bug 690319 has been marked as a duplicate of this bug. ***
Comment 11 Shailesh Mistry 2011-07-16 12:42:10 UTC
The enhancement is still missing in Ghostscript 9.03.