Bug 689923 - Ghostscript generated pdf twice the size of Acrobat Distiller
Summary: Ghostscript generated pdf twice the size of Acrobat Distiller
Status: NOTIFIED DUPLICATE of bug 688448
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: All All
Importance: P2 enhancement
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-06-25 11:32 UTC by Marcos H. Woehrmann
Modified: 2011-09-18 21:45 UTC
0 users

See Also:
Customer: 1
Word Size: ---


Attachments
AcrobatDistillerLog.png (43.30 KB, image/png)
2008-06-26 08:37 UTC, Ray Johnston

Description Marcos H. Woehrmann 2008-06-25 11:32:09 UTC
The customer reports and I've confirmed that Ghostscript generates a PDF from the attached PostScript file 
that is twice the size of the PDF file generated by Acrobat Distiller.  Using -dPDFSETTINGS=/screen has no 
effect on the size of the Ghostscript generated file.

The command line I'm using:

  bin/gs -sDEVICE=pdfwrite -o test.pdf ./input.ps
Comment 1 Marcos H. Woehrmann 2008-06-25 11:33:40 UTC
Created attachment 4147
input.ps.gz
Comment 2 Marcos H. Woehrmann 2008-06-25 11:35:08 UTC
Created attachment 4148
acrobatoutput.pdf

Adobe Acrobat Distiller generated output.
Comment 3 Marcos H. Woehrmann 2008-06-25 11:36:36 UTC
Created attachment 4149
gsoutput.pdf

Ghostscript output.
Comment 4 Ray Johnston 2008-06-26 08:25:40 UTC
I had a look at this one. The summary is that Adobe is doing something quite
advanced in order to reduce the file size. This is a non-trivial improvement/
enhancement for Ghostscript.

Page 1 is about the same between the two, but on page 2 I found that the
source PS document uses 825 images, each 1 row high, to form the 3-D bar chart
at the top of the page. The widths of the lines vary along the top and bottom
(where the edges of the chart slant inwards) and are 1716 wide in the middle.

Ghostscript puts this into the PDF as 825 'Do' operations on separate images.

Adobe, however, manages to reduce this to 164 images with heights varying from
a minimum of 1 line to 651 lines. It seems that they are combining adjacent
images that have the same width, which reduces the number of images and
allows for more effective compression.

They manage to do this without IdiomRecognition since the sequence that
paints the images isn't in a bound procedure. The image lines are produced by
sequences like:

: 1600 1 24 4800 3200 2 982 1030 F F 3 [ 0 0 0 ] F
X
doNimage
--- ASCII85Encoded RunLengthEncoded data ---
; : 1602 1 24 4806 3204 2 978 1032 F F 3 [ 0 0 0 ] F
X
doNimage
--- ASCII85Encoded RunLengthEncoded data ---

where the procedures used are from the Pscript_Win_Dib_L2 5.0 0 ProcSet that
precedes the images.
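
The merging described above (adjacent images of equal width combined into one
taller image) can be sketched roughly as follows. This is only an illustration
of the idea, not Distiller's or pdfwrite's actual code. The Strip type, its
fields, and merge_adjacent_strips are hypothetical names, and requiring the
same left edge as well as the same width is an added assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Strip:
    """One image as it arrives from the PostScript job (hypothetical model)."""
    x: int              # left edge, in image pixels (assumed field)
    y: int              # first row covered by this image (assumed field)
    width: int          # samples per row
    rows: List[bytes]   # one bytes object per row of sample data


def merge_adjacent_strips(strips: List[Strip]) -> List[Strip]:
    """Combine consecutive strips that share x and width and touch vertically.

    Strips are assumed to arrive in painting order, top to bottom.
    The input list and its Strip objects are not modified.
    """
    merged: List[Strip] = []
    for s in strips:
        prev = merged[-1] if merged else None
        if (prev is not None
                and prev.x == s.x
                and prev.width == s.width
                and prev.y + len(prev.rows) == s.y):
            # Same column span and directly below the previous strip:
            # append the rows instead of starting a new image.
            prev.rows.extend(s.rows)
        else:
            merged.append(Strip(s.x, s.y, s.width, list(s.rows)))
    return merged
```

Under this rule, rows along the slanted edges (where the width changes on
every line) would stay separate, while a run of equal-width rows in the middle
would collapse into a single taller image.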

On the other hand, Adobe is mapping the color data to an Indexed color space,
and since each image segment has a colorspace associated with it, I'm not
sure that this pays off (since the colorspace Resources need to be stored).
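
A rough way to weigh that trade-off is to count uncompressed bytes. The sketch
below assumes 24-bit RGB samples in the source, 8-bit indices, and a palette of
3 bytes per entry stored once per image. It ignores compression and PDF object
overhead, and the function names are hypothetical.

```python
def rgb_bytes(width: int, height: int) -> int:
    """Uncompressed size of a 24-bit RGB image (3 bytes per pixel)."""
    return width * height * 3


def indexed_bytes(width: int, height: int, palette_entries: int) -> int:
    """Uncompressed size of an 8-bit Indexed image plus its RGB palette."""
    if not 1 <= palette_entries <= 256:
        raise ValueError("an 8-bit index addresses 1 to 256 palette entries")
    return width * height + palette_entries * 3


def palette_share(width: int, height: int, palette_entries: int) -> float:
    """Fraction of the Indexed image's bytes that is palette, not samples."""
    return (palette_entries * 3) / indexed_bytes(width, height, palette_entries)


if __name__ == "__main__":
    # Compare a single 1716-wide row with a merged 651-row image,
    # both with a full 256-entry palette (an assumed worst case).
    for height in (1, 651):
        print(height,
              rgb_bytes(1716, height),
              indexed_bytes(1716, height, 256),
              round(palette_share(1716, height, 256), 4))
```

Under these assumptions a full palette is a large share of a 1-row image and a
negligible share of a 651-row one, so the per-image colorspace cost matters
most when the images are not merged.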


Comment 5 Ray Johnston 2008-06-26 08:37:22 UTC
Created attachment 4159
AcrobatDistillerLog.png

I forgot to mention that while I was eventually able to generate a PDF from the
PS file using Distiller 7, it was really frustrating: it would frequently bomb
in the middle of the conversion, and running it again later with the same
settings would then work. This image is a screen shot showing the number of
times it failed and produced only a LOG file instead of the PDF.

Just opening this PS file in Acrobat Professional, then saving it as a PDF did
not perform the optimization that Distiller does, producing a file that was
2306 KB compared to the 1688 KB produced by Ghostscript.
Comment 6 Ken Sharp 2008-06-26 09:08:10 UTC
Thanks for the investigation Ray. There's already an open pdfwrite feature
request (688448 I think) which amounts to combining images, and like this one
the images are Indexed. I suspect the two should be duplicates.

Previous investigations have shown that this optimisation is only useful on a
very small number of files (though for those files it makes a *big* difference).
I guess we may implement the feature one day.

I'm not sure if you want to look further, but Acrobat Pro 7 has a setting under
Tools->Advanced->PDF Optimizer: the Discard Objects panel has a check box for
'Detect and Merge Image fragments'. This may or may not have any effect....
Comment 7 Henry Stiles 2009-01-13 10:08:13 UTC

*** This bug has been marked as a duplicate of 688448 ***
Comment 8 Marcos H. Woehrmann 2011-09-18 21:45:51 UTC
Changing customer bugs that have been resolved more than a year ago to closed.