Bug 690069 - Ghostscript generates large PCL-XL files
Summary: Ghostscript generates large PCL-XL files
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Other Driver
Version: master
Hardware: PC Linux
Importance: P4 enhancement
Assignee: Marcos H. Woehrmann
URL:
Keywords: bountiable
Depends on: 688320
Blocks:
Reported: 2008-09-12 07:38 UTC by Marcos H. Woehrmann
Modified: 2011-09-18 21:46 UTC

See Also:
Customer: 351
Word Size: ---


Attachments
golden_gate.ps (1.04 MB, application/postscript)
2008-09-24 15:16 UTC, Marcos H. Woehrmann
deltarow compression patch (2.77 KB, patch)
2009-10-24 22:27 UTC, Hin-Tak Leung
updated patch for DeltaRow compression (5.71 KB, patch)
2009-10-26 23:04 UTC, Hin-Tak Leung
made-up pdf which compresses poorly with RLE but well with DeltaRow. (256.38 KB, application/pdf)
2009-10-27 00:32 UTC, Hin-Tak Leung

Description Marcos H. Woehrmann 2008-09-12 07:38:15 UTC
The customer reports and I've verified that the attached PostScript files
generate large PCL-XL files when converted by Ghostscript.

There are two issues here:

The file rotate.ps is rotated 90 degrees; this sends blocks to gdevpx.c that are
tall and skinny and the compression fails.

The file image.ps contains a continuous tone image that doesn't compress well
with RLE compression.

Note that there was a third issue, bug 689732 (now fixed): a bug in gdevpx.c
that caused it to always revert to uncompressed images.  I mention it because
it affects the rotate.ps issue.  Previously gdevpx.c called s_RLE_process()
with last set to false, which should allow compression to continue from one
raster line to the next and, if it worked, would solve the rotate.ps problem.
But calling s_RLE_process() with last set to false is apparently broken,
because it fails to compress any image, so my hack was to set it to true.
This causes each raster line to be compressed independently, so it now works
for non-rotated images (see the bug 689732 description for more details).
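
To illustrate why compressing each raster line independently hurts tall, skinny
blocks, here is a minimal sketch in Python. It assumes a PackBits-style RLE
(control byte followed by either a repeated byte or a literal run); the
function name packbits and the 4-byte-wide test block are hypothetical, and
this is not the code in s_RLE_process() or gdevpx.c.

```python
def packbits(data: bytes) -> bytes:
    """PackBits-style RLE: a control byte, then a repeated byte or literals."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (capped at 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)      # 255 means 2 copies ... 129 means 128 copies
            out.append(data[i])
            i += run
        else:
            # Collect literals until a repeat of two or more bytes begins.
            start = i
            i += 1
            while (i < n and i - start < 128
                   and not (i + 1 < n and data[i] == data[i + 1])):
                i += 1
            out.append(i - start - 1)  # 0 means 1 literal ... 127 means 128 literals
            out += data[start:i]
    return bytes(out)


# A tall, skinny all-white block: 4 bytes wide, 1000 rows high (made-up sizes).
width, height = 4, 1000
image = bytes(width * height)

# Each row compressed on its own, as with last set to true.
per_row = sum(len(packbits(image[r * width:(r + 1) * width]))
              for r in range(height))

# Runs allowed to continue across row boundaries.
continuous = len(packbits(image))

print(per_row, continuous)
```

In this toy case every 4-byte row still costs 2 bytes when compressed alone,
while the continuous stream needs only one control/value pair per 128 bytes.
Real output also carries per-block operator overhead that this sketch ignores.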

The command line I'm using:

  bin/gs -sDEVICE=pxlmono -o test.pxl ./rotate.ps
Comment 1 Marcos H. Woehrmann 2008-09-12 07:38:41 UTC
Created attachment 4391
rotate.ps
Comment 2 Marcos H. Woehrmann 2008-09-12 07:39:32 UTC
Created attachment 4392
image.ps
Comment 3 Marcos H. Woehrmann 2008-09-24 15:16:56 UTC
Created attachment 4422
golden_gate.ps

Another file that we do a poor job converting to pxlmono.  In this case we
generate many small rectangles.
Comment 4 Henry Stiles 2009-10-06 10:11:14 UTC
Make bountiable and copy in Hin-Tak.  It was suggested we try delta row
compression here instead of RLE.  JPEG is also possible, but we note that
modern HP drivers aren't producing JPEG.
Comment 5 Hin-Tak Leung 2009-10-06 14:30:35 UTC
Yes, delta row should work much better for attachment 4422 than RLE - I think
there is already some delta row compression code in ghostscript somewhere (maybe
in the tiff* driver?). It should be relatively straightforward.
Comment 6 Hin-Tak Leung 2009-10-15 01:05:12 UTC
Attachment 4422 now works well with the patch attachment 5488 (bug 690733), and
outputs 222k instead of 3MB; so implementing DeltaRow compression probably isn't
necessary.

I have started looking at DeltaRow compression - there are two slight issues
with it: it requires PCL XL 2.1 (not in 2.0/1.1), and it is all-or-nothing, i.e.
one cannot mix-and-match with RLE/no compression, according to the PXL spec. I
don't know if any printer is sensitive to the PCL XL version, but it probably
needs to be done as a user option.

Still want to implement DeltaRow compression?
Comment 7 Hin-Tak Leung 2009-10-17 16:12:32 UTC
Argh, I knew I saw it somewhere - the PCL XL spec refers to DeltaRow
compression - but the algorithm itself is called mode 3/mode 9 in the PCL (5)
spec, and is implemented in base/gdevpcl.c. (In fact RLE in PCL XL is mode 2 in
PCL, I think, but it is 're-done'.)
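
For readers unfamiliar with the technique, the following Python sketch shows
the general idea of mode 3 style delta row encoding: each row is sent as a
list of changes against the previous row. It is written from scratch for
illustration under my reading of the PCL 5 description (3-bit replacement
count, 5-bit offset, 255-continued offset bytes); the function names are
hypothetical, it is not the code in base/gdevpcl.c, and any PCL XL specific
framing of the row data is not covered.

```python
def delta_row_encode(prev: bytes, cur: bytes) -> bytes:
    """Encode `cur` as byte replacements relative to `prev` (same length)."""
    out = bytearray()
    i, n = 0, len(cur)
    last = 0                      # first byte not yet covered by a command
    while i < n:
        if cur[i] == prev[i]:
            i += 1
            continue
        # Gather up to 8 consecutive bytes that differ from the previous row.
        start = i
        while i < n and i - start < 8 and cur[i] != prev[i]:
            i += 1
        count = i - start
        offset = start - last
        if offset < 31:
            out.append(((count - 1) << 5) | offset)
        else:
            # Offset field saturates at 31; the rest goes in extra bytes,
            # where a value of 255 means "another offset byte follows".
            out.append(((count - 1) << 5) | 31)
            offset -= 31
            while offset >= 255:
                out.append(255)
                offset -= 255
            out.append(offset)
        out += cur[start:i]
        last = i
    return bytes(out)


def delta_row_decode(prev: bytes, data: bytes) -> bytes:
    """Rebuild the current row from `prev` and the encoded replacements."""
    row = bytearray(prev)
    pos = k = 0
    while k < len(data):
        cmd = data[k]
        k += 1
        count = (cmd >> 5) + 1
        offset = cmd & 31
        if offset == 31:
            while True:
                extra = data[k]
                k += 1
                offset += extra
                if extra != 255:
                    break
        pos += offset
        row[pos:pos + count] = data[k:k + count]
        k += count
        pos += count
    return bytes(row)


# A 600-byte gradient row repeated unchanged costs nothing; one changed
# byte far into the row costs a handful of bytes.
seed = bytes(x % 256 for x in range(600))
changed = bytearray(seed)
changed[400] ^= 0xFF
encoded = delta_row_encode(seed, bytes(changed))
print(len(delta_row_encode(seed, seed)), len(encoded))
```

The first row of any image or strip has no previous row to refer to, so it is
effectively encoded against an all-zero row and gains little.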
Comment 8 Hin-Tak Leung 2009-10-18 02:46:05 UTC
With the rotation patch (attachment 5502) there are some dramatic improvements:

output from attachment 4391: down 388x: 261131850 ->   672883 (261MB -> 0.7MB)
output from attachment 4392: down 5x:    63348733 -> 12411437 (63MB -> 12MB)

There is little to gain from the first case, but the 2nd case can gain from
DeltaRow compression, I think.
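
As a quick arithmetic check of the reduction factors quoted above, a small
Python sketch (the byte counts are copied from this comment; the variable
names are only illustrative):

```python
# Output sizes in bytes: (without rotation patch, with rotation patch).
outputs = {
    "attachment 4391": (261131850, 672883),
    "attachment 4392": (63348733, 12411437),
}

factors = {name: before / after for name, (before, after) in outputs.items()}

for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x smaller")
```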

With either patched or unpatched ghostscript, the resulting pxl code from 4392
seems to be broken:
-------------
Warning interpreter exited with error code -986
Flushing to end of job
End of page 2, press <enter> to continue.

PCL XL error
    Subsystem:  KERNEL
    Error:      ExtraData
    Operator:   ReadImage
    Position:   4078747
file position of error = 63348724

    Position:   959
file position of error = 12411428
End of page 3, press <enter> to continue.
---------------

In both cases, the problem is 9 bytes from the end of the pxl data, so it looks
like it is ghostscript silently generating wrong pxl code, which is worrying.
Comment 9 Hin-Tak Leung 2009-10-18 03:02:00 UTC
A little amendment for the last comment - they were of pxlcolor.

For pxlmono, here are the file sizes:
261131542 / 671351 , (388x)
47028769 / 4292701 , (11x)

With pxlmono, pcl6 reads outputs of both patched/unpatched ghostscript alright,
and they look to be the same; and both have visual artefacts. It looks like some
of the images on page two may have transparency masks, which generates broken
pxl streams in pxlcolor and results in visual effects with pxlmono.
Comment 10 Hin-Tak Leung 2009-10-18 18:38:56 UTC
The ExtraData ReadImage issue in Comment 8 is bug 688320.
Comment 11 Hin-Tak Leung 2009-10-24 22:22:56 UTC
With the rotation patch, DeltaRow compression seems to do worse than RLE:

                  RLE color  RLE mono  DeltaRow color  DeltaRow mono
attachment 4391:     672883    671351          739965         738433
attachment 4392:   12411437   4292701        12410256        5876405
attachment 4422:              (222k?)          609409         226669

Without the rotation patch, it does give a 2x on 4391:

       RLE color   RLE mono  DeltaRow color  DeltaRow mono
4391:  261131850  261131542       136898647      136875859
4392:   63348733   47028769        63747550       48359062

So DeltaRow compression is rarely worth it. The size of 5.8MB vs 4.3MB (RLE) is
somewhat interesting - RLE has visual artifacts, and DeltaRow causes a
segmentation fault (new bug, 690844).

I guess I'll throw in JPEG compression as well, to see how it pans out.
Comment 12 Hin-Tak Leung 2009-10-24 22:27:32 UTC
Created attachment 5541
deltarow compression patch

Replace RLE with DeltaRow. It does not improve on the rotation patch at all;
it is included mainly for completeness. Should not be applied unless/until it
is implemented as an option.
Comment 13 Hin-Tak Leung 2009-10-26 23:04:51 UTC
Created attachment 5552
updated patch for DeltaRow compression

This adds a new option, -dCompressMode={1,3}, to optionally use DeltaRow
compression, and also updates the documentation.

Compared to the previous patch, this corrects two issues: it was compressing
some additional garbage added to the end of the bitmask, and 1-pixel-high
images should not be DeltaRow-compressed. Now the result is more sensible:
relative to RLE it gives a 12% advantage for the largest case and about 5% for
the 2nd largest:

attach  RLE color  RLE mono  DeltaRow color  DeltaRow mono
4391       672883    671351          739965         738433
4392     12411467   4292701        10915582        4100753
4422       792273    222455          609409         226669

Some additional logic may benefit: e.g. switching to RLE for h=2 also expands
slightly for color (11431902) but gains further for mono (3939899) for 4392.

Without the rotation patch, it does give a 2x on 4391 and a marginal gain for
4392:

attach  RLE color   RLE mono  DeltaRow color  DeltaRow mono
4391    261131850  261131542       136760082      136759774
4392     63348733   47028769        62241388       46599270

So it looks like deltarow compression benefits mostly intermediate size
rasters.
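
The percentages can be recomputed from the first table in this comment (the
one with the rotation patch applied). The Python sketch below assumes the
first two columns are RLE output sizes and the last two are DeltaRow output
sizes, in bytes; the helper name saving_pct is made up for illustration.

```python
def saving_pct(rle_size: int, deltarow_size: int) -> float:
    """Percentage by which DeltaRow output is smaller than RLE output.

    A negative result means DeltaRow produced the larger file.
    """
    return 100.0 * (rle_size - deltarow_size) / rle_size


# (RLE bytes, DeltaRow bytes), copied from the table in this comment.
sizes = {
    "4391 color": (672883, 739965),
    "4391 mono": (671351, 738433),
    "4392 color": (12411467, 10915582),
    "4392 mono": (4292701, 4100753),
    "4422 color": (792273, 609409),
    "4422 mono": (222455, 226669),
}

for name, (rle, delta) in sizes.items():
    print(f"{name}: {saving_pct(rle, delta):+.1f}%")
```

Positive values are cases where DeltaRow wins; the small, already well
compressed outputs come out negative.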
Comment 14 Hin-Tak Leung 2009-10-27 00:32:57 UTC
Created attachment 5554
made-up pdf which compresses poorly with RLE but well with DeltaRow.

a made-up pdf (consisting mainly of a horizontal gradient), which should
compress well with delta row but not RLE.

gs -sDEVICE=pxlcolor -o a.pxl output.pdf
without and then with -dCompressMode=3 gives

13591962 -> 859422 (16x)

As expected. (The ratio isn't higher than 16x because the image is rendered in
strips of 67 pixels, and so another poorly compressed initial row is sent for
every 67 rows.)
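
A rough back-of-the-envelope model of that limit, in Python. It assumes (my
simplification, not something measured here) that the RLE output is close to
uncompressed for this file, that the first row of each 67-row strip costs a
full row, and that all other rows cost the same fraction of a full row:

```python
rle_size = 13591962        # bytes, output without -dCompressMode=3
deltarow_size = 859422     # bytes, output with -dCompressMode=3
strip_height = 67          # rows per strip, as stated above

observed = rle_size / deltarow_size

# If the 66 follow-on rows of each strip cost nothing at all, the best
# possible ratio under this model would equal the strip height.
ceiling = strip_height

# Solve strip_height / (1 + (strip_height - 1) * f) = observed for f, the
# average cost of a follow-on row as a fraction of a full row.
follow_on_fraction = (strip_height / observed - 1) / (strip_height - 1)

print(f"observed {observed:.1f}x, ceiling {ceiling}x, "
      f"follow-on rows cost about {follow_on_fraction:.1%} of a full row")
```

Under these assumptions the observed ratio is consistent with each follow-on
row still costing a few percent of a full row, which is why the result stays
well below the strip height.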
Comment 15 Hin-Tak Leung 2009-10-27 11:36:19 UTC
Patch committed as r10232.

RLE is still the default. -dCompressMode=3 to activate new code.
Comment 16 Hin-Tak Leung 2009-10-28 10:18:54 UTC
Patch committed; the functionality is optional.
Comment 17 Marcos H. Woehrmann 2011-09-18 21:46:18 UTC
Changing customer bugs that have been resolved more than a year ago to closed.