Bug 691880 - 460k PDF file renders into 89M PCL file by pxlcolor driver
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PXL Driver
Version: 8.71
Hardware: PC Linux
Importance: P4 normal
Assignee: Hin-Tak Leung
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2011-01-12 19:31 UTC by George Liu
Modified: 2014-03-31 11:48 UTC
CC List: 3 users

See Also:
Customer:
Word Size: ---


Attachments
This file, when rendered by GS, generate huge PCL file (450.12 KB, application/pdf)
2011-01-12 19:31 UTC, George Liu
pxl JPEG compression mode enhancement patch. (10.35 KB, patch)
2014-03-12 16:45 UTC, Hin-Tak Leung

Description George Liu 2011-01-12 19:31:56 UTC
Created attachment 7121
This file, when rendered by GS, generate huge PCL file

The following command renders a 460k PDF file into an 89M PCL file:

cat original_page3.pdf | gs -q -dBATCH -dPARANOIDSAFER -dNOPAUSE -sDEVICE=pxlmono  -sDEVICE=pxlcolor -r600x600 -sPAPERSIZE=letter -sOutputFile=- -   > rip.pcl
Comment 1 Hin-Tak Leung 2011-01-12 22:53:52 UTC
This is somewhat related to bug 690867: the file has an image with an unusual color space. Before Michael's recent ICC work, some such images were embedded as-is but stripped of their colorspace info (small but wrong output, as in bug 690867), while others were rendered correctly but with a large file size. With Michael's recent work they are mostly rendered correctly, but with a large size, I think.

--------------------
15 0 obj
[ /Indexed
16 0 R
255
17 0 R
]
endobj

16 0 obj
[ /DeviceN
[ /Black
/PANTONE#20354#20U
]
/DeviceCMYK
18 0 R
19 0 R
]
endobj
19 0 obj
<</Subtype /NChannel
/Process 20 0 R
/Colorants 21 0 R
>>
endobj
21 0 obj
<</PANTONE#20354#20U 22 0 R
>>
endobj
22 0 obj
[ /Separation
/PANTONE#20354#20U
/DeviceCMYK
<</Range [ 0
--------------

This is on my TODO list, but I am not getting to it any time soon, so others please feel free to have a go. A hint is in http://bugs.ghostscript.com/show_bug.cgi?id=690867#c5 : utilize Michael's linked ICC transforms.

BTW, 8.71 seems to have some JPX-related problem and won't read the input file. Was the large output file size seen with 9.x?
Comment 2 Henry Stiles 2011-02-08 17:38:08 UTC
This file is likely not a good bountiable candidate. We hope to see some improvement with planned changes for the image code to emit more compact device calls. Note that two devices are specified on the command line in the original problem report.
Comment 3 Till Kamppeter 2011-07-29 09:05:44 UTC
This could be a duplicate of bug 692329. Please check.
Comment 4 Hin-Tak Leung 2011-09-22 19:45:13 UTC
Filed additional Bug 692531 for JPX-related issue.

The file has an image of unusual colorspace within a transparency group scaled up 6x in both dimensions. More compact device calls aren't going to be of much help unless those use PCL/PXL ROPs to deal with transparency.

Hooking up the icc transform plus tuning the compression gives 17MB though (about the 6x6 estimate), which is still a lot better than 89MB.
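
For reference, the 6x6 estimate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the output size scales with pixel count (6 x 6 = 36 times the roughly 460k input) and ignores everything else on the page. The variable names are illustrative.

```python
# Back-of-the-envelope check of the "6x6 estimate" (illustrative only).
# Assumption: output size scales with the pixel count when the image is
# scaled up 6x in both dimensions.
input_kb = 460            # approximate size of the original PDF, in KB
scale_x, scale_y = 6, 6   # upscaling factor in each dimension

estimate_kb = input_kb * scale_x * scale_y
estimate_mb = estimate_kb / 1024

print(f"estimated output: {estimate_kb} KB (about {estimate_mb:.1f} MB)")
```

The result lands in the same range as the 17MB figure, which is all the estimate is meant to show.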
Comment 5 Hin-Tak Leung 2014-03-11 12:20:14 UTC
Implemented compression mode 2 (JPEG); the output is now 1.75MB. That is probably acceptable, as the original was JPX. I'll tidy up, fix the remaining problems, and post soon.
Comment 6 Hin-Tak Leung 2014-03-12 16:45:28 UTC
Created attachment 10746
pxl JPEG compression mode enhancement patch.

This patch adds Compression Mode 2 (JPEG), and updates the documentation, etc.

1=RLE and 3=DeltaRow are the existing modes. Those two can compress arbitrary depth/colorspace, but JPEG is only suitable for full-color images (8-bit gray is possible, but the pxl driver cannot tell 8-bit gray from 8-bit indexed, so it isn't done). So whereas the other two modes are applied wholesale, this one still falls back to RLE for unsuitable images/bitmaps/masks.

Cluster tested. It is an optional new feature, so no difference is expected.

I'll post some detailed numbers.
Comment 7 Hin-Tak Leung 2014-03-12 17:45:24 UTC
pxlcolor sizes:
100624709      default mode 1
  1751773      mode 2
 17298082      mode 3

pxlmono sizes:
  7330837      default (mode 1)
  7330837      mode 2
  4872400      mode 3

The original is largely a JPX image of 441651 bytes at 875 x 1125 pixels (roughly 100 dpi). Rendering it at -r600 and then recompressing with JPEG to about 4x the size seems reasonable.
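
A quick way to put the pxlcolor numbers above side by side; this is just a sketch of the arithmetic, with the byte counts copied from this comment and the helper name `ratio` made up for illustration.

```python
# Sizes in bytes, copied from the pxlcolor figures reported above.
pxlcolor = {1: 100624709, 2: 1751773, 3: 17298082}
original_jpx = 441651  # size of the JPX image in the input PDF


def ratio(a: int, b: int) -> float:
    """Return a / b as a float (illustrative helper)."""
    return a / b


jpeg_vs_original = ratio(pxlcolor[2], original_jpx)   # mode 2 output vs. JPX input
rle_vs_jpeg = ratio(pxlcolor[1], pxlcolor[2])         # mode 1 vs. mode 2
rle_vs_deltarow = ratio(pxlcolor[1], pxlcolor[3])     # mode 1 vs. mode 3

print(f"mode 2 output / original JPX: {jpeg_vs_original:.2f}x")
print(f"mode 1 / mode 2:              {rle_vs_jpeg:.1f}x")
print(f"mode 1 / mode 3:              {rle_vs_deltarow:.1f}x")
```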

It looks like, for pxlmono, requests to compress to JPEG silently switch back to RLE. I'll take a look; it may or may not be a flaw. 8-bit gray and 8-bit indexed aren't distinguished in some parts of the pxl imaging code. The latter isn't appropriate for JPEG compression, so requests to compress to JPEG are silently ignored for both.
Comment 8 Hin-Tak Leung 2014-03-31 05:08:09 UTC
Patches were committed:

commit 4b44b41c9b6c4a7e5ebf03b6970f9be39548443b
Author: Hin-Tak Leung <hintak@ghostscript.com>
Date:   Wed Mar 12 15:03:58 2014 +0000

    Implements PCL XL Compression Mode 2 (JPEG), and updated documentation and other support files. Bug 694282.

commit 41ab485d48890ecadc3d5f74657b644f9d1a8d7f
Author: Hin-Tak Leung <hintak@ghostscript.com>
Date:   Wed Mar 12 15:16:05 2014 +0000

    pxlmono/pxlcolor: Transform deep (24-bit) images with an ICC transform to emit high-level images. Bug 690867.


There was a regression with the latter, which was fixed by
8ae4ee220766aa180150eafeffe4f094f1354f92 (Bug 695103). Anyone thinking of back-porting needs all three.

The new functionality (compression mode 2 and the ICC transform of deep images) is optional. A patch to remove the -diccTransform option and some newly discovered issues are tracked in bug 695124.
Comment 9 Till Kamppeter 2014-03-31 09:43:26 UTC
Applying all patches mentioned in comment #8, including the one of bug #695124 does not give any size advantage when running

cat original_page3.pdf | gs -q -dBATCH -dPARANOIDSAFER -dNOPAUSE -sDEVICE=pxlcolor -r600x600 -sPAPERSIZE=letter -sOutputFile=- -   > rip.pcl

Only adding -dCompressMode=2 gives the mentioned size advantage for the JPEG compression mode, independent of whether the ICC transform patches are applied or not. So for the sample file the ICC transform patches seem irrelevant concerning size.

Now my questions:

1. Can one make "-dCompressMode=2" the default? Or does it have any disadvantages? Could you supply a patch to make this the default?

2. Can you attach a sample file where the ICC transform patches reduce the output size significantly?

3. If the ICC transform patches give a size advantage for a certain file, does this happen only with pxlcolor or also with pxlmono?
Comment 10 Hin-Tak Leung 2014-03-31 10:21:42 UTC
(In reply to comment #9)
> 1. Can one make "-dCompressMode=2" the default? Or does it have any
> disadvantages? Could you supply a patch to make this the default?

Compression mode 2 and compression mode 3 were introduced in PXL 2.0 and 2.1 respectively. Though the 2.1 spec is nearly 14 years old now, I think we can only assume that genuine HP printers implement the full spec. Other vendors (Ricoh, etc.) are free, I guess, to implement only PCL XL 1.1 features and still claim to "support PCL XL". So you probably should enable these modes for non-HP printers only on a case-by-case basis.

Also, JPEG compression is lossy and should only be used for large images where losing some detail is acceptable, or where the input is JPEG/JPX, which is already lossy in the first place.
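
To make the case-by-case advice concrete, here is a minimal sketch of how a wrapper script might choose the mode. The policy and both function names (`pick_compress_mode`, `gs_args`) are hypothetical and not part of Ghostscript; the caller is assumed to know which PCL XL protocol version the target printer really implements. Only the gs options already used in this bug report appear in the command line, and the command is built, not run.

```python
def pick_compress_mode(pxl_version, lossy_ok):
    """Hypothetical policy, not Ghostscript's: choose a pxl CompressMode.

    pxl_version: (major, minor) PCL XL version the printer is known to
                 implement, e.g. (1, 1), (2, 0) or (2, 1).
    lossy_ok:    True if losing some image detail is acceptable.
    """
    if lossy_ok and pxl_version >= (2, 0):
        return 2  # JPEG, introduced in PXL 2.0; lossy
    if pxl_version >= (2, 1):
        return 3  # DeltaRow, introduced in PXL 2.1
    return 1      # RLE, the default


def gs_args(input_pdf, output_pcl, mode, device="pxlcolor"):
    """Build (but do not run) a gs command line with the options from this report."""
    return [
        "gs", "-q", "-dBATCH", "-dPARANOIDSAFER", "-dNOPAUSE",
        f"-sDEVICE={device}", "-r600x600", "-sPAPERSIZE=letter",
        f"-dCompressMode={mode}",
        f"-sOutputFile={output_pcl}",
        input_pdf,
    ]


# Example: a printer assumed to implement PXL 2.1, lossy output acceptable.
mode = pick_compress_mode((2, 1), lossy_ok=True)
print(" ".join(gs_args("original_page3.pdf", "rip.pcl", mode)))
```

The resulting list can be handed to `subprocess.run` if desired. Unsuitable images still fall back to RLE inside the driver, whatever mode is requested.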

> 2. Can you attach a sample file where the ICC transform patches reduce the
> output size significantly?

I know of at least two files (mentioned in Bug 690867, for which this change was actually written), and T2CharString.pdf (in comparefiles/, mentioned in 695124). Both of them are part of the private_ regression suite. You should be able to get them if you can run the regression suite, I think. But they are private, as you know...

However, you can probably make one up using the information in http://bugs.ghostscript.com/show_bug.cgi?id=690867#c3 , i.e. by putting "/Intent /RelativeColorimetric" in the dictionary for an image in a PDF file.

> 3. If the ICC transform patches give a size advantage for a certain file,
> does this happen only with pxlcolor or also with pxlmono?

The ICC patch is for both. The JPEG patch is currently written to limit itself to color images, and silently goes back to RLE for "unsuitable" images. I haven't figured out how to distinguish 8-bit indexed color images from 8-bit gray ones, and do gray JPEG properly.
Comment 11 Hin-Tak Leung 2014-03-31 10:48:23 UTC
(In reply to comment #9)
> 2. Can you attach a sample file where the ICC transform patches reduce the
> output size significantly?

One way to tell which PDFs would benefit is by looking at the output of "mutool info ...".
The two files I mentioned show:

Images (6):
	    1 (      8 0 R): [ DCT ] 285x164 8bpc ICC (14 0 R)
	    1 (      8 0 R): [ DCT ] 285x184 8bpc ICC (15 0 R)
	    1 (      8 0 R): [ DCT ] 63x94 8bpc ICC (16 0 R)

and

Images (4):
	    1 (     80 0 R): [ DCT ] 323x375 8bpc Lab (123 0 R)

(note the "ICC" and "Lab" parts), whereas files which are not affected tend to show DevCMYK/DevRGB/DevGray.
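
A rough sketch of automating that check: the snippet below scans text in the listing format shown above and reports images whose colorspace is ICC or Lab. The regular expression is written against the sample lines in this comment only, so other output layouts are not guaranteed to match. The function name `images_needing_transform` and the DevRGB line in the sample are made up for illustration.

```python
import re

# Matches image lines in the listing format shown above, e.g.
#   1 (      8 0 R): [ DCT ] 285x164 8bpc ICC (14 0 R)
IMAGE_LINE = re.compile(
    r"^\s*(?P<page>\d+)\s+\(\s*\d+\s+\d+\s+R\):\s+"
    r"\[\s*(?P<filter>[^\]]*?)\s*\]\s+"
    r"(?P<width>\d+)x(?P<height>\d+)\s+"
    r"(?P<bpc>\d+)bpc\s+"
    r"(?P<colorspace>\S+)"
)

# Colorspaces that, per the comment above, mark images likely to benefit.
AFFECTED = {"ICC", "Lab"}


def images_needing_transform(listing):
    """Return (width, height, colorspace) for each ICC/Lab image line."""
    hits = []
    for line in listing.splitlines():
        m = IMAGE_LINE.match(line)
        if m and m.group("colorspace") in AFFECTED:
            hits.append(
                (int(m.group("width")), int(m.group("height")), m.group("colorspace"))
            )
    return hits


# Two lines copied from the listings above, plus one invented DevRGB line.
sample = """Images (6):
\t    1 (      8 0 R): [ DCT ] 285x164 8bpc ICC (14 0 R)
\t    1 (     80 0 R): [ DCT ] 323x375 8bpc Lab (123 0 R)
\t    1 (      9 0 R): [ DCT ] 100x100 8bpc DevRGB (10 0 R)
"""

for width, height, cs in images_needing_transform(sample):
    print(f"{width}x{height} {cs}")
```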
Comment 12 Hin-Tak Leung 2014-03-31 11:08:07 UTC
(In reply to comment #9)
> 2. Can you attach a sample file where the ICC transform patches reduce the
> output size significantly?

Stupid me... there is a public file, http://svn.ghostscript.com/ghostscript/tests/pdf/icc_rendering_intent.pdf , which Michael uses for his talks on ICC link transforms!

     size
  2371499 head    pxlcolor -diccTransform 
 15670504 gs 9.14 pxlcolor
   783259 head    pxlmono  -diccTransform
 11641096 gs 9.14 pxlmono 

that's 6x for pxlcolor, and 14x for pxlmono.
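
The rough factors can be reproduced from the table with a short sketch; the byte counts are copied from above, and the dictionary layout is just for illustration.

```python
# Output sizes in bytes, copied from the table above:
# device -> (gs 9.14 without the transform, head with -diccTransform)
sizes = {
    "pxlcolor": (15670504, 2371499),
    "pxlmono": (11641096, 783259),
}

# Size reduction factor per device: old size divided by new size.
reduction = {dev: before / after for dev, (before, after) in sizes.items()}

for dev, factor in reduction.items():
    print(f"{dev}: {factor:.1f}x smaller with -diccTransform")
```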
Comment 13 Hin-Tak Leung 2014-03-31 11:48:24 UTC
(In reply to comment #12)
> stupid me... there is a public file
> http://svn.ghostscript.com/ghostscript/tests/pdf/icc_rendering_intent.pdf ,
> the file which Michael uses for his talks on ICC link transform!
> 
>      size
>   2371499 head    pxlcolor -diccTransform 
>  15670504 gs 9.14 pxlcolor
>    783259 head    pxlmono  -diccTransform
>  11641096 gs 9.14 pxlmono 
> 
> that's 6x for pxlcolor, and 14x for pxlmono.

Just for completeness:

     223269 head    pxlcolor -diccTransform -dCompressMode=2
     153819 head    pxlmono -diccTransform -dCompressMode=2 

The original is 1278534 bytes, with 4 JPEG images of about 150k each (so about 600k of images; the rest is font data, etc). The JPEG images in the pxl output are about 50k each, smaller than the originals because applying extreme rendering intents means losing detail.