Bug 692329 - pxlmono generates huge PCL files from certain PDF files
Summary: pxlmono generates huge PCL files from certain PDF files
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PXL Driver
Version: 9.02
Hardware: PC Linux
Importance: P4 normal
Assignee: Marcos H. Woehrmann
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2011-07-07 14:36 UTC by Aurimas Fišeras
Modified: 2016-08-13 14:27 UTC
CC List: 5 users

See Also:
Customer:
Word Size: ---


Attachments
PDF file that generates huge PCL file (526.73 KB, application/pdf)
2011-07-07 14:36 UTC, Aurimas Fišeras
Patch for ICC bug (845 bytes, patch)
2011-07-21 21:07 UTC, Shailesh Mistry
Updated patch (918 bytes, patch)
2011-07-26 19:36 UTC, Shailesh Mistry
Canon iR2230 prints this as an empty page (1.93 MB, application/x-bzip)
2011-08-26 12:04 UTC, Aurimas Fišeras
Canon iR2230 prints this as an empty page too (1.93 MB, application/x-bzip)
2011-08-26 12:04 UTC, Aurimas Fišeras

Description Aurimas Fišeras 2011-07-07 14:36:34 UTC
Created attachment 7650 [details]
PDF file that generates huge PCL file

The attached PDF file generates a 56.6 MB PCL file with Ghostscript 9.02 on Ubuntu 11.10.

The same PDF file generates just a 4.5 MB PCL file with Ghostscript 8.71 on Ubuntu 10.04.

Steps to reproduce:
gs -dBATCH -dNOPAUSE -sDEVICE=pxlmono -sOutputFile=orig.pcl orig.pdf
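
The reproduction step and the size comparison can be scripted. What follows is only a minimal sketch, not part of the original report: the helper names `gs_command`, `convert_and_measure` and `size_ratio` are hypothetical, and the only Ghostscript flags used are the ones from the command above. It assumes `gs` is on the PATH.

```python
import os
import subprocess


def gs_command(pdf_path, pcl_path, device="pxlmono"):
    """Build the argument list for the command quoted in the report."""
    return [
        "gs", "-dBATCH", "-dNOPAUSE",
        f"-sDEVICE={device}",
        f"-sOutputFile={pcl_path}",
        pdf_path,
    ]


def convert_and_measure(pdf_path, pcl_path, device="pxlmono"):
    """Run Ghostscript and return the size of the output file in bytes."""
    subprocess.run(gs_command(pdf_path, pcl_path, device), check=True)
    return os.path.getsize(pcl_path)


def size_ratio(new_size, old_size):
    """How many times larger the new output is than the old one."""
    return new_size / old_size
```

Running `convert_and_measure("orig.pdf", "orig.pcl")` under each Ghostscript version and passing the two results to `size_ratio` gives a single number to track while bisecting; with the sizes quoted above the ratio is roughly 12.6.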
Comment 1 Henry Stiles 2011-07-07 15:56:12 UTC
To be eligible for the bounty the regression must be identified along with the usual fix and testing.
Comment 2 Hin-Tak Leung 2011-07-16 13:07:49 UTC
Likely one of Michael's color-profile-related changes, similar to bugs 691880 and 690867. The PDF file has a transparency group and was written by cairo.
Comment 3 Shailesh Mistry 2011-07-21 21:07:44 UTC
Created attachment 7699 [details]
Patch for ICC bug

This patch resolves a bug introduced by commit 6a82ae29ea4826048fc923388f4f59823e3a55c6, which merged in the new ICC profile code.

The PDF has a DeviceRGB image which is translated into an ICC profile and passed up, but the pcl/pxl driver cannot handle it. This patch looks for incoming ICC profiles and resolves them into the base colour, which can be handled at this level.
Comment 4 Henry Stiles 2011-07-22 20:43:25 UTC
The patch looks reasonable to me. I have copied in Michael to review it as well.

We don't yet have a regression test for pxlmono and pxlcolor. Marcos is going to add one to the weekly test soon; once it is added and Michael has reviewed the patch, I'll commit it.
Comment 5 Hin-Tak Leung 2011-07-24 09:05:56 UTC
(In reply to comment #3)
> Created an attachment (id=7699) [details]
> Patch for ICC bug

Not entirely sure that's correct. Let's see what Michael says, and also what the regression test says.
Comment 6 Hin-Tak Leung 2011-07-24 09:34:09 UTC
FWIW, on my box, r11305 gives  4,768,756 bytes
                 r11306 gives 59,328,047 bytes (merge of icc_work branch).
Basically what I wrote earlier in comment 2.
Comment 7 Michael Vrhel 2011-07-25 18:21:58 UTC
A quick comment on the patch.

We could have a case where gsicc_get_default_type returns
CIE_A, CIE_ABC, CIE_DEF, or CIE_DEFG because the color
space was converted from a CIE (or PDF Cal) space to
an ICC color space.  So upon
the return from gs_color_space_get_index, I would just
check for index < gs_color_space_index_DevicePixel and
return true for that case (in which case we have the DeviceGray,
DeviceRGB or DeviceCMYK color spaces); otherwise return false (as
we have a CIE space or an ICC source space).
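
The check suggested in this comment can be modelled as follows. This is only an illustrative sketch in Python, not the Ghostscript C code and not the attached patch: `ColorSpaceIndex` and `is_plain_device_space` are hypothetical names, the enum values are made up, and the only property relied on is the ordering implied above, namely that DeviceGray, DeviceRGB and DeviceCMYK sort before DevicePixel.

```python
from enum import IntEnum


class ColorSpaceIndex(IntEnum):
    """Hypothetical stand-in for the color space index; values are illustrative."""
    DEVICE_GRAY = 0
    DEVICE_RGB = 1
    DEVICE_CMYK = 2
    DEVICE_PIXEL = 3
    CIE_BASED = 4    # placeholder for spaces converted from CIE / PDF Cal
    ICC_SOURCE = 5   # placeholder for a genuine ICC source space


def is_plain_device_space(index):
    """True only for DeviceGray, DeviceRGB and DeviceCMYK.

    Everything at or above DEVICE_PIXEL is treated as a CIE or ICC
    source space and is rejected.
    """
    return index < ColorSpaceIndex.DEVICE_PIXEL
```

A single ordered comparison replaces a test against each device space individually, and it automatically rejects the CIE-derived cases that the first patch let through.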
Comment 8 Shailesh Mistry 2011-07-26 19:36:35 UTC
Created attachment 7714 [details]
Updated patch

This is an update to the original patch based on comment 7
Comment 9 Henry Stiles 2011-07-27 17:30:23 UTC
This new patch (comment #8) needs review from Michael and a test environment from Marcos before it can be released.
Comment 10 Michael Vrhel 2011-07-27 19:00:07 UTC
The patch looks correct to me now.
Comment 11 Till Kamppeter 2011-07-27 19:58:22 UTC
I have done 4 test runs with the attached PDF input file and the attached updated patch, on a GIT snapshot of GS from July 15:

pxlmono on HP LaserJet 3390 (mono)
pxlcolor on HP LaserJet 3390 (mono)
pxlmono on HP Color LaserJet CM3530 MFP (color)
pxlcolor on HP Color LaserJet CM3530 MFP (color)

With pxlmono the file size comes to the 4.7 MB mentioned above, and with pxlcolor to 10 MB (reasonable, though I do not know the size without the patch). The printouts are all correct: color only in the last test, naturally, and grayscale in the others.

So the patch seems to be OK.
Comment 12 Till Kamppeter 2011-07-27 20:26:05 UTC
I have applied the patch to the GIT repository (trunk) now:

GIT rev bf9dc23000675d406d73d98
Comment 13 Hin-Tak Leung 2011-07-29 10:13:16 UTC
(In reply to comment #12)
> I have applied the patch to the GIT repository (trunk) now:
> 
> GIT rev bf9dc23000675d406d73d98

I replied 'Not entirely sure that's correct' because I had already spotted the same problem Michael mentioned in comment 7, i.e. the 1st patch was too optimistic and would likely err on the side of small-size/wrong-color vs large-size/correct-color.

This 2nd patch needs to be regression tested for color-shift. That will be a job for Marcos.
Comment 14 Hin-Tak Leung 2011-07-29 10:26:32 UTC
Re-assigning to Marcos for regression testing, I guess.
Comment 15 Henry Stiles 2011-08-03 14:49:35 UTC
We have now included pxlcolor and pxlmono in the weekly test, so future changes to this device will be regression tested, but this particular change will go forward with only minimal testing from Till and Michael's review.  The bounty can be collected.
Comment 16 Hin-Tak Leung 2011-08-05 02:28:11 UTC
The patch errs on the side of small-size/wrong-color vs large-size/correct-color, so regression testing from now on may just keep the color wrong and the size small for some files.
Comment 17 Aurimas Fišeras 2011-08-19 10:30:10 UTC
Unfortunately, with this change our Canon iR2230 gets a 4.5 MB document but just "prints" an empty page.
Comment 18 Aurimas Fišeras 2011-08-19 12:05:02 UTC
(In reply to comment #17)
> Unfortunately, with this change our Canon iR2230 gets a 4.5 MB document but
> just "prints" an empty page.

Correction: I have just tested all three gs versions:
Ubuntu Lucid (gs 8.71) - 4.5 MB - empty page in several seconds;
Ubuntu Natty (gs 9.01) - 56.6 MB - page is printed in ~10 minutes;
Ubuntu Oneiric (gs 9.04) - 4.5 MB - empty page in several seconds.
Comment 19 Till Kamppeter 2011-08-22 20:05:42 UTC
Can someone check with GhostPDL whether Ghostscript still produces valid PCL-XL output?

On HP printers Ghostscript still prints correctly.
Comment 20 Henry Stiles 2011-08-22 22:26:02 UTC
(In reply to comment #19)
> Can someone check with GhostPDL whether Ghostscript still produces valid PCL-XL
> output?
> 
> On HP printers Ghostscript still prints correctly.

I don't see anything unusual with the resulting PCL and both the HP 4600 printer and the Artifex PCL interpreter render the same result (matching the input PDF).
Comment 21 Hin-Tak Leung 2011-08-26 00:10:11 UTC
(In reply to comment #18)
> (In reply to comment #17)
> > Unfortunately, with this change our Canon iR2230 gets a 4.5 MB document but
> > just "prints" an empty page.
> 
> Correction: I have just tested all three gs versions:
> Ubuntu Lucid (gs 8.71) - 4.5 MB - empty page in several seconds;
> Ubuntu Natty (gs 9.01) - 56.6 MB - page is printed in ~10 minutes;
> Ubuntu Oneiric (gs 9.04) - 4.5 MB - empty page in several seconds.

Please attach one or both of the results which do not print - preferably the one from 9.04 but possibly both.
Comment 22 Aurimas Fišeras 2011-08-26 12:04:08 UTC
Created attachment 7836 [details]
Canon iR2230 prints this as an empty page
Comment 23 Aurimas Fišeras 2011-08-26 12:04:31 UTC
Created attachment 7837 [details]
Canon iR2230 prints this as an empty page too
Comment 24 Hin-Tak Leung 2011-08-30 01:57:38 UTC
(In reply to comment #23)
> Created an attachment (id=7837) [details]
(In reply to comment #22)
> Created an attachment (id=7836) [details]

Those two actually do not differ by more than a coordinate offset. I suspect Canon does not implement the full PCL XL spec, i.e. a firmware bug that you should take up with Canon; and/or you can poke around to see if there is any configuration to get it to give you an error message of some sort. That said, if you still have the software etc. to generate the original PDF, maybe you can try doing a 180-degree rotation, flipping the image around, etc. to see if you can get a small PCL XL file which prints - that would at least suggest what sort of firmware bug it might be.
Comment 25 Aurimas Fišeras 2011-08-30 06:25:18 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > Created an attachment (id=7837) [details]
> (In reply to comment #22)
> > Created an attachment (id=7836) [details]
> 
> Those two actually do not differ by more than a coordinate offset. I suspect
> Canon does not implement the full PCL XL spec, i.e. a firmware bug that you
> should take up with Canon;
Canon's Windows driver claims to be PCL5/PCL5e. However, it generates a ~900 KB file from my original PDF file. But I doubt it is a PCL file. Linux's file command detects it as "data".

I'll check with Canon to see if it is possible to upgrade the firmware.

> and/or you can poke around to see if there is any
> configuration to get it to give you an error message of some sort.
Unfortunately, it gives no useful output and I don't see anything to configure. 

> That said,
> if you still have the software etc. to generate the original PDF, maybe you
> can try doing a 180-degree rotation, flipping the image around, etc. to see if
> you can get a small PCL XL file which prints - that would at least suggest what
> sort of firmware bug it might be.
Original file is a jpeg image, I'll see what I can do with it.

Thanks for the information.
Comment 26 Till Kamppeter 2011-08-30 06:36:56 UTC
The sample files print correctly on my HP printers: HP LaserJet 3390 and HP Color LaserJet CM3530 MFP.
Comment 27 Hin-Tak Leung 2011-09-01 00:10:46 UTC
(In reply to comment #25)
> Canon's Windows driver claims to be PCL5/PCL5e. However, it generates a ~900 KB
> file from my original PDF file. But I doubt it is a PCL file. Linux's file
> command detects it as "data".

Unless you set up a 'raw' queue in Windows, it is probably EMF (Enhanced Metafile). There should be a string "EMF" towards the beginning.
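
A quick way to test this is to look for the marker in the first few kilobytes of the spooled file. The following is a rough sketch only: the helper names and the 4096-byte window are assumptions chosen for illustration, and a hit on the bytes "EMF" is a hint, not proof of the format.

```python
def looks_like_emf(data, window=4096):
    """Return True if the ASCII marker 'EMF' occurs in the first `window` bytes."""
    return b"EMF" in data[:window]


def file_looks_like_emf(path, window=4096):
    """Read only the start of the file at `path` and apply the same check."""
    with open(path, "rb") as fh:
        return looks_like_emf(fh.read(window), window)
```

Calling `file_looks_like_emf()` on the ~900 KB file produced by the Windows driver would show whether the marker is present where `file` only reports "data".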

> Original file is a jpeg image, I'll see what I can do with it.

The original file is a JPEG image placed sideways at about 1/2 the printing resolution. Most of the size increase was due to JPEG compression artifacts, which get turned into small rectangles of slightly differing shades. But that's beside the point - you probably want to try to rotate the *pdf* and see if there is a way to get the printer to print, to see what kind of firmware bug it is. I'd suggest 180 deg rotation or mirroring.
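
The size effect of the compression artifacts can be illustrated with a toy model. This sketch assumes, purely for illustration, that each run of identical pixel values in a row ends up as one rectangle in the output; the actual driver logic is not shown in this thread, and `count_runs` is a hypothetical helper.

```python
def count_runs(row):
    """Count maximal runs of identical values in one row of pixels."""
    if not row:
        return 0
    runs = 1
    for prev, cur in zip(row, row[1:]):
        if cur != prev:
            runs += 1
    return runs


flat_row = [128] * 8                                  # uniform gray
noisy_row = [128, 129, 128, 127, 128, 129, 127, 128]  # same gray plus slight noise
```

Under this model `flat_row` needs a single rectangle while `noisy_row` needs one per pixel, even though the two rows look nearly identical on paper.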
Comment 28 Hin-Tak Leung 2016-08-13 14:27:17 UTC
The change associated with this bug unfortunately turns out to have an issue somewhat along the lines of what I was worried about: it produces inaccurate colors.

For specific inputs (a combination of a non-rectangular image, clist mode, etc.), a non-rectangular image can be broken up into two parts: rectangular blocks go through the code path introduced by this change and come out in "old colors", while the extra triangular parts are processed into little squares, as before this change, and come out in "new colors".

So you end up with, visually, the same image in different color shades.

I think the effect can be seen for rectangular input too, because clist mode breaks up large images into strips, and the beginning strips and ending strips are processed differently; the input for the original report is too simple color-wise to show it.

One might need to make up a sample that is more complicated color-wise, and re-open this bug report.

This code needs to be reverted, and the data re-directed into the ICC color path associated with the experimental -diccTransform switch.