Bug 692234 - Floating point exception with <8 bit/component CMYK devices and pspcl6
Summary: Floating point exception with <8 bit/component CMYK devices and pspcl6
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Color (show other bugs)
Version: master
Hardware: PC Linux
: P4 normal
Assignee: Henry Stiles
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-25 20:00 UTC by James Cloos
Modified: 2011-07-20 16:25 UTC (History)
1 user (show)

See Also:
Customer:
Word Size: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description James Cloos 2011-05-25 20:00:41 UTC
GhostPDL Git master compiled with ./configure;make generates:

:; ./language_switch/obj/pspcl6 -sDEVICE=x11cmyk gs/examples/colorcir.ps 
Floating point exception

The backtrace is:

(gdb) where
#0  0x00000000004ea476 in gx_render_device_DeviceN ()
#1  0x000000000070a875 in cmap_cmyk_direct ()
#2  0x0000000000709393 in gx_remap_concrete_DCMYK ()
#3  0x00000000005285bf in gx_remap_concrete_ICC ()
#4  0x0000000000528774 in gx_remap_ICC ()
#5  0x0000000000708cdc in gx_remap_color ()
#6  0x00000000006f64d8 in gs_fillpage ()
#7  0x00000000006f652a in gs_erasepage ()
#8  0x0000000000407826 in ps_impl_set_device ()
#9  0x000000000079cdd3 in pl_main_universe_select ()
#10 0x000000000079e8e7 in pl_main_aux ()
#11 0x00007ffff67cbebd in __libc_start_main (main=0x4072c0 <main>, argc=3, 
    ubp_av=0x7fffffffbdf8, init=<value optimized out>, fini=<value optimized out>, 
    rtld_fini=<value optimized out>, stack_end=0x7fffffffbde8) at libc-start.c:226
#12 0x00000000004072f1 in _start () at ../sysdeps/x86_64/elf/start.S:113

I’m compiling a debug build now to get a better backtrace.
Comment 1 James Cloos 2011-05-25 20:48:58 UTC
The debug build does not generate an FPE.

Comparing objdump -d output vs the backtrace, the instruction which generates the FPE in the release build is:

 466:   48 f7 f6                div    %rsi

so it is a interger divide by zero error.

And I see why.

In gx_render_device_DeviceN(),

         unsigned long hsize = pdht ?
                (unsigned) pdht->components[i].corder.num_levels
                : 1;

it looks like only the first element of the pdht->components[] array is initialized, even though num_colors == 4.

Eg, (gdb) p pdht->components[1]
$49 = {corder = {params = {M = 16280, N = 0, R = 48, M1 = 0, N1 = 1120, R1 = 151, 
      C = 140733200730880, D = 0, D1 = 0, W = 0, W1 = 116788612, S = 32767}, wse = 0x1bbaa88, 
    wts = 0xffffffff00200b00, width = 43792, height = 443, raster = 0, shift = 0, 
    orig_height = 3072, orig_shift = 48, full_height = 176, num_levels = 11693984, 
    num_bits = 0, procs = 0x7fff00400c00, data_memory = 0x1b39590, levels = 0x7fff00500c00, 
    bit_data = 0x2, cache = 0x7fff00600c00, transfer = 0x1bbb048, screen_params = {matrix = {
        xx = 1.0289523e-38, xy = 4.59163468e-41, yx = 0, yy = 0, tx = 9.25547805e-35, 
        ty = 4.59163468e-41}, max_size = 29076104}, 
    threshold = 0x700c00 "\362\017X\301\362H\017,\300\211\302H\213\205x\374\377\377\213M\354Hc\311H\203\301\070\211T\210\f\203E\354\001\213U\354H\213\205x\374\377\377\213\200\214", 
    threshold_inverts = 28546448}, comp_number = 8391680, cname = -1}

whereas: (gdb) p pdht->components[0]
$48 = {corder = {params = {M = 16, N = 0, R = 1, M1 = 16, N1 = 0, R1 = 1, C = 256, D = 16, 
      D1 = 16, W = 16, W1 = 16, S = 0}, wse = 0x0, wts = 0x0, width = 16, height = 16, 
    raster = 8, shift = 0, orig_height = 16, orig_shift = 0, full_height = 16, 
    num_levels = 256, num_bits = 256, procs = 0xb2c080, data_memory = 0x1b365a0, 
    levels = 0x1eeee60, bit_data = 0x1eef270, cache = 0x1eefa80, transfer = 0x0, 
    screen_params = {matrix = {xx = 1, xy = 0, yx = 0, yy = 1, tx = 0, ty = 0}, 
      max_size = 1048576}, threshold = 0x0, threshold_inverts = 28759104}, comp_number = 0, 
  cname = 0}

My guess is that, when hsize is set via:

        unsigned long hsize = pdht ?
                (unsigned) pdht->components[i].corder.num_levels
                : 1;

that, in the release build, pdht->components[1].corder.num_levels
happens to be 0, thus causing a divide by zero on the line:

        int_color[i] = shade / hsize;
Comment 2 James Cloos 2011-05-25 22:30:11 UTC
git bisect tells me that the first commit where x11cmyk is unable to render the colorcircle example file is:

1787ce3393956701e6241b8efc6f575887c3f5c1 is the first bad commit
commit 1787ce3393956701e6241b8efc6f575887c3f5c1
Author: Michael Vrhel <michael.vrhel@artifex.com>
Date:   Sun May 15 15:50:16 2011 -0700

    Change in device ICC profile handling
    
    This is the major portion of the code needed to achieve object dependent
    color management.  This fixes the problems that existed in the
    previous code with the device parameters and introduces an array of
    ICC profiles in the device structure.  The code was cluster pushed and
    showed some very minor differences in a couple files but they appear to be
    OK with bmpcmp.  I still need to do further testing to verify that all the
    functionality is correct (e.g. make sure setting the text profile properly
    affects the text only).  In addition, the rendering intent options need to be
    implemented.
    
    I also need to check that nothing was broken with respect to MT
    rendering and some of the devices that are not tested with cluster
    pushing (e.g. the display device and the x11alpha device).

:040000 040000 d250497a09f1 6ca798ca0d0a M	gs
:040000 040000 0dda664aefc4 9a49c5f70bc5 M	pcl
:040000 040000 77cc49dbe4a9 8947f40a4e96 M	psi
:040000 040000 f63df13e7a87 979b2c16ed04 M	pxl
:040000 040000 a179b2b826f0 bc6fb996f003 M	svg
:040000 040000 405560d6dec4 cef02a469de1 M	xps

(with the hashes compressed to prevent word wrap).
Comment 3 Michael Vrhel 2011-05-25 22:56:09 UTC
I was worried about this device when I made this change.  I will dig into it.
Comment 4 James Cloos 2011-05-26 01:01:18 UTC
I just noticed that the pcxcmyk and pamcmyk4 devices also hit this bug, so you do not need an X11 server to test.
Comment 5 Michael Vrhel 2011-06-24 04:08:03 UTC
I tested gs with pamcmyk4 and it ran just fine.  I wonder if it is a language switch issue.
Comment 6 Henry Stiles 2011-06-24 07:44:43 UTC
(In reply to comment #5)
> I tested gs with pamcmyk4 and it ran just fine.  I wonder if it is a language
> switch issue.

x11cmyk fail for both chris (linux) and I (mac) with just the gs build.  The language switch also fails.
Comment 7 James Cloos 2011-06-24 13:44:29 UTC
I’ve confirmed my guess at the end of comment #1 by compiling the release build with make 'CC=gcc -ggdb3' and running it in gdb with a breakpoint at gx_render_device_DeviceN().

The bug is that gx_render_device_DeviceN() presumes that the pdht->components[] array will have as many members as the device has inks, but pdht->num_comp is always 1.

Were one to add an assert() to confirm that pdht->num_comp >= dev->color_info.num_components then I expect it would fail whenever the output device is cmyk and has fewer than eight bits per ink.

In a gcc debug build the pdht->components[i].corder.num_levels (i>0) just happen to be positive, albeit huge.  With a ggdb3 release build I’ve found that usually only pdht->components[3].corder.num_levels happens to be 0.
Comment 8 Henry Stiles 2011-06-28 19:02:29 UTC
Should be fixed with Michael's f4e1d4b280f6e6ff.
Comment 9 James Cloos 2011-06-28 22:58:52 UTC
No, it does not address this bug.

It is easy to get false negatives when trying to reproduce this one.  The best way to check is to run:

gdb --args ./language_switch/debugobj/pspcl6 -sDEVICE=x11cmyk gs/examples/colorcir.ps

set a breakpoint with:

(gdb) b gx_render_device_DeviceN

run the test:

(gdb) run

when you hit the breakpoint, confirm that the device has four colours:

(gdb) p dev->color_info.num_components
$1 = 4

and the confirm that the pdht has only one component:

(gdb) p pdht->num_comp
$2 = 1

and the code still does:

    int num_colors = dev->color_info.num_components;

    for (i = 0; i < num_colors; i++) {
        unsigned long hsize = pdht ?
                (unsigned) pdht->components[i].corder.num_levels
                : 1;

and therefore still dereferences array elements which are not defined.

Should pdht->components[i].corder.num_levels, i>0 happen to hit memory which has been initialized to non-zero values there is no division by zero, but that is just blind luck.

The code still falsely assumes that there will be (at least) as many entries in the pdht->components[] array (which is counted by pdht->num_comp) as the device has colours.  But pdht always seems to have a single component.

Any cmyk device with less than eight bits per colour will do; the bug is not limited to the x11cmyk devices.  (The eight bit per colour devices do not call this function, nor do RGB devices.  I suspect a one bit per plate version of tiffsep or tiffsep1 would also hit this, presumably with more than just four colours if the file uses a DeviceN space.)
Comment 10 Henry Stiles 2011-06-29 00:58:43 UTC
Sorry we'll look again I just did a quick check and thought it was fixed.

(In reply to comment #9)
> No, it does not address this bug.
> 
> It is easy to get false negatives when trying to reproduce this one.  The best
> way to check is to run:
> 
> gdb --args ./language_switch/debugobj/pspcl6 -sDEVICE=x11cmyk
> gs/examples/colorcir.ps
> 
> set a breakpoint with:
> 
> (gdb) b gx_render_device_DeviceN
> 
> run the test:
> 
> (gdb) run
> 
> when you hit the breakpoint, confirm that the device has four colours:
> 
> (gdb) p dev->color_info.num_components
> $1 = 4
> 
> and the confirm that the pdht has only one component:
> 
> (gdb) p pdht->num_comp
> $2 = 1
> 
> and the code still does:
> 
>     int num_colors = dev->color_info.num_components;
> 
>     for (i = 0; i < num_colors; i++) {
>         unsigned long hsize = pdht ?
>                 (unsigned) pdht->components[i].corder.num_levels
>                 : 1;
> 
> and therefore still dereferences array elements which are not defined.
> 
> Should pdht->components[i].corder.num_levels, i>0 happen to hit memory which
> has been initialized to non-zero values there is no division by zero, but that
> is just blind luck.
> 
> The code still falsely assumes that there will be (at least) as many entries in
> the pdht->components[] array (which is counted by pdht->num_comp) as the device
> has colours.  But pdht always seems to have a single component.
> 
> Any cmyk device with less than eight bits per colour will do; the bug is not
> limited to the x11cmyk devices.  (The eight bit per colour devices do not call
> this function, nor do RGB devices.  I suspect a one bit per plate version of
> tiffsep or tiffsep1 would also hit this, presumably with more than just four
> colours if the file uses a DeviceN space.)
Comment 11 James Cloos 2011-06-29 13:11:13 UTC
I just tried bisecting again, using gdb to get an accurate test.

The last time that function was touched was with commit c8009e1f, which is before the pdl code was added, which of course requires that one test gs instead of pspcl6 as I had been doing.

It turns out that when running gs, that function is called with pdht=NULL, which avoids this bug.

I haven’t determined /why/ pspcl6 passes a (point to a) gx_device_halftone struct and gs passes NULL, but due to that this bug is ghostpdl-specific.

Quick tests (with tiger.*) show that pcl6, gxps and gsvg also pass NULL pdht's
and that pspcl6 also does when rendering tiger.px3, tiger.svg and tiger.xps.

So the bug is limited to pspcl6 when rendering postscript and pdf files to CMYK (and probably DeviceN) devices which have fewer than eight bits per component.

(I cannot guess, based on that, which Product/Component you’d prefer for this.)
Comment 12 James Cloos 2011-06-29 13:15:46 UTC
I think I found why the components[] array only has one entry.

I added a break point at gx_imager_dev_ht_install().

When pspcl6 -sDEVICE=x11cmyk calls gx_imager_dev_ht_install(), the dev arg is DeviceGray, not DeviceCMYK:

(gdb) p dev->color_info
$6 = {max_components = 1, num_components = 1, polarity = GX_CINFO_POLARITY_ADDITIVE, 
  depth = 1 '\001', gray_index = 0 '\000', max_gray = 1, max_color = 0, dither_grays = 2, 
  dither_colors = 1, anti_alias = {text_bits = 1, graphics_bits = 1}, 
  separable_and_linear = GX_CINFO_SEP_LIN, comp_shift = '\000' <repeats 63 times>, 
  comp_bits = "\001", '\000' <repeats 62 times>, comp_mask = {1, 0 <repeats 63 times>}, 
  cm_name = 0xb555cd "DeviceGray", opmode = GX_CINFO_OPMODE_NOT, process_comps = 0, 
  black_component = 0}

so the language_switch build appears to set its device, based on -sDEVICE, too late.
Comment 13 Henry Stiles 2011-07-19 19:56:42 UTC
(In reply to comment #12)
> I think I found why the components[] array only has one entry.
> 
> I added a break point at gx_imager_dev_ht_install().
> 
> When pspcl6 -sDEVICE=x11cmyk calls gx_imager_dev_ht_install(), the dev arg is
> DeviceGray, not DeviceCMYK:
> 
> (gdb) p dev->color_info
> $6 = {max_components = 1, num_components = 1, polarity =
> GX_CINFO_POLARITY_ADDITIVE, 
>   depth = 1 '\001', gray_index = 0 '\000', max_gray = 1, max_color = 0,
> dither_grays = 2, 
>   dither_colors = 1, anti_alias = {text_bits = 1, graphics_bits = 1}, 
>   separable_and_linear = GX_CINFO_SEP_LIN, comp_shift = '\000' <repeats 63
> times>, 
>   comp_bits = "\001", '\000' <repeats 62 times>, comp_mask = {1, 0 <repeats 63
> times>}, 
>   cm_name = 0xb555cd "DeviceGray", opmode = GX_CINFO_OPMODE_NOT, process_comps
> = 0, 
>   black_component = 0}
> 
> so the language_switch build appears to set its device, based on -sDEVICE, too
> late.

Revisiting this one sorry to leave it open so long X11 is not customer critical and doesn't get attention like other devices.

The first call to gx_imager_dev_ht_install is for the null device which should be gray.  The problem I see with this file is the new color management does not work properly with Xfonts.  This patch should fix the problem it does for me, let us know if it doesn't work for you.


diff --git a/gs/base/gdevxres.c b/gs/base/gdevxres.c
index 3d4a5ed..e78abc7 100644
--- a/gs/base/gdevxres.c
+++ b/gs/base/gdevxres.c
@@ -108,8 +108,8 @@ ZapfChancery-MediumItalic:-Adobe-ITC Zapf Chancery-Medium-I-Normal--"),
          "Symbol: -Adobe-Symbol-Medium-R-Normal--"),
 
     rbool("useBackingPixmap", "UseBackingPixmap", useBackingPixmap, True),
-    rbool("useExternalFonts", "UseExternalFonts", useXFonts, True),
-    rbool("useFontExtensions", "UseFontExtensions", useFontExtensions, True),
+    rbool("useExternalFonts", "UseExternalFonts", useXFonts, False),
+    rbool("useFontExtensions", "UseFontExtensions", useFontExtensions, False),
     rbool("useScalableFonts", "UseScalableFonts", useScalableFonts, True),
     rbool("useXPutImage", "UseXPutImage", useXPutImage, True),
     rbool("useXSetTile", "UseXSetTile", useXSetTile, True),




I don't know if we want to dig into this because we should be deprecating Xfonts this release.  I'll assign it to Chris and let him decide what's best.  Chris assign it back to me if more analysis is needed.
Comment 14 James Cloos 2011-07-19 21:32:42 UTC
That patch did not stop the div0 error.

With this .gdbinit in the ghostpdl top level dir:

break gx_render_device_DeviceN
break gx_imager_dev_ht_install
run

running:

gdb --args ./language_switch/obj/pspcl6 -sDEVICE=x11cmyk gs/examples/text_graphic_image.pdf

first breaks at gx_imager_dev_ht_install() with the null DEVICE.  continuing breaks at gx_render_device_DeviceN() with dev->color_info.num_components == 4,
but with pdht->num_comp == pdht->num_dev_comp == 1.  So the bug is still there.

Note that this only occurs (in my testing) with the language switch build.  gs(1) avoids calling gx_imager_dev_ht_install() with the null DEVICE.

Note also that this bug is not limited to x11cmyk.  It also hits the new cmyk plane device plank (planc is OK because it uses 8 bits per plane) and every other one, two or four bit/component cmyk device.

I changed the summary to point out those.
Comment 15 Henry Stiles 2011-07-19 22:57:39 UTC
Ah right that gs_erasepage is certainly wrong, but I need to do more testing.

diff --git a/psi/psitop.c b/psi/psitop.c
index 41b228e..31346f9 100644
--- a/psi/psitop.c
+++ b/psi/psitop.c
@@ -307,8 +307,6 @@ ps_impl_set_device(
         return code;
     /* Set the device into the gstate */
     code = gs_setdevice_no_erase(pgs, device);
-    if (code >= 0 )
-        code = gs_erasepage(pgs);
 
     if (code < 0)
         return code;


And it seems that particular file (gs/examples/text_graphic_image.pdf) has another problem, but your original colorcir.ps should complete without errors.  I'll have to split out the Xfonts problem as a separate bug.
Comment 16 Henry Stiles 2011-07-20 04:04:52 UTC
http://git.ghostscript.com/?p=ghostpdl.git;a=commit;h=a46123967 should fix the original problem but I'm not sure if other problems are lurking, the language switch build is not tested thoroughly.
Comment 17 Henry Stiles 2011-07-20 15:54:40 UTC
Closing as fixed see comment 13 ... feeling lucky today :-)
Comment 18 Henry Stiles 2011-07-20 15:59:11 UTC
(In reply to comment #17)
> Closing as fixed see comment 13 ... feeling lucky today :-)

that should say "comment 16".
Comment 19 James Cloos 2011-07-20 16:17:45 UTC
> Ah right that gs_erasepage is certainly wrong

Is that all it was?  [SIGH]  I should have picked up on that.

I tested that patch on several ps and pdf files and could not trigger any divide by zeros.

> the language switch build is not tested thoroughly

Oh.  I thought that build was customer-driven?  Especially the one and four bit CMYK output (all of which suffered from this crash).

pspcl6 (and gs 9.02) still SEGV on text_graphic_image.pdf and text_graph_image_cmyk_rgb.pdf when using any x11cmyk device, but that looks to be a different bug.
Comment 20 Henry Stiles 2011-07-20 16:25:21 UTC
> Oh.  I thought that build was customer-driven?  Especially the one and four bit
> CMYK output (all of which suffered from this crash).
>

We still don't have a customer that uses postscript and another language - like pcl, xps etc together, this configuration will get a lot more attention when that happens.
 
> pspcl6 (and gs 9.02) still SEGV on text_graphic_image.pdf and
> text_graph_image_cmyk_rgb.pdf when using any x11cmyk device, but that looks to
> be a different bug.

I opened:

http://bugs.ghostscript.com/show_bug.cgi?id=692360

You can add yourself to the CC: list if you want to track it.