Bug 693361 - Regression: many segfaults with multi-threaded rendering starting with 0eae840336f244e5b0d195109e1dd9e104396d87
Summary: Regression: many segfaults with multi-threaded rendering starting with 0eae84...
Status: RESOLVED FIXED
Alias: None
Product: GhostPCL
Classification: Unclassified
Component: Regression PCL (show other bugs)
Version: master
Hardware: PC All
: P1 blocker
Assignee: Ray Johnston
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-10-01 20:33 UTC by Marcos H. Woehrmann
Modified: 2013-01-10 20:30 UTC (History)
1 user (show)

See Also:
Customer:
Word Size: ---


Attachments
gdb_traceback_wrong_allocator.log (5.42 KB, application/octet-stream)
2013-01-08 22:23 UTC, Ray Johnston
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marcos H. Woehrmann 2012-10-01 20:33:21 UTC
Starting with 0eae840336f244e5b0d195109e1dd9e104396d87 several hundred of the weekly regression files seg fault with multi-threaded rendering.

Here's a sample command line:

  ./main/obj/pcl6  -o test.ppm  -dMaxBitmap=10000 -sDEVICE=ppmraw -dNumRenderingThreads=4 -r600 ./thesis10.pcl

Hopefully related to the seg fault an error message is produced:

chunk_free_obj failed, object 0x2c47f70 not in chunk at 0x2c3f110, size = 65584
Segmentation fault


commit 0eae840336f244e5b0d195109e1dd9e104396d87
Author: Robin Watts <robin.watts@artifex.com>
Date:   Sat Aug 25 00:38:56 2012 +0100

    Arrange for gscms_create/destroy to be called correctly.
Comment 1 Ray Johnston 2012-10-04 00:56:00 UTC
I am able to get failures in a pcl-debug build by setting breakpoints in the
gsmchunk.c at lines 186, 225, 249 and 418. I get repeatable breakpoints at
line 418 in the main thread.

I think the problem may be that with pcl, the basic chunk allocator is NOT
thread-safe. The threads are created using the mem->thread_safe_memory but
the 'main' thread allocates using the primary chunk allocator. 

Several times I've hit the bp at 418 when the main thread is allocating a
buf_device (from gxclthrd.c:574).

If a thread allocator can't allocate in an existing chunk it owns and has to
allocate another chunk the locking wrapper locks, then calls the underlying
chunk memory allocator, then unlocks and returns. This protects the underlying
chunk allocator from another thread that also would use the same lock, BUT
the main thread DOESN'T USE THE thread_safe_memory and thus is not protected
from allocations (or free actions) by the rendering threads.

I need to think about the best way to fix this, but am thinking that we might
want to have the chunk memory allocator always have a lock, and be able to
turn on and off the use of the lock, effectively making it thread safe.

The multi-threaded rendering would turn this on while rendering, then turn it
off again.

Recall that the reason to avoid a lock on every action is that this is a
slow action on some OS's (Windows in particular) which is why the chunk
allocator was developed along with the multi-threaded rendering.

Other approaches can be described here, or discussed on IRC.

Grabbing the assignment for this bug. Why Robin's change would trigger this
I do not understand. If there are other problems after I fix the above issue,
I may give it back to him. 

BTW, I did notice that gxclthrd.c:673 was allocating the cache for each
thread using the crdev->memory and it should use thread->memory so that
allocations of links, etc. within the cache use that thread's allocator.
This improved things a bit, but was NOT the root cause.
Comment 2 Robin Watts 2012-10-09 09:56:39 UTC
(In reply to comment #1)
> BTW, I did notice that gxclthrd.c:673 was allocating the cache for each
> thread using the crdev->memory and it should use thread->memory so that
> allocations of links, etc. within the cache use that thread's allocator.
> This improved things a bit, but was NOT the root cause.

This small part of the bug was fixed in:

commit b07dca2069a45793b70cb7faf51f53d932144a28
Author: Robin Watts <robin.watts@artifex.com>
Date:   Mon Oct 8 17:31:01 2012 +0100

    Fix multi-threaded rendering use of wrong gs_memory_t.

    When we create rendering threads, we create a gs_memory_t * for each.
    The idea is that every thread should exclusively use its own gs_memory_t
    to avoid conflicts. The gsicc_cache was being setup to use the incorrect
    one.

    This was found (by Ray) when investigating bug 693361. This solves
    this particular error, but does not completely solve the bug.
Comment 3 Ray Johnston 2012-12-04 06:20:42 UTC
The problem is not as I stated in comment #1. The problem is that when the ICC
color management drills down into the lcms module sometimes uses the memory
allocator (ContextID) of the main thread even when higher level functions
in lcms have the ContextID of the correct rendering thread. This results in
the crash (which in a DEBUG build issues the 'not idle' message).

The only solution I can see (short of revamping lcms2 to always pass the
correct allocator (ContextID) is to have the allocator for all ICC/lcms
needs use the mem->thread_safe_memory. While this may be higher overhead, the
ICC color workflow caching should prevent a measurable performance decrease.

I am looking at how to implement this and consulting with Michael Vrhel.
Comment 4 Marcos H. Woehrmann 2012-12-31 02:26:04 UTC
Making this a blocker in anticipation of the February release code freeze.
Comment 5 Ray Johnston 2013-01-08 22:23:39 UTC
Created attachment 9194 [details]
gdb_traceback_wrong_allocator.log

As requested on IRC, this traceback shows that lcms is not using the thread's
allocator, seen in #12 DefaultICCintents (ContextID=0x1aa6d00... but is using
a different allocator that is not threadsafe and is NOT exclusive to the
thread. This allocation comes from #8 in cmsReverseToneCurveEx and calls
#7 cmsBuildTabulatedToneCurve16 (ContextID=0x130b320, ... line 840.

It is getting this allocator from: InCurve->InterpParams->ContextID

The function #11 _cmsReadOutputLUT does not get the ContextID from
DefaultICCintents, so instead picks it up from the hProfile.

As stated above, the only solution I see is to have all of the cms allocations
done with a non_gc_memory threadsafe allocator. Since these allocations are
not frequent, it should not cause a performance hit.
Comment 6 Robin Watts 2013-01-10 11:23:08 UTC
The problem here turns out to be incorrect setup of the buffer device for the thread.

It is true that every profile and link ends up with an allocator record 'pickled into' it. To avoid mixing allocators between different threads therefore, we need to ensure that CMS objects remain localised to a single thread; i.e. if a profile is created in rendering thread 1, it can only ever be used with other profiles created in the same rendering thread.

Fortunately, this is how Michael designed the system to work, and on the whole it seems to.

It transpires though that in one case, the code slips up and allows CMS objects from the main thread to be accessed in rendering threads- the fix is to cure this.

When the 'get_profile' device call is made on a rendering thread, the rendering threads buffer device is called. This device (an image24, in the case I have been debugging) implements get_profile by using gx_forward_get_profile, and therefore calls through to the underlying 'target' device (a ppmraw device).

Unfortunately, the code currently constructs the render threads buffer device from the REAL target device (which uses the main threads allocator). This means that the profiles it returns come from the main threads set of device profiles, not the set of device profiles constructed for the rendering thread in question.

The fix seems to be to me to change the code that constructs the buffer device to use a different target device.

$ git diff
diff --git a/gs/base/gxclthrd.c b/gs/base/gxclthrd.c
index 95147c3..84bc7a9 100644
--- a/gs/base/gxclthrd.c
+++ b/gs/base/gxclthrd.c
@@ -202,7 +202,7 @@ clist_setup_render_threads(gx_device *dev, int y)
         ncdev->icc_table = cdev->icc_table;
         /* create the buf device for this thread, and allocate the semaphores *         if ((code = gdev_create_buf_device(cdev->buf_procs.create_buf_device,
-                                &(thread->bdev), cdev->target,
+                                &(thread->bdev), ndev,
                                 band*crdev->page_band_height, NULL,
                                 thread->memory, clist_get_band_complexity(dev,y             break;

This solves the problem in my testing.
Comment 7 Ray Johnston 2013-01-10 19:27:38 UTC
Other than using 'ncdev' instead of 'ndev', I agree (the Windows compiler is
not as picky as linux, which won't build with 'ndev').
Comment 8 Ray Johnston 2013-01-10 20:30:35 UTC
I committed Robin's change after testing multiple (50) times. Without the
change, it couldn't make it through this file even once (80 pages).

Fixed by commit: 

1bc2a56 Fix bug 693361: Rendering threads unsafe use of main thread allocator.

BTW, the 'ncdev' was my confusion (using ncdev->target) but since this is set
to the 'ndev' anyway, it also fixed it. Robin's patch is preferred.