Summary: | Regression: many segfaults with multi-threaded rendering starting with 0eae840336f244e5b0d195109e1dd9e104396d87 | |
---|---|---|---
Product: | GhostPCL | Reporter: | Marcos H. Woehrmann <marcos.woehrmann>
Component: | Regression PCL | Assignee: | Ray Johnston <ray.johnston>
Status: | RESOLVED FIXED | |
Severity: | blocker | CC: | robin.watts
Priority: | P1 | |
Version: | master | |
Hardware: | PC | |
OS: | All | |
Customer: | | Word Size: | ---
Attachments: | gdb_traceback_wrong_allocator.log | |
Description
Marcos H. Woehrmann
2012-10-01 20:33:21 UTC
I am able to get failures in a pcl-debug build by setting breakpoints in gsmchunk.c at lines 186, 225, 249 and 418. I get repeatable breakpoints at line 418 in the main thread.

I think the problem may be that with pcl, the basic chunk allocator is NOT thread-safe. The threads are created using the mem->thread_safe_memory, but the 'main' thread allocates using the primary chunk allocator. Several times I've hit the breakpoint at 418 when the main thread is allocating a buf_device (from gxclthrd.c:574). If a thread's allocator can't allocate in an existing chunk it owns and has to allocate another chunk, the locking wrapper locks, calls the underlying chunk memory allocator, then unlocks and returns. This protects the underlying chunk allocator from any other thread that uses the same lock, BUT the main thread DOESN'T USE THE thread_safe_memory and thus is not protected from allocations (or frees) by the rendering threads.

I need to think about the best way to fix this, but am thinking that we might want to have the chunk memory allocator always have a lock, and be able to turn the use of the lock on and off, effectively making it thread-safe. The multi-threaded rendering would turn this on while rendering, then turn it off again. Recall that the reason to avoid a lock on every action is that locking is slow on some OS's (Windows in particular), which is why the chunk allocator was developed along with the multi-threaded rendering. Other approaches can be described here, or discussed on IRC.

Grabbing the assignment for this bug. Why Robin's change would trigger this I do not understand. If there are other problems after I fix the above issue, I may give it back to him.

BTW, I did notice that gxclthrd.c:673 was allocating the cache for each thread using the crdev->memory and it should use thread->memory so that allocations of links, etc. within the cache use that thread's allocator. This improved things a bit, but was NOT the root cause.
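The "switchable lock" idea above can be sketched roughly as follows. This is a minimal illustration, not the real gsmchunk.c API: the type and function names (chunk_mem_t, chunk_alloc, chunk_set_locking) are hypothetical, and malloc stands in for the real chunk carving. The allocator always owns a mutex, but only takes it while multi-threaded rendering is active, so the single-threaded fast path pays no locking cost.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical chunk-allocator state: a permanent mutex plus a flag
 * saying whether it should currently be used. */
typedef struct chunk_mem_s {
    pthread_mutex_t lock;
    int lock_enabled;    /* nonzero while render threads exist */
} chunk_mem_t;

static void *chunk_alloc(chunk_mem_t *mem, size_t size)
{
    void *ptr;
    if (mem->lock_enabled)
        pthread_mutex_lock(&mem->lock);
    ptr = malloc(size);  /* stands in for carving from a chunk */
    if (mem->lock_enabled)
        pthread_mutex_unlock(&mem->lock);
    return ptr;
}

/* Must only be toggled while no render threads are running,
 * e.g. just before they are spawned and just after they join. */
static void chunk_set_locking(chunk_mem_t *mem, int enable)
{
    mem->lock_enabled = enable;
}
```

Multi-threaded rendering would call `chunk_set_locking(mem, 1)` before spawning its worker threads and `chunk_set_locking(mem, 0)` after joining them, so the lock is only ever paid for during the window where contention is possible.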
(In reply to comment #1)
> BTW, I did notice that gxclthrd.c:673 was allocating the cache for each
> thread using the crdev->memory and it should use thread->memory so that
> allocations of links, etc. within the cache use that thread's allocator.
> This improved things a bit, but was NOT the root cause.

This small part of the bug was fixed in:

```
commit b07dca2069a45793b70cb7faf51f53d932144a28
Author: Robin Watts <robin.watts@artifex.com>
Date:   Mon Oct 8 17:31:01 2012 +0100

    Fix multi-threaded rendering use of wrong gs_memory_t.

    When we create rendering threads, we create a gs_memory_t * for each.
    The idea is that every thread should exclusively use its own gs_memory_t
    to avoid conflicts. The gsicc_cache was being setup to use the incorrect
    one. This was found (by Ray) when investigating bug 693361.
```

This solves this particular error, but does not completely solve the bug.

The problem is not as I stated in comment #1. The problem is that when the ICC color management drills down into the lcms module, it sometimes uses the memory allocator (ContextID) of the main thread even when higher-level functions in lcms have the ContextID of the correct rendering thread. This results in the crash (which in a DEBUG build issues the 'not idle' message).

The only solution I can see (short of revamping lcms2 to always pass the correct allocator (ContextID)) is to have the allocator for all ICC/lcms needs use the mem->thread_safe_memory. While this may be higher overhead, the ICC color workflow caching should prevent a measurable performance decrease. I am looking at how to implement this and consulting with Michael Vrhel.

Making this a blocker in anticipation of the February release code freeze.

Created attachment 9194 [details]
gdb_traceback_wrong_allocator.log
As requested on IRC, this traceback shows that lcms is not using the thread's
allocator, as seen in #12 DefaultICCintents (ContextID=0x1aa6d00...), but is
using a different allocator that is not thread-safe and is NOT exclusive to
the thread. This allocation comes from #8 in cmsReverseToneCurveEx, which
calls #7 cmsBuildTabulatedToneCurve16 (ContextID=0x130b320, ...) at line 840.
It is getting this allocator from InCurve->InterpParams->ContextID.
The function #11 _cmsReadOutputLUT does not get the ContextID from
DefaultICCintents, so instead picks it up from the hProfile.

As stated above, the only solution I see is to have all of the cms allocations
done with a non_gc_memory thread-safe allocator. Since these allocations are
not frequent, it should not cause a performance hit.
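The proposal above (one thread-safe allocator for all CMS allocations) boils down to serialising every alloc/free behind a single mutex. A minimal sketch, assuming nothing about the real lcms2 memory-plugin or gs_memory_t interfaces: the function names here are hypothetical, and raw malloc/free stand in for calls into mem->thread_safe_memory.

```c
#include <pthread.h>
#include <stdlib.h>

/* One process-wide lock guarding all CMS allocations, so it no longer
 * matters which thread's ContextID lcms happens to pull an allocator
 * from: every path funnels through the same protected routines. */
static pthread_mutex_t cms_alloc_lock = PTHREAD_MUTEX_INITIALIZER;

static void *cms_safe_malloc(size_t size)
{
    void *ptr;
    pthread_mutex_lock(&cms_alloc_lock);
    ptr = malloc(size);  /* real code: allocate via mem->thread_safe_memory */
    pthread_mutex_unlock(&cms_alloc_lock);
    return ptr;
}

static void cms_safe_free(void *ptr)
{
    pthread_mutex_lock(&cms_alloc_lock);
    free(ptr);
    pthread_mutex_unlock(&cms_alloc_lock);
}
```

Because ICC links and profiles are cached, these locked allocations are rare on the hot path, which is why the comment above expects no measurable performance decrease.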
The problem here turns out to be incorrect setup of the buffer device for the thread.

It is true that every profile and link ends up with an allocator record 'pickled into' it. To avoid mixing allocators between different threads, therefore, we need to ensure that CMS objects remain localised to a single thread; i.e. if a profile is created in rendering thread 1, it can only ever be used with other profiles created in the same rendering thread. Fortunately, this is how Michael designed the system to work, and on the whole it seems to. It transpires, though, that in one case the code slips up and allows CMS objects from the main thread to be accessed in rendering threads; the fix is to cure this.

When the 'get_profile' device call is made on a rendering thread, the rendering thread's buffer device is called. This device (an image24, in the case I have been debugging) implements get_profile by using gx_forward_get_profile, and therefore calls through to the underlying 'target' device (a ppmraw device). Unfortunately, the code currently constructs the render thread's buffer device from the REAL target device (which uses the main thread's allocator). This means that the profiles it returns come from the main thread's set of device profiles, not the set of device profiles constructed for the rendering thread in question.

The fix, it seems to me, is to change the code that constructs the buffer device so that it uses a different target device.
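The forwarding chain described above can be modelled in a few lines. This is a toy sketch, not the real gx_device machinery: device_t and get_profile are hypothetical stand-ins, and a string stands in for a device's set of ICC profiles. The point it demonstrates is that the buffer device hands back whatever profiles belong to the target it was *constructed* with, so building it from the main thread's target leaks main-thread CMS objects into a rendering thread.

```c
#include <stddef.h>

/* Toy device: either holds its own "profile set" or forwards to a target,
 * the way gx_forward_get_profile delegates to dev->target. */
typedef struct device_s {
    const char *profile;      /* stands in for the device's ICC profiles */
    struct device_s *target;  /* forwarding target, or NULL */
} device_t;

static const char *get_profile(device_t *dev)
{
    if (dev->target != NULL)              /* cf. gx_forward_get_profile */
        return get_profile(dev->target);
    return dev->profile;
}
```

With a main-thread target holding "main" profiles and a per-thread clone holding "thread" profiles, a buffer device built over the main-thread target returns the main thread's profiles, while one built over the clone returns the per-thread set — which is exactly the one-line change in the patch below (cdev->target vs the thread's own device).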
```
$ git diff
diff --git a/gs/base/gxclthrd.c b/gs/base/gxclthrd.c
index 95147c3..84bc7a9 100644
--- a/gs/base/gxclthrd.c
+++ b/gs/base/gxclthrd.c
@@ -202,7 +202,7 @@ clist_setup_render_threads(gx_device *dev, int y)
         ncdev->icc_table = cdev->icc_table;
         /* create the buf device for this thread, and allocate the semaphores */
         if ((code = gdev_create_buf_device(cdev->buf_procs.create_buf_device,
-                          &(thread->bdev), cdev->target,
+                          &(thread->bdev), ndev,
                           band*crdev->page_band_height, NULL,
                           thread->memory, clist_get_band_complexity(dev,y)))
             break;
```

This solves the problem in my testing.

Other than using 'ncdev' instead of 'ndev', I agree (the Windows compiler is not as picky as linux, which won't build with 'ndev'). I committed Robin's change after testing multiple (50) times. Without the change, it couldn't make it through this file even once (80 pages).

Fixed by commit: 1bc2a56 Fix bug 693361: Rendering threads unsafe use of main thread allocator.

BTW, the 'ncdev' was my confusion (using ncdev->target), but since this is set to the 'ndev' anyway, it also fixed it. Robin's patch is preferred.