Summary: | Possible thread issue. . . | ||
---|---|---|---|
Product: | MuPDF | Reporter: | Pedro Rivera <pedro.rivera.651> |
Component: | mupdf | Assignee: | Tor Andersson <tor.andersson> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | pedro.rivera.651, zeniko |
Priority: | P4 | ||
Version: | unspecified | ||
Hardware: | PC | ||
OS: | Windows 7 | ||
Customer: | Word Size: | --- | |
Attachments: |
JMuPdf Source Files
Updated JMuPdf Sources |
Description
Pedro Rivera
2011-03-04 14:40:35 UTC
Created attachment 7323 [details]
JMuPdf Source Files
MuPDF isn't thread safe, so you'll have to make sure not to share any data at all between two concurrent threads. E.g.:
> fz_device *dev = fz_newdrawdevice(jmupdf->glyphcache, pix);
glyphcache isn't thread safe. Using different glyph caches per thread should fix your crash.
BTW: One of your changes to res_pixmap.c contains an exploitable integer overflow. Make sure not to use JMuPdf to read/convert documents from untrusted sources! (In reply to comment #3) > BTW: One of your changes to res_pixmap.c contains an exploitable integer > overflow. Make sure not to use JMuPdf to read/convert documents from untrusted > sources! Thanks for the quick reply. I will make the changes you mentioned. I will let you know how it goes. Thanks! (In reply to comment #2) > MuPDF isn't thread safe, so you'll have to make sure not to share any data at > all between two concurrent threads. E.g.: > > > fz_device *dev = fz_newdrawdevice(jmupdf->glyphcache, pix); > > glyphcache isn't thread safe. Using different glyph caches per thread should > fix your crash. I went ahead and made that change. It didn't work. Although I found an issue with my xref_array.c program while testing. The way I was handling the "pdf_handle_array" array was causing issues so I changed it to init a fixed size array instead of a "dynamic" one. As documents are closed I then reuse elements. This solved some other threading issues I was having, so now I am getting much better throughput. The only line left that I can't seem to figure out is in pixels.c in the getPixMap() function. Here is the change I made: /* * Draw image */ fz_glyphcache * glyphcache = fz_newglyphcache(); fz_device *dev = fz_newdrawdevice(glyphcache, pix); if ((*env)->MonitorEnter(env, obj) == JNI_OK) { fz_executedisplaylist(page->list, dev, ctm); (*env)->MonitorExit(env, obj); } Again, if I remove the Monitor...() functions the application crashes. I believe that no data sharing is happening between threads when this function is called. I noticed that their was a new version of MuPdf toolkit. So I upgraded and retrofitted my code. I retrofitted res_pixmap.c and pixels.c where fz_clearpixmap() changed to fz_clearpixmapwithcolor(). That's all that changed on upgrading to 0.8. It didn't make a difference though in regards to the threading issue. I am stumped. I did some research regarding accessing global variables in a multi-threaded environment. It seems like it's ok as long as I don't try to update the same element. The "page" (pdf_page) variable is cached in the PDF_HANDLE item and discarded only when another page is accessed or the the handle to a document is closed. I have included another zip file with just the JNI code and the one MuPdf program I modified. The java code didn't change. Again I really appreciate all your help, Pedro Created attachment 7328 [details]
Updated JMuPdf Sources
One of the few global variables in mupdf is the FT_Library (fz_ftlib) in res_font.c. I don't know if this may be the cause of your troubles, but it may be worth looking into. I have less knowledge about some of the other third party libraries, but libjpeg, zlib and jbig2dec should be safe. I can't vouch for openjpeg though, I'd have to study the source. (In reply to comment #7) > One of the few global variables in mupdf is the FT_Library (fz_ftlib) in > res_font.c. > > I don't know if this may be the cause of your troubles, but it may be worth > looking into. > I have less knowledge about some of the other third party libraries, but > libjpeg, zlib and > jbig2dec should be safe. I can't vouch for openjpeg though, I'd have to study > the source. Thanks for pointing that out. I looked at the source and in freetype.h I found this comment where *FT_Library is defined: "For multi-threading applications each thread should have its own FT_Library object." So it seems that FT_Library is not thread safe. I'll study the code res_font.c to see how I can solve this problem. Thanks again, Pedro I looked over res_font.c and because of the way it is written I will have to wrap the fz_executedisplaylist() function around the Monitor...() synchronization functions provide by java jni interface. I can live with this for now but it is a big bottleneck. res_font.c program uses two global variables fz_ftlib and fz_ftlib_refs which are not thread safe. Also I see that fz_ftlib is passed around to other functions possibly updating some of its properties.(?) At this point there is not much that can be done unless res_font.c is modified so that an FT_Library instance is passed in instead of being declared globally. I don't know how to solve this as I am still figuring out mupdf library. In a multi-thread environment one would normally process a unique document handle per thread. Maybe passing in an FT_Library instance to fz_newdrawdevice() could work. One could code something like: fz_glyphcache *glyphcache = fz_newglyphcache(); fz_freetype *ftlib = fz_newfreetype(); fz_device *dev = fz_newdrawdevicewithfreetype(glyphcache, pix, ftlib); fz_executedisplaylist(page->list, dev, ctm); If fz_newdrawdevice() is used then it could use the globally defined ftlib. Just thoughts. Thanks again, Pedro The FT_Library instance is used when creating the FT_Face font objects. As such, the fz_font struct is not thread safe at the moment. You could try putting the FT_Library global in thread local storage, if that is available when using the JNI? (In reply to comment #10) > The FT_Library instance is used when creating the FT_Face font objects. As > such, the fz_font struct is not thread safe at the moment. You could try > putting the FT_Library global in thread local storage, if that is available > when using the JNI? I made the following change to res_font.c : static __thread FT_Library fz_ftlib = nil; static __thread int fz_ftlib_refs = 0; I reran my tests and it worked like a champ. I converted 1,000 documents multi-threaded with no bottle necks. It ran in 40 seconds compared to ~2 minutes. Maybe that simple change should make into the library as a permanent fix(?) I tested this change under Windows, Debian and Ubuntu and it worked very well. Thank you, Pedro (In reply to comment #11) > I made the following change to res_font.c : > > static __thread FT_Library fz_ftlib = nil; > static __thread int fz_ftlib_refs = 0; This wouldn't be compatible with how we currently use MuPDF: We load documents on one thread, render them on a second and search them on a third (all access being synchronized per pdf_xref i.e. per document instance). This leads to loading FreeType on one thread and trying to free it on another. Instead we'd rather need loading FreeType per document/pdf_xref. (In reply to comment #12) > (In reply to comment #11) > > I made the following change to res_font.c : > > > > static __thread FT_Library fz_ftlib = nil; > > static __thread int fz_ftlib_refs = 0; > > This wouldn't be compatible with how we currently use MuPDF: We load documents > on one thread, render them on a second and search them on a third (all access > being synchronized per pdf_xref i.e. per document instance). This leads to > loading FreeType on one thread and trying to free it on another. Instead we'd > rather need loading FreeType per document/pdf_xref. I agree. It would be better to have FT loaded on a per xref basis with the condition that it only loads it if needed when rendering. I have some processes that is purely text extraction. Also, Isn't FreeType only loaded when processing font files while rendering a page? In this case it would only load and unload on your second thread. Unless of course your second thread isn't persistent for future rendering. I apologize if I misunderstand. In my case I have multiple threads rendering pages from different documents, e.g typical server side processing. Therefore using __thread or loading FT at xref level would have the same result under this scenario. In the meantime can I suggest a happy medium? :) #ifdef _FT_LIB_NEED_TLS static __thread FT_Library fz_ftlib = nil; static __thread int fz_ftlib_refs = 0; #else static FT_Library fz_ftlib = nil; static int fz_ftlib_refs = 0; #endif Thanks, Pedro I'm closing this out as I have solved several threading issues by installing various handlers in the MuPDF code. Although I would love to see the library be entirely thread safe, maybe there should be a "wish" list where one could drop legitimate ideas of how to get there. In my research I have identified the following programs as being thread unsafe: 1. draw_edge.c 2. res_font.c 3. base_error.c 4. res_pixmap.c 5. draw_unpack.c It seems to me that making it them thread safe would not be too difficult. Adding a few more items to struct pdf_xref_s could work: /* Manage freetype library */ typedef struct fz_free_type_s fz_free_type; struct fz_free_type_s { FT_Library library; int refs; }; /* Manage warning messages */ typedef struct fz_warn_msg_s fz_warn_msg; struct fz_warn_msg_s { char message[LINE_LEN]; int count; }; /* Manage error messages */ typedef struct fz_error_msg_s fz_error_msg; struct fz_error_msg_s { char message[LINE_COUNT][LINE_LEN]; int count; }; struct pdf_xref_s { ... ... fz_free_type *free_type; fz_warn_msg *warn_msg; fz_error_msg *error_msg; } +++ *** Bug 692162 has been marked as a duplicate of this bug. *** |