If a font has a few very large glyphs, the bbox derived from these large glyphs is used for the smaller glyphs as well. For example, the FiguralBookPlain font in http://www.maps.org/news-letters/v20n2/v20n2-bulletin_full.pdf produces far too large bboxes for most of the text on the first page. To reproduce, search that document in pdfview for any text that appears on the first page.
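A minimal sketch of why this happens, assuming the text device builds each glyph's box from the font-wide ascender/descender (this mirrors the `#else` branch of the dev_text.c patch later in this thread). The `glyph_rect` type and `font_wide_box`/`rect_height` helpers are hypothetical stand-ins, not MuPDF API:

```c
#include <assert.h>

/* Hypothetical rectangle type with the same field names as MuPDF's fz_rect. */
typedef struct { float x0, y0, x1, y1; } glyph_rect;

/* Font-wide box: every glyph gets the same vertical extent
   [descender, ascender], regardless of its own shape. If a few huge
   glyphs inflate the font's ascender/descender, every small glyph
   inherits that inflated height. */
static glyph_rect font_wide_box(float adv, float ascender, float descender)
{
    glyph_rect r;
    assert(ascender >= descender);
    r.x0 = 0;
    r.y0 = descender;
    r.x1 = adv;
    r.y1 = ascender;
    return r;
}

/* Vertical extent of a box, in the same units as its coordinates. */
static float rect_height(glyph_rect r)
{
    return r.y1 - r.y0;
}
```

With hypothetical metrics such as ascender 2.5 and descender -1.2 (em units), `font_wide_box(0.55f, 2.5f, -1.2f)` gives a box about 3.7 em tall, several times the height of an ordinary lowercase letter.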
You can now get individual bounding boxes for glyphs with fz_bound_glyph. This isn't exposed in the text device yet, but will be once I update and merge the text branch.
Here is an example of using fz_bound_glyph to compute per glyph bboxes. I'm not convinced that this is better, though. Another approach would be to distrust the FreeType ascender/descender fields and compute them from some standard glyph instead.

--- a/fitz/dev_text.c
+++ b/fitz/dev_text.c
@@ -415,10 +415,14 @@ fz_text_extract(fz_context *ctx, fz_text_device *dev, fz_text *text, fz_matrix c
 adv = ftadv / 65536.0f;
 fz_unlock(ctx, FZ_LOCK_FREETYPE);
+#ifdef REAL_GLYPH_BBOXES
+ rect = fz_bound_glyph(ctx, font, text->items[i].gid, fz_identity);
+#else
 rect.x0 = 0;
 rect.y0 = descender;
 rect.x1 = adv;
 rect.y1 = ascender;
+#endif
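The second approach (deriving ascender/descender from standard glyphs rather than trusting FreeType's fields) could look roughly like the sketch below. Which glyphs count as "standard" is an assumption: here the caller passes the bbox of one glyph expected to reach the ascender line (for example 'd') and one expected to reach the descender line (for example 'p'). `glyph_rect`, `vmetrics`, `metrics_from_reference` and `box_from_metrics` are hypothetical names, not MuPDF API; in MuPDF the reference bboxes would come from something like fz_bound_glyph.

```c
#include <assert.h>

/* Hypothetical rectangle type with the same field names as MuPDF's fz_rect. */
typedef struct { float x0, y0, x1, y1; } glyph_rect;

/* Vertical metrics used to build per-glyph boxes. */
typedef struct { float ascender, descender; } vmetrics;

/* Derive ascender/descender from reference glyph bboxes instead of the
   font's own fields. 'tall' should reach the ascender line, 'deep' the
   descender line (the choice of glyphs is an assumption). */
static vmetrics metrics_from_reference(glyph_rect tall, glyph_rect deep)
{
    vmetrics m;
    m.ascender = tall.y1;
    m.descender = deep.y0;
    /* Guard against degenerate references (missing or blank glyphs):
       always keep the baseline (y = 0) inside the box. */
    if (m.ascender < 0)
        m.ascender = 0;
    if (m.descender > 0)
        m.descender = 0;
    return m;
}

/* Same box shape as the #else branch of the patch, but built from the
   derived metrics rather than the FreeType fields. */
static glyph_rect box_from_metrics(float adv, vmetrics m)
{
    glyph_rect r = { 0, m.descender, adv, m.ascender };
    assert(r.y0 <= r.y1);
    return r;
}
```

Compared with a true per-glyph bbox, this keeps a uniform line height for all glyphs in a font (useful for selection and line grouping) while ignoring outliers such as a few oversized ornamental glyphs.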
(In reply to comment #2)
> Here is an example of using fz_bound_glyph to compute per glyph bboxes.

Thanks. In my opinion, a better example can be found in our patchset, though: there we use fz_bound_glyph conditionally, with some additional tweaking, to get better results than either of your two suggestions.