Bug 691903

Summary: fz_textextractspan: calculate ascender/descender per glyph
Product: MuPDF    Reporter: zeniko
Component: fitz    Assignee: MuPDF bugs <mupdf-bugs>
Status: CONFIRMED ---    
Severity: enhancement CC: sebastian.rasmussen, tor.andersson
Priority: P4    
Version: unspecified   
Hardware: PC   
OS: Windows 7   
URL: http://code.google.com/p/sumatrapdf/issues/detail?id=1191
Customer:    Word Size: ---

Description zeniko 2011-01-23 15:08:28 UTC
If a font has a few very large glyphs, the font-wide ascender/descender (and thus the bbox) dictated by these large glyphs is used for all the smaller glyphs as well. E.g. the FiguralBookPlain font in http://www.maps.org/news-letters/v20n2/v20n2-bulletin_full.pdf results in far too large bboxes for most of the text on the first page. To reproduce, just search that document in pdfview for any text present on the first page.
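To make the failure mode concrete, here is a minimal standalone C sketch (not MuPDF code). It assumes that the font-wide ascender/descender are simply the extremes over all glyph bounding boxes, roughly what happens when they are taken from the font's overall bounding box, and that every glyph then gets a box built from its advance width plus those two values. The `rect` type, the function names and the glyph numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical rectangle type for this sketch (not MuPDF's fz_rect). */
typedef struct { float x0, y0, x1, y1; } rect;

/* Naive font-wide metrics: the extremes over every glyph's bounds.
   A single oversized glyph therefore decides the values for all glyphs. */
void font_extents(const rect *glyphs, size_t n, float *ascender, float *descender)
{
    assert(n > 0);
    *ascender = glyphs[0].y1;
    *descender = glyphs[0].y0;
    for (size_t i = 1; i < n; i++) {
        if (glyphs[i].y1 > *ascender)
            *ascender = glyphs[i].y1;
        if (glyphs[i].y0 < *descender)
            *descender = glyphs[i].y0;
    }
}

/* Box assigned to any glyph when only font-wide metrics are used:
   horizontal extent from the advance, vertical extent from the font. */
rect font_wide_box(float adv, float ascender, float descender)
{
    rect r = { 0.0f, descender, adv, ascender };
    return r;
}

/* Vertical size of a box. */
float height(rect r)
{
    return r.y1 - r.y0;
}
```

With hypothetical glyph bounds such as an "x" spanning 0 to 0.45 em and one ornament spanning -0.8 to 1.6 em, `font_extents` yields 1.6 and -0.8, so the box for the "x" becomes 2.4 em tall instead of 0.45 em.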
Comment 1 Tor Andersson 2012-01-12 00:17:42 UTC
You can now get individual bounding boxes for glyphs with fz_bound_glyph.
This isn't exposed in the text device yet, but will be once I update and
merge the text branch.
Comment 2 Tor Andersson 2012-07-20 12:10:02 UTC
Here is an example of using fz_bound_glyph to compute per-glyph bboxes. I'm not convinced that this is better, though. Another approach would be to distrust the FreeType ascender/descender fields and compute them from some standard glyph instead.

--- a/fitz/dev_text.c
+++ b/fitz/dev_text.c
@@ -415,10 +415,14 @@ fz_text_extract(fz_context *ctx, fz_text_device *dev, fz_text *text, fz_matrix c
                        adv = ftadv / 65536.0f;
                        fz_unlock(ctx, FZ_LOCK_FREETYPE);
 
+#ifdef REAL_GLYPH_BBOXES
+                       rect = fz_bound_glyph(ctx, font, text->items[i].gid, fz_identity);
+#else
                        rect.x0 = 0;
                        rect.y0 = descender;
                        rect.x1 = adv;
                        rect.y1 = ascender;
+#endif
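The two strategies from comment 2 can be compared in a standalone C sketch (again not MuPDF code; the `rect` type, the function names and the choice of reference glyphs are assumptions). `reference_box` follows the alternative of deriving ascender/descender from standard glyphs, here a capital letter for the ascender and a descending lowercase letter for the descender, instead of trusting the FreeType fields. `glyph_box` uses a glyph's own bounds, as the REAL_GLYPH_BBOXES branch does via fz_bound_glyph, but falls back to the reference box when those bounds are empty (for instance a glyph without an outline, such as a space). That fallback is this sketch's own choice and is not taken from MuPDF or SumatraPDF.

```c
#include <assert.h>

/* Hypothetical rectangle type for this sketch (not MuPDF's fz_rect). */
typedef struct { float x0, y0, x1, y1; } rect;

/* Alternative from comment 2: take the vertical extent from reference
   glyphs rather than from font-wide metrics. Which glyphs serve as the
   reference is an assumption left to the caller. */
rect reference_box(float adv, rect ascending_ref, rect descending_ref)
{
    assert(ascending_ref.y1 >= descending_ref.y0);
    rect r = { 0.0f, descending_ref.y0, adv, ascending_ref.y1 };
    return r;
}

/* Per-glyph strategy: use the glyph's own bounds when it has any ink,
   otherwise fall back so the glyph still covers some area. */
rect glyph_box(rect glyph_bounds, rect fallback)
{
    int empty = glyph_bounds.x1 <= glyph_bounds.x0 ||
                glyph_bounds.y1 <= glyph_bounds.y0;
    return empty ? fallback : glyph_bounds;
}
```

One visible trade-off is that per-glyph boxes vary in height from glyph to glyph: a period-like glyph keeps its tiny box, whereas the reference box gives every glyph the same vertical extent regardless of how large the font's biggest glyph is.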
Comment 3 zeniko 2012-07-21 20:26:46 UTC
(In reply to comment #2)
> Here is an example of using fz_bound_glyph to compute per glyph bboxes.

Thanks. An IMO better example can be found in our patchset, though, where we conditionally use fz_bound_glyph with additional fiddling to get better results than either of your two suggestions.