Bug 699484

Summary: Some Arabic text is not displaying.
Product: MuPDF
Component: mupdf
Status: RESOLVED FIXED
Severity: normal
Priority: P4
Version: master
Hardware: PC
OS: Windows 8
Customer:
Word Size: ---
Reporter: Chetan Prajapat <chetan.prajapat>
Assignee: MuPDF bugs <mupdf-bugs>
CC: tor.andersson, ztravis
Attachments: Pdf file in which i am getting issue.
truetype font encoding patch

Description Chetan Prajapat 2018-06-22 11:10:50 UTC
Created attachment 15283
Pdf file in which i am getting issue.

Hello, I am attaching a PDF file that opens correctly in Adobe Reader, but MuPDF does not show some of its Arabic characters.
Comment 1 Zachary Travis 2019-04-18 21:07:05 UTC
This looks like a bug in mapping glyph names to glyph IDs (described in the PDF spec, section 5.5.5). In particular, glyph names like 'uni0642.init' are being parsed as 0x642 rather than being looked up in the TrueType post table. I'll try to provide a patch soon.
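
For illustration only, here is a minimal sketch of the kind of flexible name parsing described above. It is not MuPDF's code: the function name `unicode_from_uni_name` and the -1 failure value are assumptions made for this example. It accepts a "uniXXXX" name, ignores any ".suffix", and returns the code point, which is how a name such as 'uni0642.init' can end up as 0x642.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not MuPDF code: parse "uniXXXX" with an optional
 * ".suffix" and return the code point, or -1 if the name does not match. */
static int unicode_from_uni_name(const char *name)
{
	int i;

	assert(name != NULL);

	/* The name must start with the literal prefix "uni". */
	if (strncmp(name, "uni", 3) != 0)
		return -1;

	/* Exactly four hex digits must follow the prefix. */
	for (i = 3; i < 7; i++)
		if (!isxdigit((unsigned char)name[i]))
			return -1;

	/* After the digits, allow only end of string or a ".suffix". */
	if (name[7] != '\0' && name[7] != '.')
		return -1;

	/* strtol stops at the '.', so the suffix is simply dropped. */
	return (int)strtol(name + 3, NULL, 16);
}
```

With this sketch, "uni0642.init" and "uni0642" both yield 0x642, so the ".init" distinction is lost unless the exact name is looked up somewhere else first.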
Comment 2 Zachary Travis 2019-05-16 20:55:23 UTC
Created attachment 17488
truetype font encoding patch

Fix an issue with TrueType encodings - as per the PDF spec 5.5.5, to map a glyph name to a glyph, we should first check the Adobe Glyph List, then use the font's post table. Right now the first check is not quite equivalent to checking the official glyph list - in particular `fz_unicode_from_glyph_name` normalizes the glyph name, checks the glyph list, and then attempts to parse the glyph name itself (e.g. "uni0642.init" -> 0x642), finally returning `FZ_REPLACEMENT_CHARACTER`. This means that we never check the font post table. This patch changes the first check to be a strict check of the glyph list, so that we can then check the post table (and finally try the original flexible lookup).
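
A rough sketch of the lookup order described above, using toy tables in place of real font data. Everything here is assumed for illustration: the table contents, the glyph IDs, and the names `toy_agl`, `toy_post`, `toy_cmap` and `gid_from_glyph_name` are made up, and this is not the attached patch or MuPDF's implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define COUNT(a) (sizeof (a) / sizeof (a)[0])

struct name_entry { const char *name; int value; };
struct code_entry { int unicode; int gid; };

/* Toy data (assumed): a two-entry glyph list excerpt, a post table
 * (glyph name -> glyph ID) and a cmap (Unicode -> glyph ID). */
static const struct name_entry toy_agl[] = { { "A", 0x0041 }, { "space", 0x0020 } };
static const struct name_entry toy_post[] = { { "A", 36 }, { "uni0642.init", 612 } };
static const struct code_entry toy_cmap[] = { { 0x0041, 36 }, { 0x0642, 500 } };

/* Exact string match in a name table; -1 when the name is absent. */
static int lookup_name(const struct name_entry *t, size_t n, const char *name)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (strcmp(t[i].name, name) == 0)
			return t[i].value;
	return -1;
}

/* Unicode -> glyph ID through the toy cmap; 0 (.notdef) when absent. */
static int gid_from_unicode(int unicode)
{
	size_t i;
	for (i = 0; i < COUNT(toy_cmap); i++)
		if (toy_cmap[i].unicode == unicode)
			return toy_cmap[i].gid;
	return 0;
}

/* Flexible parse: "uniXXXX" with an optional ".suffix"; -1 on mismatch. */
static int fuzzy_unicode(const char *name)
{
	int i;
	if (strncmp(name, "uni", 3) != 0)
		return -1;
	for (i = 3; i < 7; i++)
		if (!isxdigit((unsigned char)name[i]))
			return -1;
	if (name[7] != '\0' && name[7] != '.')
		return -1;
	return (int)strtol(name + 3, NULL, 16);
}

static int gid_from_glyph_name(const char *name)
{
	int unicode, gid;

	assert(name != NULL);

	/* 1. Strict glyph list match, then the font's cmap. */
	unicode = lookup_name(toy_agl, COUNT(toy_agl), name);
	if (unicode >= 0 && (gid = gid_from_unicode(unicode)) > 0)
		return gid;

	/* 2. Exact glyph name in the font's post table. */
	gid = lookup_name(toy_post, COUNT(toy_post), name);
	if (gid > 0)
		return gid;

	/* 3. Only now fall back to the flexible name parsing. */
	unicode = fuzzy_unicode(name);
	if (unicode >= 0)
		return gid_from_unicode(unicode);

	return 0; /* nothing matched: .notdef */
}
```

In this toy setup, "uni0642.init" resolves to glyph 612 through the post table, while "uni0642.fina" (absent from the toy post table) still falls through to the flexible parse and gets glyph 500. If step 3 ran before step 2, both names would map to glyph 500.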
Comment 3 Tor Andersson 2019-05-17 13:14:12 UTC
Thanks, I've implemented something very similar to your patch and it seems to do
the trick and not mess up any other test files.

commit 87023ea0c82c5c9445cde3ef5d712e97e49db7e1
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Fri May 17 12:43:41 2019 +0200

    Bug 699484: Try mapping via exact unicode, then glyph name, then fuzzy.
    
    Try different approaches in sequence if one fails to find a glyph.
    Encode by glyph name before resorting to stripping '.init' style
    glyph name suffixes.