699484 – Some Arabic text is not displaying.

Summary: Some Arabic text is not displaying.

Status:	RESOLVED FIXED

Alias:	None

Product:	MuPDF
Classification:	Unclassified
Component:	mupdf
Version:	master
Hardware:	PC Windows 8

Importance:	P4 normal
Assignee:	MuPDF bugs

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-06-22 11:10 UTC by Chetan Prajapat
Modified:	2019-05-17 13:14 UTC
CC List:	2 users

See Also:
Customer:
Word Size:	---

Attachments
Pdf file in which i am getting issue. (588.07 KB, application/pdf) 2018-06-22 11:10 UTC, Chetan Prajapat
truetype font encoding patch (1.97 KB, patch) 2019-05-16 20:55 UTC, Zachary Travis

Description Chetan Prajapat 2018-06-22 11:10:50 UTC

Created attachment 15283
Pdf file in which i am getting issue.

Hello, I am attaching a PDF file that opens correctly in Adobe Reader, but MuPDF does not display some of its Arabic characters.

Comment 1 Zachary Travis 2019-04-18 21:07:05 UTC

This looks like a bug in mapping glyph names to glyph IDs (described in section 5.5.5 of the PDF spec). In particular, glyph names like 'uni0642.init' are being parsed as 0x642 rather than being looked up in the TrueType post table. I'll try to provide a patch soon.
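To illustrate the parsing described above, here is a minimal sketch of a suffix-stripping parser. It is not MuPDF's code: `loose_unicode_from_name` is a hypothetical helper, and it handles only the four-digit `uniXXXX` form. It shows why 'uni0642.init' and a plain 'uni0642' become indistinguishable once the '.init' suffix is dropped.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of MuPDF's API): parse a glyph name of
 * the form "uniXXXX", ignoring anything after the first '.'.
 * Returns the code point, or -1 if the name does not have that form.
 */
int loose_unicode_from_name(const char *name)
{
	char hex[5];
	size_t len = strcspn(name, "."); /* length before any ".suffix" */
	size_t i;

	if (len != 7 || strncmp(name, "uni", 3) != 0)
		return -1;
	for (i = 3; i < 7; i++)
		if (!isxdigit((unsigned char)name[i]))
			return -1;

	memcpy(hex, name + 3, 4);
	hex[4] = '\0';
	return (int)strtol(hex, NULL, 16);
}
```

With this kind of parser, `loose_unicode_from_name("uni0642.init")` and `loose_unicode_from_name("uni0642")` both yield 0x642, so the positional form requested by the suffix is lost.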

Comment 2 Zachary Travis 2019-05-16 20:55:23 UTC

Created attachment 17488
truetype font encoding patch

Fix an issue with TrueType encodings. As per section 5.5.5 of the PDF spec, to map a glyph name to a glyph, we should first check the Adobe Glyph List, then use the font's post table. Right now the first check is not quite equivalent to checking the official glyph list: `fz_unicode_from_glyph_name` normalizes the glyph name, checks the glyph list, and then attempts to parse the glyph name itself (e.g. "uni0642.init" -> 0x642), returning `FZ_REPLACEMENT_CHARACTER` only if all of that fails. Because the parse succeeds for names like this, we never get as far as checking the font's post table. This patch changes the first check to be a strict check of the glyph list, so that we can then check the post table (and finally try the original flexible lookup).
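The lookup order described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the attached patch and not MuPDF's API: the tables `toy_agl`, `toy_post` and `toy_cmap` are made-up stand-ins for the Adobe Glyph List and the font's post and cmap tables, and every function name here is hypothetical.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define NO_GLYPH 0 /* glyph id 0 is ".notdef" in TrueType fonts */
#define COUNT(a) (sizeof (a) / sizeof (a)[0])

struct name_entry { const char *name; int value; };

/* Toy stand-ins (assumptions for this sketch only). */
static const struct name_entry toy_agl[] = { { "A", 0x41 }, { "space", 0x20 } }; /* name -> Unicode */
static const struct name_entry toy_post[] = { { "uni0642.init", 3 } };           /* name -> glyph id */
static const int toy_cmap[][2] = { { 0x20, 4 }, { 0x41, 1 }, { 0x642, 2 } };     /* Unicode -> glyph id */

static int find_by_name(const struct name_entry *table, size_t n, const char *name)
{
	size_t i;
	for (i = 0; i < n; i++)
		if (strcmp(table[i].name, name) == 0)
			return table[i].value;
	return -1;
}

static int gid_from_unicode(int ucs)
{
	size_t i;
	for (i = 0; i < COUNT(toy_cmap); i++)
		if (toy_cmap[i][0] == ucs)
			return toy_cmap[i][1];
	return NO_GLYPH;
}

/* Flexible parse: drop any ".suffix", then accept "uniXXXX". */
static int loose_unicode(const char *name)
{
	char hex[5];
	size_t i, len = strcspn(name, ".");
	if (len != 7 || strncmp(name, "uni", 3) != 0)
		return -1;
	for (i = 3; i < 7; i++)
		if (!isxdigit((unsigned char)name[i]))
			return -1;
	memcpy(hex, name + 3, 4);
	hex[4] = '\0';
	return (int)strtol(hex, NULL, 16);
}

/* Strict glyph list first, then the post table, then the flexible parse. */
int glyph_for_name(const char *name)
{
	int ucs, gid;

	ucs = find_by_name(toy_agl, COUNT(toy_agl), name);
	if (ucs >= 0 && (gid = gid_from_unicode(ucs)) != NO_GLYPH)
		return gid;

	gid = find_by_name(toy_post, COUNT(toy_post), name);
	if (gid > 0)
		return gid;

	ucs = loose_unicode(name);
	if (ucs >= 0)
		return gid_from_unicode(ucs);

	return NO_GLYPH;
}
```

In this toy font, `glyph_for_name("uni0642.init")` resolves through the post table to glyph 3, the initial form. If the flexible parse ran before the post table, the same name would resolve to glyph 2, the isolated form reached through the cmap. A name missing from the post table, such as "uni0642.fina", still falls back to glyph 2.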

Comment 3 Tor Andersson 2019-05-17 13:14:12 UTC

Thanks, I've implemented something very similar to your patch and it seems to do
the trick and not mess up any other test files.

commit 87023ea0c82c5c9445cde3ef5d712e97e49db7e1
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Fri May 17 12:43:41 2019 +0200

    Bug 699484: Try mapping via exact unicode, then glyph name, then fuzzy.
    
    Try different approaches in sequence if one fails to find a glyph.
    Encode by glyph name before resorting to stripping '.init' style
    glyph name suffixes.