Bug 691004

Summary:	Get "Substituting .notdef for mu" if characters drawn in a certain order
Product:	Ghostscript	Reporter:	Philip Spencer <pspencer>
Component:	PDF Interpreter	Assignee:	Alex Cherepanov <alex>
Status:	RESOLVED DUPLICATE
Severity:	normal
Priority:	P4
Version:	8.70
Hardware:	PC
OS:	Linux
Customer:		Word Size:	---
Attachments:	File where ghostscript fails to render the mu; File where ghostscript succeeds in rendering the mu; Proposed patch to allow ghostscript to render both test files properly

Description Philip Spencer 2009-12-15 11:47:57 UTC

The first attached PDF document (which I uncompressed for easier inspection and
manipulation) contains only the Latin letter A followed by the Greek letter mu,
each set in an old version of the TimesNewRoman-Italic font (but with a
different encoding vector for each of the two characters).

When Ghostscript is run on it, it displays only the A and complains
"Substituting .notdef for mu".

I suspect this is being triggered by something odd in the TimesNewRoman-Italic
font: if I dump its tables, the post table has no psName name="mu" entry,
although there are entries for all the other lowercase Greek letters, and the
problem disappears if any Greek letter other than mu is used. Even so, there
also seems to be something wrong with Ghostscript itself:

The mu glyph IS defined in the font; both Adobe Reader and xpdf render the
document correctly; and, most tellingly, Ghostscript itself renders it correctly
if the A is removed, or if the order of the A and the mu is simply reversed.

Specifically, Ghostscript correctly renders the second attached PDF document,
which, as a simple diff shows, is identical to the first except that the order
of the A and the mu has been reversed:
diff -au bad.pdf works.pdf
--- bad.pdf	2009-12-15 13:55:01.000000000 -0500
+++ works.pdf	2009-12-15 14:36:43.000000000 -0500
@@ -3840,18 +3840,18 @@
 1 0 0 1 95.68265 732.14548 cm
 q
 BT
-/Fo0S0 12.00000 Tf
+/Fo0S2 12.00000 Tf
 0.08627 0.07843 0.07451 RG
 0.08627 0.07843 0.07451 rg
 0 Tr
 1.00000 0 0 1.00000 0.00000 -9.12695 Tm
-<44> Tj
-/Fo0S2 12.00000 Tf
+<7D> Tj
+/Fo0S0 12.00000 Tf
 0.08627 0.07843 0.07451 RG
 0.08627 0.07843 0.07451 rg
 0 Tr
 1.00000 0 0 1.00000 7.33008 -9.12695 Tm
-<7D> Tj
+<44> Tj
 ET
 Q
 Q

Comment 1 Philip Spencer 2009-12-15 11:49:23 UTC

Created attachment 5778
File where ghostscript fails to render the mu

Comment 2 Philip Spencer 2009-12-15 11:49:56 UTC

Created attachment 5779
File where ghostscript succeeds in rendering the mu

Comment 3 Philip Spencer 2009-12-17 15:11:52 UTC

Digging further into the code, I have found both the problem and a possible patch.

The first time a Font object is encountered for a particular TrueType font, that
Font object's encoding vector is used to construct a "prebuilt encoding" vector
mapping glyph names to cmap table indices, and this in turn is used to build the
CharStrings array. After this, any glyphs mentioned in the post table but not
yet included in CharStrings are added directly.

Subsequent occurrences of Font objects for that particular TrueType font use the
later occurrence's encoding vector to get a glyph name, but then use the
originally constructed CharStrings array to get the actual glyph.

In this case, the font is broken in that the "mu" glyph is not mentioned in the
post table. (Perhaps to placate broken applications that picked up mu when they
really wanted mu1? Who knows.) This means that if mu is not included in the
encoding vector of the FIRST font object, it never gets added to the
CharStrings array, and hence isn't rendered even though it does appear in the
encoding vector of the SECOND font object. Since other PDF readers handle this
case fine, it is a problem that Ghostscript doesn't.
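A toy Python model of the behaviour described above may make the order dependence easier to see. It is only a sketch under stated assumptions, not Ghostscript's actual code (the PDF interpreter does this work in PostScript): the glyph indices, the cmap contents, the AGL excerpt and the helpers build_charstrings() and show() are all hypothetical. The one property taken from the report is that the name-to-glyph table is built once, from the first encoding seen, plus the post table, and then reused.

```python
"""Toy model (hypothetical, not Ghostscript code) of the order-dependent failure."""

# Hypothetical post table: glyph index -> glyph name. "mu" is deliberately
# absent, mirroring the broken TimesNewRoman-Italic.
POST_NAMES = {0: ".notdef", 36: "A", 150: "alpha", 151: "beta"}

# Hypothetical (3,1) cmap: Unicode code point -> glyph index. The mu glyph
# (index 152) exists; only its post-table name is missing.
CMAP_31 = {0x41: 36, 0x3B1: 150, 0x3B2: 151, 0x3BC: 152}

# Small excerpt of the Adobe Glyph List: glyph name -> Unicode code point.
AGL = {"A": 0x41, "alpha": 0x3B1, "beta": 0x3B2, "mu": 0x3BC}

# Encodings of the two Font objects (codes taken from the content stream).
ENC_S0 = {0x44: "A"}   # /Fo0S0
ENC_S2 = {0x7D: "mu"}  # /Fo0S2


def build_charstrings(encoding):
    """Build name -> glyph index from ONE encoding vector plus the post table."""
    charstrings = {".notdef": 0}
    for name in encoding.values():
        uni = AGL.get(name)
        if uni is not None and uni in CMAP_31:
            charstrings[name] = CMAP_31[uni]
    # Then add post-table names that are not present yet.
    for gid, name in POST_NAMES.items():
        charstrings.setdefault(name, gid)
    return charstrings


def show(runs):
    """Render (encoding, code) pairs that all share one TrueType font."""
    cached = None  # built on first use, then reused for every later Font object
    gids = []
    for encoding, code in runs:
        if cached is None:
            cached = build_charstrings(encoding)
        name = encoding.get(code, ".notdef")
        if name not in cached:
            print(f"Substituting .notdef for {name}")
            name = ".notdef"
        gids.append(cached[name])
    return gids


print(show([(ENC_S0, 0x44), (ENC_S2, 0x7D)]))  # bad.pdf order: warns, then [36, 0]
print(show([(ENC_S2, 0x7D), (ENC_S0, 0x44)]))  # works.pdf order: [152, 36]
```

In the bad order, the table is built from an encoding that only knows "A", and the post table cannot supply "mu", so the later lookup falls back to .notdef; reversing the order builds the table from the encoding that does contain "mu".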

The attached patch fixes this problem, at least for fonts with a (3,1) cmap. In
this case, AdobeGlyphList is already used to build the glyph-name-to-cmap-index
mapping, except that for glyphs not known to AdobeGlyphList the original
encoding vector's value is treated as the cmap table index. The patch simply
augments this behaviour by also adding every glyph name that is listed in
AdobeGlyphList and present in the cmap table but mentioned in neither the
original encoding vector nor the post table; that way, if that glyph name is
used in a later encoding vector, it can be found.
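The proposed augmentation can be sketched in the same kind of toy model. This is not the attached patch itself; the data and the helper build_charstrings_patched() are hypothetical, and the fallback for names unknown to AdobeGlyphList is left out. The point is the extra third pass: any AGL name whose code point the (3,1) cmap covers is added if it is not already present.

```python
"""Toy sketch (hypothetical) of the augmentation proposed for (3,1) cmaps."""

POST_NAMES = {0: ".notdef", 36: "A", 150: "alpha", 151: "beta"}  # no "mu"
CMAP_31 = {0x41: 36, 0x3B1: 150, 0x3B2: 151, 0x3BC: 152}
AGL = {"A": 0x41, "alpha": 0x3B1, "beta": 0x3B2, "mu": 0x3BC, "mu1": 0xB5}


def build_charstrings_patched(encoding):
    charstrings = {".notdef": 0}
    # 1. Names from the first encoding vector, resolved via AGL and the cmap.
    for name in encoding.values():
        uni = AGL.get(name)
        if uni is not None and uni in CMAP_31:
            charstrings[name] = CMAP_31[uni]
    # 2. Names from the post table that are not present yet.
    for gid, name in POST_NAMES.items():
        charstrings.setdefault(name, gid)
    # 3. Proposed addition: any AGL name whose code point is in the cmap,
    #    so a LATER encoding vector that uses it can still be resolved.
    for name, uni in AGL.items():
        if uni in CMAP_31:
            charstrings.setdefault(name, CMAP_31[uni])
    return charstrings


cs = build_charstrings_patched({0x44: "A"})  # first Font object only knows "A"
print(cs["mu"])     # 152: found although "mu" is in neither the encoding nor post
print("mu1" in cs)  # False: U+00B5 is not in this toy cmap
```

Using setdefault in passes 2 and 3 means names already resolved from the encoding or the post table keep their original glyph index; the AGL pass only fills gaps.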

A similar patch would presumably be needed for the code branch that handles cmap
types other than (3,1), using the list of glyph names appropriate to that
particular cmap type.

Comment 4 Philip Spencer 2009-12-17 15:15:14 UTC

Created attachment 5792
Proposed patch to allow ghostscript to render both test files properly

Comment 5 Alex Cherepanov 2010-05-02 22:45:45 UTC

Thank you for using and contributing to Ghostscript.
The problem has been fixed in a different way.
Old versions of Ghostscript cached the font instance on the
font descriptor, but a font descriptor may be shared between fonts
with different encodings. Since rev. 11148, the font is cached on
the PDF font resource instead.
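A toy model of why moving the cache fixes the problem: the only difference between the old and new behaviour is the cache key. Keyed on the shared descriptor, every resource reuses the table built from whichever encoding came first; keyed on the font resource, each encoding gets its own. Everything here (data, the descriptor key "TNRI", and the helpers key_by_descriptor(), key_by_resource(), show()) is hypothetical; the real change is in Ghostscript's PostScript PDF interpreter.

```python
"""Toy sketch (hypothetical): where the built font is cached decides the outcome."""

POST_NAMES = {0: ".notdef", 36: "A", 150: "alpha", 151: "beta"}  # no "mu"
CMAP_31 = {0x41: 36, 0x3B1: 150, 0x3B2: 151, 0x3BC: 152}
AGL = {"A": 0x41, "alpha": 0x3B1, "beta": 0x3B2, "mu": 0x3BC}

# Two PDF font resources sharing ONE font descriptor but with different encodings.
RESOURCES = {
    "Fo0S0": {"descriptor": "TNRI", "encoding": {0x44: "A"}},
    "Fo0S2": {"descriptor": "TNRI", "encoding": {0x7D: "mu"}},
}


def build_charstrings(encoding):
    charstrings = {".notdef": 0}
    for name in encoding.values():
        uni = AGL.get(name)
        if uni is not None and uni in CMAP_31:
            charstrings[name] = CMAP_31[uni]
    for gid, name in POST_NAMES.items():
        charstrings.setdefault(name, gid)
    return charstrings


def key_by_descriptor(res_name):
    return RESOURCES[res_name]["descriptor"]  # old behaviour


def key_by_resource(res_name):
    return res_name  # behaviour described for rev. 11148 and later


def show(runs, cache_key):
    cache = {}
    gids = []
    for res_name, code in runs:
        res = RESOURCES[res_name]
        key = cache_key(res_name)
        if key not in cache:
            cache[key] = build_charstrings(res["encoding"])
        name = res["encoding"].get(code, ".notdef")
        gids.append(cache[key].get(name, 0))  # 0 = .notdef
    return gids


BAD_ORDER = [("Fo0S0", 0x44), ("Fo0S2", 0x7D)]
print(show(BAD_ORDER, key_by_descriptor))  # [36, 0]: mu becomes .notdef
print(show(BAD_ORDER, key_by_resource))    # [36, 152]: both glyphs found
```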

See bug 690714 for the patch.

*** This bug has been marked as a duplicate of bug 690714 ***