Summary: | HelveticaBlack badly rendered (ref 7925) | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Marcos H. Woehrmann <marcos.woehrmann> |
Component: | Text | Assignee: | mpsuzuki <mpsuzuki> |
Status: | NOTIFIED FIXED | ||
Severity: | normal | CC: | alex, leonardo |
Priority: | P2 | ||
Version: | master | ||
Hardware: | All | ||
OS: | All | ||
Customer: | 700 | Word Size: | --- |
Attachments: | screenshot.png; rendering result when post table in HVBL____.TTF is ignored; patch to ignore post table including a glyphname which is defined in MacGlyphEncoding; patch to ignore post table including a glyph name which is defined in ISOLatin1Encoding; Approach B patch; Revised patch in approach B; Re-Revised patch in approach B; Re-Re-Revised patch in approach B | ||
Description
Marcos H. Woehrmann
2007-10-03 17:19:28 UTC
Created attachment 3438 [details]
2628A.pdf
Created attachment 3439 [details]
HVBL_____.TTF
Created attachment 3440 [details]
screenshot.png
Previously assigned person is busy on other tasks.

Created attachment 3467 [details]
rendering result when post table in HVBL____.TTF is ignored

The font HVBL____.TTF has a strange post table. I'm afraid it is a wrong post table, but I'm not sure yet, so I call it "strange". The content of the post table is as follows:

    format = 0x00020000
    glyphNameIndex[0] = 65535
    glyphNameIndex[1] = 0
    glyphNameIndex[2] = 32
    glyphNameIndex[3] = 65535
    glyphNameIndex[4] = 33
    glyphNameIndex[5] = 34
    glyphNameIndex[6] = 35
    glyphNameIndex[7] = 36
    glyphNameIndex[8] = 37
    glyphNameIndex[9] = 38
    glyphNameIndex[10] = 213
    glyphNameIndex[11] = 40
    glyphNameIndex[12] = 41
    glyphNameIndex[13] = 42
    glyphNameIndex[14] = 43
    glyphNameIndex[15] = 44
    glyphNameIndex[16] = 45
    glyphNameIndex[17] = 65535
    ...
    glyphName[0] = .notdef
    glyphName[1] = space
    glyphName[2] = nbspace
    glyphName[3] = exclam
    glyphName[4] = quotedbl
    glyphName[5] = numbersign
    glyphName[6] = dollar
    glyphName[7] = percent
    glyphName[8] = ampersand
    glyphName[9] = quoteright
    glyphName[10] = parenleft
    glyphName[11] = parenright
    glyphName[12] = asterisk
    glyphName[13] = plus
    glyphName[14] = comma
    ...

The TrueType font specification by Microsoft says: "The glyph name array maps the glyphs in this font to name index. If the name index is between 0 and 257, treat the name index as a glyph index in the Macintosh standard order. If the name index is between 258 and 32767, then subtract 258 and use that to index into the list of Pascal strings at the end of the table."

So, in format 2.0 post table, the glyph name array should include only the names for glyph name index > 258. But apparently the glyph names for the Macintosh standard order are included. I think the glyph name array is designed for the raw glyph name index (258 is NOT subtracted yet). This strange post table makes /.getpost generate a wrong encoding table, so a Type42 font dict converted from HVBL____.TTF has a wrong character encoding table.
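To make the quoted rule concrete, here is a minimal Python sketch of how a spec-following reader resolves format 2.0 names, applied to the first seven HVBL____.TTF entries from the dump. It is an illustration, not Ghostscript code: `MAC_STANDARD_HEAD` holds only the first 36 of the 258 Macintosh standard names, `resolve_name` is a hypothetical helper, and values above 32767 (such as 65535) are simply treated as having no name.

```python
# First 36 of the 258 Macintosh standard glyph names (abridged for this sketch).
MAC_STANDARD_HEAD = [
    ".notdef", ".null", "nonmarkingreturn", "space", "exclam", "quotedbl",
    "numbersign", "dollar", "percent", "ampersand", "quotesingle",
    "parenleft", "parenright", "asterisk", "plus", "comma", "hyphen",
    "period", "slash", "zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "colon", "semicolon", "less",
    "equal", "greater", "question", "at",
]


def resolve_name(name_index, pascal_names):
    """Resolve one glyphNameIndex value following the format 2.0 rule."""
    if name_index < 258:
        # 0..257: position in the Macintosh standard order.
        if name_index >= len(MAC_STANDARD_HEAD):
            raise KeyError("index %d is beyond the abridged table" % name_index)
        return MAC_STANDARD_HEAD[name_index]
    if name_index <= 32767:
        # 258..32767: subtract 258 and index the Pascal strings.
        return pascal_names[name_index - 258]
    return None  # above 32767: no usable name in this sketch


# First entries of the HVBL____.TTF dump quoted above.
glyph_name_index = [65535, 0, 32, 65535, 33, 34, 35]
pascal_names = [".notdef", "space", "nbspace", "exclam", "quotedbl",
                "numbersign", "dollar"]

spec_reading = [resolve_name(i, pascal_names) for i in glyph_name_index]
raw_reading = pascal_names[:len(glyph_name_index)]  # names[] by glyph index
for gid, (spec, raw) in enumerate(zip(spec_reading, raw_reading)):
    print(gid, spec, raw)
```

Read by the rule, glyph 2 gets /equal and glyph 4 gets /greater, while the string list at the same positions holds /nbspace and /quotedbl. This is consistent with the wrong encoding described above.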
If I ignore the post table of HVBL____.TTF with the following modification to gs_ttf.ps, the wrong text is displayed correctly, as in the attached PNG.

    # renaming "post" in HVBL____.TTF is easier, but the font flag
    # is set to protect against modification without license.
    --- lib/gs_ttf.ps 2007-10-01 15:29:23.000000000 +0900
    +++ newlib/gs_ttf.ps 2007-10-12 15:09:41.000000000 +0900
    @@ -1490,7 +1498,9 @@
         TTFDEBUG { (.loadttfont) = } if
         //false 0 .loadttfonttables
         .makesfnts
    -    .getpost
    +    counttomark 1 sub index
    +    TTFDEBUG { (check this is Helvetica-Black:) print dup == flush } if
    +    /Helvetica-Black ne { .getpost } { /glyphencoding [ ] def } ifelse
         .pickcmap
         mark
         .charkeys

I have not yet tested the behaviour of Adobe products, but I guess they don't care about the post table contents of non-embedded TrueType fonts, so they are not confused by a strange post table. Anyway, I should investigate how Adobe products behave when fonts with such a strange post table are embedded into a PDF. If a strange post table is embedded as it is, this kind of confusion may arise.

I don't have a good idea for detecting whether a post table is normal or "strange". As a workaround until we have an algorithm to detect a "strange" post table, I think adding a Fontmap option to forcibly ignore the post table would be helpful. However, it would not be effective for a PDF that includes a TrueType font with a strange post table.

>So, in format 2.0 post table, the glyph name array should
>include only the names for glyph name index > 258.
Oops, off-by-1 mistake. I mean "glyph name index > 257".
Regarding Comment #5, "I don't have good idea to detect whether post table is normal or "strange"": here is one idea. If the Pascal strings contain names from MacRomanEncoding, and they're mapped without subtracting 258, then it is "strange". However, the problem here is not recognizing the "strange" feature. A strange encoding could map correctly anyway, so we need to check the mapping. But I'm afraid this isn't possible without linguistic tools. Ignoring a "strange" table as a heuristic workaround can help until we get counter-examples. However, I think we need to study more deeply what Adobe does in this case.

Created attachment 3487 [details]
patch to ignore post table including a glyphname which is defined in MacGlyphEncoding

Here is a patch to ignore the whole post table when it includes a glyph name which is predefined by MacGlyphEncoding. It is now under regression test.

    --- lib/gs_ttf.ps.orig 2007-10-01 15:29:23.000000000 +0900
    +++ lib/gs_ttf.ps 2007-10-22 23:14:04.000000000 +0900
    @@ -192,6 +192,7 @@
     % Invert the MacRomanEncoding.
     /.romanmacdict MacRomanEncodingForTrueType .invert_encoding def
    +/.glyphmacdict MacGlyphEncoding .invert_encoding def
     % Define remapping for misnamed glyphs in TrueType 'post' tables.
     % There are probably a lot more than this!
    @@ -621,7 +622,18 @@
           postglyphs postpos //get_from_stringarray exec
           postglyphs postpos 1 add 2 index //getinterval_from_stringarray exec cvn
           exch postpos add 1 add /postpos exch def
    -      2 index 3 1 roll put
    +      2 index 3 1 roll
    +      % Some TrueType fonts converted by "Windows Type 1 Installer" has
    +      % problematic post table including MacGlyphEncoding entries which
    +      % should be omitted. Such extra entries in the beginning of glyphName
    +      % array make /Encoding broken. If we find predefined glyph name in
    +      % the post table, empty encoding is returned.
    +      .glyphmacdict 1 index known {
    +        TTFDEBUG { (ignore post table that redefines MacExpert glyphname /) print dup == flush } if
    +        pop pop pop pop [ ] /numglyphs 0 def exit
    +      } {
    +        put
    +      } ifelse
         } for
         /postnames exch def
         numglyphs array 0 1 numglyphs 1 sub {

The patch looks acceptable, but I'd like to wait for the regression test results.

Oops, sorry, there is a defect in the patch. It checks for MacRomanEncoding names and prints a message about a MacExpert glyph name. Also, 'glyphname' isn't a correct English word.

My previous patch caused a regression for 159.pdf: the glyph "(R)" is replaced by the missing-glyph mark. Now I'm going to revise the patch.

Created attachment 3527 [details]
patch to ignore post table including a glyph name which is defined in ISOLatin1Encoding
The regression in 159.pdf is caused by the following scenario.
1) The raw arial.ttf has a post table in format 2.0, and the table literally
   includes a few glyph names which are defined by MacGlyphEncoding: /mu1, /CR.
   I guess the reason why the predefined MacGlyphEncoding indices were not used
   may be that this font is for Microsoft Windows. I have not yet checked other
   Microsoft Windows fonts.
2) If these glyphs are included in a PDF and the glyph indices of the embedded
   arial.ttf are not restructured, these predefined glyph names are copied
   literally into the format 2.0 post table.
3) My previous patch detects this and treats the post table as broken,
   although the post table should be used as it is.
The intent of the previous patch was to detect a problematic post table
that includes the whole of MacGlyphEncoding, not to detect a few predefined
glyph names in a post table. Now I use ISOLatin1Encoding, which is (expected
to be) the intersection of WinAnsiEncoding and MacGlyphEncoding. With this
looser check, the regression in 159.pdf is solved.
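The difference between the strict and the loose check can be sketched in Python: a post table is flagged if any of its Pascal-string names also appears in a reference set. The two sets below are tiny hypothetical excerpts, not the real MacGlyphEncoding or ISOLatin1Encoding; following the scenario above, /CR and /mu1 are assumed to be MacGlyphEncoding names that fall outside the Latin-1 set. `first_conflict` is an illustrative helper, not a Ghostscript procedure.

```python
# Tiny illustrative excerpts; NOT the full encodings Ghostscript uses.
MAC_GLYPH_EXCERPT = {".notdef", "space", "exclam", "quotedbl", "CR", "mu1"}
ISO_LATIN1_EXCERPT = {"space", "exclam", "quotedbl", "quoteright"}


def first_conflict(pascal_names, reference):
    """Return the first Pascal-string name that is also in `reference`."""
    for name in pascal_names:
        if name is not None and name in reference:
            return name
    return None


arial_like = ["CR", "mu1"]                        # a few literal Mac names
hvbl_like = [".notdef", "space", "nbspace", "exclam"]

print(first_conflict(arial_like, MAC_GLYPH_EXCERPT))   # strict check: CR
print(first_conflict(arial_like, ISO_LATIN1_EXCERPT))  # loose check: None
print(first_conflict(hvbl_like, ISO_LATIN1_EXCERPT))   # loose check: space
```

With the strict set the Arial-like table is a false positive; with the Latin-1 set only the HVBL-like table is flagged.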
Now under regression test.

The last patch is good. Please commit it if it passes the regression test.

The revised patch using ISOLatin1Encoding shows no regression. I committed the patch as SVN rev 8351, and I think this bug is fixed.

Marcos reported that this bug is not fixed completely. When a PDF makes a font object from external TTF files converted by "Windows Type 1 Installer" with WinAnsiEncoding, Ghostscript still warns that some glyphs are missing during rendering, like this:

    Substituting .notdef for quotesingle in the font GillSans
    Substituting .notdef for quotedblleft in the font GillSans-Bold
    Substituting .notdef for quotedblright in the font GillSans-Bold
    Substituting .notdef for quotedblleft in the font GillSans
    Substituting .notdef for quotedblright in the font GillSans

In fact, the rendered text lacks the quotesingle glyph, which is included in the GillSans font. It seems that when the problematic format 2.0 post table is ignored, current Ghostscript creates the Encoding array from the combination of the Microsoft+UCS2 cmap and ISOLatin1Encoding. The glyph name /quotesingle is not registered in ISOLatin1Encoding, so the quotesingle glyph requested by WinAnsiEncoding cannot be resolved.

After analysing the post format 2.0 table generated by "Windows Type 1 Installer", I found two different approaches to fix this issue. Which is better?

Approach A
==========

The procedure .charkeys uses ISOLatin1Encoding to create the Encoding from an MS+UCS2 cmap when there is no post table, as follows:

    /.charkeys {
      TTFDEBUG {
        (glyphencoding: length=) print glyphencoding dup length = === flush
      } if
      % Hack: if there is no usable post table but the cmap uses
      % the Microsoft Unicode encoding, use ISOLatin1Encoding.
      % if 'post' presents, .charkeys computes (with dropping minor details) :
      % CharStrings = glyphencoding^-1
      % Encoding = cmap*glyphencoding
      % because 'post' maps glyph indices to glyph names.
      % Otherwise .charkeys must compute (with dropping same details) :
      % CharStrings = glyphencoding^-1 * cmap
      % Encoding = glyphencoding
      % because glyphencoding is stubbed with an encoding,
      % which maps char codes to glyph names.
      glyphencoding length 0 eq {
        /have_post false def
        cmapsub 0 4 getinterval <00030001> eq {
          PDFDEBUG { (No post but have cmap 3.1, so use ISOLatin1Encoding) = } if
          /glyphencoding ISOLatin1Encoding dup length array copy def
        } {
          PDFDEBUG { (No encoding info, use .GS_extended_SymbolEncoding) = } if
          /glyphencoding /.GS_extended_SymbolEncoding findencoding dup length array copy def
        } ifelse

If we use WinAnsiEncoding instead of ISOLatin1Encoding, the problem is fixed. I think using WinAnsiEncoding for an MS+UCS2 cmap is not harmful.

Approach B
==========

Create a working Encoding from the problematic post 2.0 table. It seems that if we ignore the glyphNameIndex[] array in the post table (which connects the glyph index to the glyph name index) and use the raw glyph index to refer to the glyph name array directly, the resulting Encoding works well. However, we have to restrict such forced creation to fonts made by "Windows Type 1 Installer".

Created attachment 3649 [details]
Approach B patch
This patch detects the problematic post table by its intersection
with ISOLatin1Encoding. A more restrictive and sharper detection would be
desirable (e.g. exact matching against a known glyphNameArray[]?), but this
patch is a demonstration that a forcibly created Encoding can work.
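Roughly, what Approach B does once a table is flagged can be sketched in Python as follows: ignore glyphNameIndex[], take the name for glyph index g straight from names[g], and replace empty slots with /.notdef, as the `.broken_post` branch in the later patches does. The scan stops at the first conflicting name, which is what the review below asks for. `build_glyph_names` and the example data are hypothetical, and the exact array layout produced by the PostScript is not modeled.

```python
def build_glyph_names(glyph_name_index, pascal_names, reference, resolve):
    """Approach B sketch: name glyphs, falling back to names[] by glyph index.

    resolve   -- callable mapping a glyphNameIndex value to a name (spec path)
    reference -- names whose presence in names[] marks the table as broken
    """
    broken = False
    for name in pascal_names:
        if name is not None and name in reference:
            broken = True
            break  # the first conflict decides; report it only once
    if not broken:
        return [resolve(idx) for idx in glyph_name_index]
    # Broken table: ignore glyphNameIndex[] and use names[] by glyph index,
    # replacing empty (None) slots with .notdef.
    return [n if n is not None else ".notdef" for n in pascal_names]


latin1_excerpt = {"space", "exclam"}          # illustrative, not the real set
names = [".notdef", "space", None, "exclam"]
print(build_glyph_names([65535, 0, 32, 65535], names, latin1_excerpt,
                        lambda idx: "idx%d" % idx))
```

With the sample data the table is flagged, so the result is `['.notdef', 'space', '.notdef', 'exclam']`; a table without conflicts goes through `resolve` unchanged.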
Please eliminate the redundant messages printed by patch 3649.

Created attachment 3741 [details]
Revised patch in approach B

Are this patch and log message OK?

--
Fix (TT): Ignore broken post 2.0 table generated by "Windows Type 1 Installer".

DETAILS:

This is a fix for bug 689495, that is quite specific fix to a TrueType font generated by "Windows Type 1 Installer"

"Windows Type 1 Installer" makes a TrueType font including broken post table in format 2.0. Previous fix (SVN revision 8351) just ignores such broken post table, and ISOLatin1Encoding is used for fallback. When such TrueType font is combined with WinAnsiEncoding, some glyph names (exists only in WinAnsiEncoding) cannot be resolved.

The post table format 2.0 uses 2 maps to assign a glyph name to TrueType glyph index: the first map is from TrueType glyph index to glyph name index (glyphNameIndex[] array), the second map is from glyph name index to glyph name string (names[] Pascal string array).

The broken post table generated by "Windows Type 1 Installer" seems to use name[] array by TrueType glyph index directly, and the glyphNameIndex[] array has unreliable values.

This patch sets /.broken_post when the post table is broken (the detection of broken post table is same with SVN revision 8351), then use names[] array by TrueType glyph index when /.broken_post is set.

EXPECTED DIFFERENCES:

None.

    Index: lib/gs_ttf.ps
    ===================================================================
    --- lib/gs_ttf.ps (revision 8504)
    +++ lib/gs_ttf.ps (working copy)
    @@ -623,6 +623,10 @@
           postglyphs postpos 1 add 2 index //getinterval_from_stringarray exec cvn
           exch postpos add 1 add /postpos exch def
           2 index 3 1 roll
    +      put
    +    } for
    +    /postnames exch def
    +
           % Some TrueType fonts converted by "Windows Type 1 Installer" have
           % problematic post table including MacGlyphEncoding entries which
           % should be omitted. Such extra entries in the beginning of glyphName
    @@ -631,14 +635,17 @@
           % returned. Some TrueType fonts for Microsoft Windows redefines
           % MacGlyphEncoding glyph name out of predefined range). To permit
           % such fonts, ISOLatin1Encoding is used to find broken post. Bug 689495.
    -      .latin1isodict 1 index known {
    -        TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name /) print dup == flush } if
    -        pop pop pop pop [ ] /numglyphs 0 def exit
    -      } {
    -        put
    -      } ifelse
    -    } for
    -    /postnames exch def
    +    /.broken_post false def
    +    .latin1isodict postnames {
    +      dup null ne {
    +        2 copy known {
    +          /.broken_post true def
    +        } if
    +      } if
    +      pop
    +    } forall
    +    pop
    +
         numglyphs array 0 1 numglyphs 1 sub {
           dup 2 mul 34 add postglyphs exch 2 //getinterval_from_stringarray exec
           dup 0 get 8 bitshift exch 1 get add dup 258 lt {
    @@ -666,6 +673,18 @@
           } ifelse
           2 index 3 1 roll put
         } for
    +
    +    .broken_post {
    +      pop
    +      0 1 postnames length 1 sub {
    +        postnames 1 index get null eq {
    +          postnames 1 index /.notdef put
    +        } if
    +        pop
    +      } for
    +
    +      [ postnames aload length 1 roll ]
    +    } if
       } ifelse
     } bind

Please include printing a warning about a broken post table. Please update the patch to the current revision.

Created attachment 3742 [details]
Re-Revised patch in approach B

Is this OK?

--
Fix (TT): Ignore broken post 2.0 table generated by "Windows Type 1 Installer".

DETAILS:

This is a fix for bug 689495, that is quite specific fix to a TrueType font generated by "Windows Type 1 Installer"

"Windows Type 1 Installer" makes a TrueType font including broken post table in format 2.0. Previous fix (SVN revision 8351) just ignores such broken post table, and ISOLatin1Encoding is used for fallback. When such TrueType font is combined with WinAnsiEncoding, some glyph names (exists only in WinAnsiEncoding) cannot be resolved.

The post table format 2.0 uses 2 maps to assign a glyph name to TrueType glyph index: the first map is from TrueType glyph index to glyph name index (glyphNameIndex[] array), the second map is from glyph name index to glyph name string (names[] Pascal string array).
The broken post table generated by "Windows Type 1 Installer" seems to use name[] array by TrueType glyph index directly, and the glyphNameIndex[] array has unreliable values.

This patch sets /.broken_post when the post table is broken (the detection of broken post table is same with SVN revision 8351), then use names[] array by TrueType glyph index when /.broken_post is set.

EXPECTED DIFFERENCES:

None.

    Index: lib/gs_ttf.ps
    ===================================================================
    --- lib/gs_ttf.ps (revision 8507)
    +++ lib/gs_ttf.ps (working copy)
    @@ -623,6 +623,10 @@
           postglyphs postpos 1 add 2 index //getinterval_from_stringarray exec cvn
           exch postpos add 1 add /postpos exch def
           2 index 3 1 roll
    +      put
    +    } for
    +    /postnames exch def
    +
           % Some TrueType fonts converted by "Windows Type 1 Installer" have
           % problematic post table including MacGlyphEncoding entries which
           % should be omitted. Such extra entries in the beginning of glyphName
    @@ -631,14 +635,20 @@
           % returned. Some TrueType fonts for Microsoft Windows redefines
           % MacGlyphEncoding glyph name out of predefined range). To permit
           % such fonts, ISOLatin1Encoding is used to find broken post. Bug 689495.
    -      .latin1isodict 1 index known {
    -        TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name /) print dup == flush } if
    -        pop pop pop pop [ ] /numglyphs 0 def exit
    -      } {
    -        put
    -      } ifelse
    -    } for
    -    /postnames exch def
    +    /.broken_post false def
    +    .latin1isodict postnames {
    +      dup null ne
    +      % dup /.notdef ne and
    +      {
    +        2 copy known {
    +          TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name ) print dup == flush } if
    +          /.broken_post true def
    +        } if
    +      } if
    +      pop
    +    } forall
    +    pop
    +
         numglyphs array 0 1 numglyphs 1 sub {
           dup 2 mul 34 add postglyphs exch 2 //getinterval_from_stringarray exec
           dup 0 get 8 bitshift exch 1 get add dup 258 lt {
    @@ -666,6 +676,18 @@
           } ifelse
           2 index 3 1 roll put
         } for
    +
    +    .broken_post {
    +      pop
    +      0 1 postnames length 1 sub {
    +        postnames 1 index get null eq {
    +          postnames 1 index /.notdef put
    +        } if
    +        pop
    +      } for
    +
    +      [ postnames aload length 1 roll ]
    +    } if
       } ifelse
     } bind

Please print <=1 message per font.

Sorry, I'm not sure what you mean. Please give me more detailed instructions.
1) The message about a broken post table is only displayed when "TTFDEBUG" is set. Do you mean it should always be displayed (even if TTFDEBUG is not set)?
2) Do you mean that the "Substituting /.notdef for ..." messages printed by /.type42build (gs_typ42.ps) for each missing glyph should be suppressed when the post table is broken?

The code:

    TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name /) print dup == flush } if

should be executed once per font, with a single glyph name. There is no need to print it 256 times (once for each glyph). For faster processing, exit the loop when the first redefinition is found.

Created attachment 3743 [details]
Re-Re-Revised patch in approach B

I see. Is this patch OK? The detection of conflicts between glyph names in post 2.0 and ISOLatin1Encoding is aborted when the first conflict is found.
    +    .latin1isodict postnames {
    +      dup null ne
    +      % dup /.notdef ne and
    +      {
    +        2 copy known {
    +          TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name ) print dup == flush } if
    +          /.broken_post true def
    +          pop exit      <- *** this line is added ***
    +        } if
    +      } if
    +      pop
    +    } forall
    +    pop

    Index: lib/gs_ttf.ps
    ===================================================================
    --- lib/gs_ttf.ps (revision 8508)
    +++ lib/gs_ttf.ps (working copy)
    @@ -618,6 +618,10 @@
           postglyphs postpos 1 add 2 index //getinterval_from_stringarray exec cvn
           exch postpos add 1 add /postpos exch def
           2 index 3 1 roll
    +      put
    +    } for
    +    /postnames exch def
    +
           % Some TrueType fonts converted by "Windows Type 1 Installer" have
           % problematic post table including MacGlyphEncoding entries which
           % should be omitted. Such extra entries in the beginning of glyphName
    @@ -626,14 +630,21 @@
           % returned. Some TrueType fonts for Microsoft Windows redefines
           % MacGlyphEncoding glyph name out of predefined range). To permit
           % such fonts, ISOLatin1Encoding is used to find broken post. Bug 689495.
    -      .latin1isodict 1 index known {
    -        TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name /) print dup == flush } if
    -        pop pop pop pop [ ] /numglyphs 0 def exit
    -      } {
    -        put
    -      } ifelse
    -    } for
    -    /postnames exch def
    +    /.broken_post false def
    +    .latin1isodict postnames {
    +      dup null ne
    +      % dup /.notdef ne and
    +      {
    +        2 copy known {
    +          TTFDEBUG { (ignore post table that redefines ISOLatin1Encoding glyph name ) print dup == flush } if
    +          /.broken_post true def
    +          pop exit
    +        } if
    +      } if
    +      pop
    +    } forall
    +    pop
    +
         numglyphs array 0 1 numglyphs 1 sub {
           dup 2 mul 34 add postglyphs exch 2 //getinterval_from_stringarray exec
           dup 0 get 8 bitshift exch 1 get add dup 258 lt {
    @@ -661,6 +672,18 @@
           } ifelse
           2 index 3 1 roll put
         } for
    +
    +    .broken_post {
    +      pop
    +      0 1 postnames length 1 sub {
    +        postnames 1 index get null eq {
    +          postnames 1 index /.notdef put
    +        } if
    +        pop
    +      } for
    +
    +      [ postnames aload length 1 roll ]
    +    } if
       } ifelse
     } bind

Please commit 3743 with the log message in Comment #21.

Thank you. I've committed patch 3743 as SVN revision 8512.

I'd like to revert the patches committed as revisions 8351 and 8512. In rev. 8509 and later, the patches are no longer needed to render the sample file correctly. The patch uses a heuristic algorithm that produces false positives. Rev. 8512 was an attempt to fix one of them. The sample file from the bug 680707 is another case. In general, a TT font can have any post table it likes. Checking the table for the presence or absence of particular glyphs cannot be used to validate the table.

Alex, have you got another idea for distinguishing a bad 'post' from a good one? If not, then the heuristic should be improved rather than reverted. Rather than checking whether particular glyphs are present, we can calculate how many glyphs are biased (have shifted names) and decide by a percentage or so. We also need to check whether the extra glyphs map to existing glyph indices.

I agree with your criticism of the heuristic detection of a
problematic post table. It's not perfect. Do you think more exact
detection is required, even if it makes the PostScript routine
uglier? Or is an operator that checks the post table, implemented
in C, expected? Please let me know.
I think that if revs 8351 and 8512 are reverted, the text "Karlsberger"
in 2628A.pdf will be rendered with Helvetica-Bold (any font from your
HDD) instead of Helvetica-Black (HVBL____.TTF). So, if I must say
there is a bug, it's not in rev 8351/8512 but in rev 8509: there is
an unexpected substitution of Helvetica-Bold for Helvetica-Black
during PDF rendering.
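The percentage idea raised earlier in this exchange could look roughly like this in Python. It is one possible reading, not what Ghostscript committed. It counts how many names[] strings duplicate Macintosh standard names, which a conforming table would reference by index instead, and flags the table when that share crosses a threshold. The 0.5 threshold, the `mac_excerpt` set, and both function names are hypothetical. The length test is a crude stand-in for "extra glyphs must map to existing glyph indices".

```python
def shifted_fraction(pascal_names, mac_standard):
    """Fraction of names[] entries that duplicate Macintosh standard names."""
    named = [n for n in pascal_names if n is not None]
    if not named:
        return 0.0
    return sum(1 for n in named if n in mac_standard) / len(named)


def looks_broken(pascal_names, mac_standard, num_glyphs, threshold=0.5):
    """Hypothetical percentage-based test for a 'strange' post table."""
    # Reading names[] by glyph index only makes sense if every string
    # corresponds to an existing glyph index.
    if len(pascal_names) > num_glyphs:
        return False
    return shifted_fraction(pascal_names, mac_standard) >= threshold


mac_excerpt = {".notdef", "space", "exclam", "quotedbl", "CR", "mu1"}
arial_like = ["CR", "mu1"] + ["uni%04X" % cp for cp in range(0x400, 0x410)]
hvbl_like = [".notdef", "space", "nbspace", "exclam", "quotedbl"]

print(looks_broken(arial_like, mac_excerpt, num_glyphs=1000))  # False
print(looks_broken(hvbl_like, mac_excerpt, num_glyphs=300))    # True
```

A few literal Mac names (2 of 18 here) stay below the threshold, while a table that repeats the standard order (4 of 5) is flagged. Any real threshold would need tuning against regression files.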
>The sample file from the bug 680707 is another case.
Please let me know the correct number; I could not find bug 680707.

The correct bug number is 689707. I'll check why the sample file works now.

To make the test more selective, we can try to do it only for fonts that have "Windows Type 1 Installer V1.0" in the name table. The search operator does it nicely without decoding the table.

I agree with comment #30. The bug is in rev. 8509, which doesn't consider files loaded from FONTPATH as resource files. To reduce the level of false rejection of post tables, we can validate the post table only when the font was produced by the same software as the sample file. Please take a look at the patch for bug 689707. Re-opening bug 689637, which was fixed by rev. 8509, and closing this one.
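The name-table check suggested above (look for "Windows Type 1 Installer V1.0" with a plain byte search, without decoding the name records) can be sketched in Python as an analogue of using the PostScript `search` operator. The sketch assumes the marker may be stored either as single-byte text (typical of Macintosh-platform records) or as UTF-16BE (Windows-platform records), so it tries both. `made_by_type1_installer` is a hypothetical helper, and the table bytes in the example are fabricated.

```python
MARKER = "Windows Type 1 Installer V1.0"


def made_by_type1_installer(name_table: bytes) -> bool:
    """Search the raw 'name' table bytes without decoding its records."""
    # Macintosh-platform records usually hold single-byte strings, while
    # Windows-platform records hold UTF-16BE, so try both byte forms.
    return (MARKER.encode("ascii") in name_table
            or MARKER.encode("utf-16-be") in name_table)


windows_record = b"\x00\x03" + MARKER.encode("utf-16-be")   # fake table bytes
print(made_by_type1_installer(windows_record))              # True
print(made_by_type1_installer(b"Helvetica-Black"))          # False
```

Gating the broken-post heuristic on such a check would confine it to fonts from that converter and leave other fonts' post tables alone.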