Summary: | GhostScript can not handle an embedded TrueType CID-Font | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | artifex |
Component: | PDF Interpreter | Assignee: | leonardo <leonardo> |
Status: | NOTIFIED FIXED | ||
Severity: | normal | CC: | leonardo, zfarberovich |
Priority: | P2 | ||
Version: | 8.60 | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Customer: | 870 | Word Size: | --- |
Attachments: |
Deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont |
Description
artifex
2007-10-31 07:37:40 UTC
Created attachment 3515 [details]
PDF-File showing problem with embedded font
I've duplicated this problem with gs 8.60 and head (r8331). Acrobat Reader 8.1.1 and Preview 4.0 (469) both open the file without error or warning. Furthermore, Acrobat Reader reports that KozGoPro-Bold is Embedded, TrueType (CID) with Identity-H encoding.
The PDF includes KozGoPro-Bold as CFF OpenType, but the resource type in the PDF object is declared as TrueType (CIDFontType2). In most conventional PDFs that include the Kozuka families, the Kozuka fonts are embedded as raw CFF and the PDF object is declared as CIDFontType0. This may be the root of the difference from known, working PDFs with an embedded CID-keyed or TrueType font.
>Can't find (or can't open) font file C:\Programme\gs\gs8.60\Resource/Font/KozGoPro-Bold.
>Can't find (or can't open) font file KozGoPro-Bold.
These error messages were printed by the addpdfcachedfont procedure in gs_cff.ps, which looks up KozGoPro-Bold as an 8-bit font resource (in the /Font category). ReadData in gs_cff.ps seems to load the font object correctly and register KozGoPro-Bold in the /CIDFont category, so looking it up in /Font causes the error. The procedure used for CIDFontType0 is the one expected here. I'm now trying to fix this.
Created attachment 3530 [details]
Deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
The original /readOTTOfont in pdf_font.ps assumes a non-CID-keyed font resource.
This patch changes its behaviour to match that of /readCIDFontType0C.
With the patch for pdf_font.ps the font resource itself loads correctly,
but the PDF interpreter then tries to compose a Type0 font whose
/FDepVector has 2 components: one has /FID and the other has no /FID. Such a
dictionary is rejected as an invalid font dictionary. If I remove that component
before executing /.completefont (this is what /.restructFDepVector does),
rendering of the sample PDF finishes successfully (I have not yet
checked that the result is the same as with the Adobe product).
I guess the component without /FID is junk which should be
removed at some stage. I'm now looking for the part which
should remove the junk.
One text that uses the font KozGoPro-Bold is the text "Unterdecke Bürgerzentrum" in the lower left corner of the drawing. You can verify this by removing the font resource F3 from the Font resource dictionary object 13. Then Adobe Reader will no longer list this font and will display the text differently.
Thank you, my previous patch produces a wrong Encoding. It must be revised.
Created attachment 3532 [details]
Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
>I guess the component without /FID is a junk which should be
>removed in some process. Now I'm looking for the part which
>should remove the junk.
From the comparison between /readCIDFontType0C and /readOTTOfont, I found that /readOTTOfont makes a duplicated font resource and should remove it by itself. With this cleanup, the previous patch becomes much simpler (no hook for gs_fonts.ps is required anymore). But the wrong Encoding issue is not solved yet.
I guessed that Acrobat Reader uses the cmap table in the embedded CFF OpenType, but it does not. Even if I hide the cmap in KozGo-Bold by renaming "cmap" to "pcma", the rendering result in Acrobat Reader is exactly the same.
The text object in the sample PDF is coded in UCS2 (or UTF-16), I think. For example, "Unterdecke Bürgerzentrum" is rendered as follows:
/F3 100 Tf
(\000U\000n\000t\000e\000r\000d\000e\000c\000k\000e) Tj
1 0 0 -1 696 6805 Tm
(\000 ) Tj
1 0 0 -1 718 6805 Tm
(\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m) Tj
\000\374 = U+00FC, /udieresis. Apparently, this code is incompatible with Identity-H for Adobe-Japan1. In Adobe-Japan1, the glyphs compatible with /udieresis are CID=219 (0x00DB) or 621 (0x026D). As a result, I have to guess that Acrobat Reader combines /KozGoPro-Bold with the /UniJIS-UCS2-H (or /UniJIS-UTF16-H) CMap without notifying the end user.
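The observation about the string bytes can be checked with a small sketch. It is Python written for illustration only, not code from Ghostscript or from any of the patches; the helper names `pdf_literal_to_bytes` and `two_byte_codes` are hypothetical, and the parser handles only the octal escapes that occur in the sample string.

```python
import re
from typing import List

# Body of the last Tj operand quoted above, without the outer parentheses.
SAMPLE = r"\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m"

def pdf_literal_to_bytes(body: str) -> bytes:
    """Decode a PDF literal string body (hypothetical helper).

    Only \\ddd octal escapes and plain characters are handled, which is
    enough for the sample; other escapes are out of scope for this sketch.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\":
            m = re.match(r"[0-7]{1,3}", body[i + 1:])
            if m:
                out.append(int(m.group(), 8) & 0xFF)
                i += 1 + len(m.group())
                continue
        out.append(ord(body[i]) & 0xFF)
        i += 1
    return bytes(out)

def two_byte_codes(data: bytes) -> List[int]:
    """Split the bytes into big-endian 2-byte codes, as a 2-byte CMap reads them."""
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data) - 1, 2)]

raw = pdf_literal_to_bytes(SAMPLE)
codes = two_byte_codes(raw)
# Under Identity-H each code would be used directly as a CID.
print([hex(c) for c in codes])
# Read as UTF-16BE, the same bytes spell the visible text.
print(raw.decode("utf-16-be"))
```

Under Identity-H the second code, 0x00FC, would select CID 252, whereas the report gives CID 219 or 621 for /udieresis in Adobe-Japan1, which is the mismatch described above.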
The sample PDF does not include any CMap and does not refer to any external CMap, and Acrobat Reader's document properties show the encoding as /Identity-H, but I guess that is not true. So the solution may be to heuristically attach a Unicode CMap to the /KozGoPro-Bold CIDFont object, even though its encoding is declared as /Identity-H. But in which case? When /Subtype is set to /CIDFontType2 for CFF OpenType? Or when the name /KozGoPro-Bold is used instead of /KozGoPro-Bold-Identity-H? Further investigation is required.
Created attachment 3533 [details]
Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
With this patch, /readOTTOfont writes the extra entry
"/Subtype" (=/CIDFontType2) into the /CIDFont dict when an
embedded CFF OpenType resource is declared as /CIDFontType2.
If /buildType0font finds /Subtype (=/CIDFontType2) in a
/FontType 9 dict and the requested CMap is
/Identity-H, /buildType0font replaces the CMap with UniXXX-UCS2-H.
With this patch, the encoding issue is fixed. But the kerning issue
is not fixed yet. Possibly /addCIDmetrics should tune the metrics
by UCS2 (or UTF16) codepoint instead of raw CID in this case.
Created attachment 3536 [details]
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Fix: Load CFF OpenType which is declared as CIDFontType2.
DETAILS:
The current PDF parser expects an embedded CFF OpenType font to be a
non-CID-keyed font (see /readOTTOfont). In fact, when Adobe
Distiller embeds CFF OpenType for CJK scripts, it uses the CFF
font format (see /readCIDFontType0C). It may also be possible to
embed such fonts in the CFF OpenType font format. To handle such objects,
this patch extends /readOTTOfont to treat the font as a CID-keyed font
object when its /Subtype is declared as a CID-keyed font
(/CIDFontType0, /CIDFontType0C, /CIDFontType2).
Some PDF generators, e.g. PDFTron, have a bug that declares a CFF
OpenType object as CIDFontType2 (= TrueType). The behaviour
for such a buggy object is not described in the PDF Reference, but
Adobe Reader seems to use it as a CIDFontType0 whose glyph
index (CID) is not an Adobe CID but UCS2. The UCS2 mapping table
is not extracted from the cmap table in the CFF OpenType object (the
behaviour is the same even if the cmap table is hidden);
it may be imported from Adobe Reader's own resources.
xpdf and poppler-based applications cannot handle such
font objects.
To emulate Adobe Reader's behaviour, this patch takes the following
steps.
1. To find such a buggy object, this patch inserts an extra /Subtype
entry into the CID-keyed font dictionary in /readOTTOfont.
2. When such a buggy CID-keyed font is combined with some CMap
and prepared for text rendering, a glyph index mismatch
occurs. A hook is inserted into /buildType0.
If the CID-keyed font object is /CIDFontType0 (/FontType == 9)
but /Subtype is known and is /CIDFontType2, and the requested CMap
is Identity-H or Identity-V, /buildType0 replaces the CMap with the
appropriate UCS2 CMap (for Adobe-Japan1, the CMap is replaced by
UniJIS-UCS2-H, etc). In addition, /CDevProc is removed because
it can cause wrong metrics.
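The condition tested by the hook in step 2 can be sketched as follows. This is a Python illustration only, not the PostScript in the patch: `select_cmap`, `UCS2_CMAPS` and the dictionary layout are hypothetical, and only UniJIS-UCS2-H for Adobe-Japan1 is named in the description (the vertical entry is assumed by analogy).

```python
# Hypothetical lookup table keyed by CIDSystemInfo /Ordering.
# Only the Japan1 "H" entry comes from the description; "V" is an assumption.
UCS2_CMAPS = {
    "Japan1": {"H": "UniJIS-UCS2-H", "V": "UniJIS-UCS2-V"},
}

def select_cmap(cidfont: dict, requested_cmap: str):
    """Return (cmap_name, drop_cdevproc) for a CIDFont and a requested CMap.

    The CMap is replaced only when all three conditions from step 2 hold:
    the font was built as CIDFontType0 (/FontType 9), it carries the extra
    /Subtype /CIDFontType2 marker, and an Identity CMap was requested.
    """
    is_marked = (
        cidfont.get("FontType") == 9
        and cidfont.get("Subtype") == "CIDFontType2"
    )
    if is_marked and requested_cmap in ("Identity-H", "Identity-V"):
        ordering = cidfont.get("CIDSystemInfo", {}).get("Ordering")
        table = UCS2_CMAPS.get(ordering)
        if table is not None:
            direction = requested_cmap[-1]  # "H" or "V"
            return table[direction], True   # also drop /CDevProc
    return requested_cmap, False

font = {
    "FontType": 9,
    "Subtype": "CIDFontType2",
    "CIDSystemInfo": {"Registry": "Adobe", "Ordering": "Japan1"},
}
print(select_cmap(font, "Identity-H"))
print(select_cmap(font, "90ms-RKSJ-H"))
```

A non-Identity CMap is passed through unchanged here, which matches the limitation listed in the TODO items.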
TODO:
* Further investigation on Adobe Reader's implementation,
especially the handling of surrogate pairs.
* If such a buggy CID-keyed font is combined with a
non-Identity CMap (e.g. RKSJ-H), the rearranged-font
feature for layering multiple CMaps is needed.
* This emulation cannot handle the manually tuned metrics in
/W and /W2. To support them, a converter between Adobe CID and
UCS2 character code that can be used from PostScript space is
needed, because the glyph index in /W and /W2 would be a UCS2 character
code and should be replaced by the Adobe CID. Scaling of the metrics
values may also be required.
EXPECTED DIFFERENCES:
None.
Now under regression test.
Created attachment 3537 [details]
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
No regression was found in the regression test.
In the previous attachment, I reordered the patch to correspond
to the 2-step (throw & catch) strategy, which confused the patch
command. I have fixed that here (the content is exactly the same).
Please explain why we can't use CIDToGIDMap instead of doing tricks with replacing the CMap? I converted the PDF into EPS with Acrobat 8 and got interesting information about the CMap. Did you try the same? Adobe appears not to replace the CMap.
In my patch, I made KozGoPro-Bold with an Adobe-Japan1 CID interface from the embedded CFF OpenType, and replaced the Identity CMap with a Unicode CMap so that the passed Unicode string is recognized correctly. In your proposal to use CIDToGIDMap, KozGoPro-Bold resource should have Unicode CID interface (obtained by resolving CIDToGIDMap) instead of Adobe-Japan1 CID and it should be combined with Identity CMap. Right?
Does current ghostscript have any functionality to remap the relationship between a CID and the offset to CharStrings in a CIDFontType 0 dictionary? I think CIDFontType 0 has no mechanism comparable to "CIDMap" in CIDFontType 2, so we would have to reorganize the long binary data in GlyphData (or split GlyphData and synthesize a huge GlyphDictionary). Right? I'm afraid it's very complicated work.
Yet I've not tested how Acrobat 8 reads the sample PDF, but I'm sure Adobe won't replace CMap as I did. I think Adobe makes CIDFontType 2 dictionary even if the embedded stream is CFF OpenType (so CIDMap works), and postpones the distinction between sfnt OpenType and CFF OpenType to the phase of picking glyph data from the loca/glyf table or the CFF table. To do such in ghostscript, drastic rewriting of font resource management may be required (if we make a CIDFontType 0 dictionary from CFF OpenType immediately, such postponement is impossible). What do you think?
Anyway, hardwiring an external UniXXX-H CMap is not the best implementation; generating appropriate CMap from CIDToGIDMap would be better. I will try to implement such.
1. "generating appropriate CMap from CIDToGIDMap would be better. I will try to implement such." Please don't spend time coding before creating the right design.
2. "In your proposal to use CIDToGIDMap, KozGoPro-Bold resource should have Unicode CID interface (obtained by resolving CIDToGIDMap) instead of Adobe-Japan1 CID and it should be combined with Identity CMap. Right?" Not right. The resource is a font which defines a mapping of char codes to glyphs within its own font technology. In the test case the technology is OpenType. The question is how to fit that technology into the PostScript world (because our PDF interpreter is coded in PostScript). There is nothing special here about Unicode. In the PostScript world the mapping goes with CMap*CIDMap ('*' means a superposition of maps). So the natural question is whether CIDMap helps to get the right glyph mapping, or whether its analogue CIDToGIDMap (in the PDF world) does.
3. "Yet I've not tested how Acrobat 8 reads the sample PDF," Then please do.
4. "but I'm sure Adobe won't replace CMap as I did" : Right you are.
5. "I think Adobe makes CIDFontType 2 dictionary even if the embedded stream is CFF OpenType" : A wrong guess.
6. "I'm afraid it's very complicated work." Well, the entire Ghostscript is complicated work. Please let us know if you want simpler work.
7. "To do such in ghostscript, drastic rewriting of font resource management may be required" : PostScript resources are a kind of file system, which is not related to this case. What we need here is proper handling of OpenType. It was not done before now, so the work may be big and complex. We assumed that you can do it.
8. "generating appropriate CMap from CIDToGIDMap would be better. I will try to implement such." : Right bet. Looking forward to your success.
Created attachment 3605 [details]
Re-Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Here is the re-re-re-revised patch, which tries to remap using
the content of the CIDToGIDMap stream.
I think the procedures /.print_lsb16hex and /.lsb16str
already exist under different names. Please let
me know the right way to use them in pdf_font.ps.
Now under regression test.
Fix: Load CFF OpenType which is declared as CIDFontType2.
DETAILS:
The current PDF parser expects an embedded CFF OpenType font to be a
non-CID-keyed font (see /readOTTOfont). In fact, when Adobe
Distiller embeds CFF OpenType for CJK scripts, it uses the CFF
font format (see /readCIDFontType0C). It may also be possible to
embed such fonts in the CFF OpenType font format. To handle such objects,
this patch extends /readOTTOfont to treat the font as a CID-keyed font
object when its /Subtype is declared as a CID-keyed font
(/CIDFontType0, /CIDFontType0C, /CIDFontType2).
Some PDF generators, e.g. PDFTron, have a bug that declares a CFF
OpenType object as CIDFontType2 (= TrueType) and inserts a
CIDToGIDMap stream (it seems that the CIDToGIDMap stream is
generated by expanding the cmap table in the CFF OpenType).
The behaviour for such a buggy font resource is not described
in the PDF Reference, but Adobe Reader seems to use it as a
CIDFontType0 whose CID is not the bare Adobe CID but is remapped
by the CIDToGIDMap stream.
To emulate Adobe Reader's behaviour, this patch takes the following
steps.
1. To find such a font object, this patch inserts an extra /Subtype
entry into the CID-keyed font dictionary in /readOTTOfont.
When such a font object is loaded, the /readOTTOfont procedure
defines 2 CIDFont-specific CMaps by expanding the CIDToGIDMap
stream, e.g. KozGoPro-Bold-OTTO-H and KozGoPro-Bold-OTTO-V
for the CFF CIDFontType2 KozGoPro-Bold that includes a CIDToGIDMap.
2. When such a CID-keyed font is combined with some CMap and
prepared for text rendering, a glyph index mismatch
(between the base Adobe CID and the CID remapped by
CIDToGIDMap) occurs. A hook is inserted into /buildType0.
If the CID-keyed font object is /CIDFontType0 (/FontType == 9)
but /Subtype is known and is /CIDFontType2, and the requested CMap
is Identity-H or Identity-V, /buildType0 replaces the CMap with the
CIDFont-specific CMap which was defined when the CIDFont
resource was loaded. In addition, /CDevProc is removed because it can
cause wrong metrics.
If the CIDFont-specific CMap is not defined, the UCS2 CMap matching
the CIDSystemInfo is used as a fallback (UniJIS-UCS2-H for
Adobe-Japan1, etc).
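For reference, the layout of a CIDToGIDMap stream, and one way it could be condensed into ranges before a CMap is written, can be sketched as follows. This is a Python illustration and not the PostScript in attachment 3605: `parse_cid_to_gid_map` and `to_ranges` are hypothetical names, and skipping entries that map to glyph 0 is an assumption of this sketch. The stream layout (a 2-byte big-endian glyph index at bytes 2c and 2c+1 for CID c) follows the PDF Reference; how the patch turns the result into KozGoPro-Bold-OTTO-H is not shown in this report.

```python
def parse_cid_to_gid_map(stream: bytes) -> dict:
    """Read a CIDToGIDMap stream into {cid: gid}.

    The glyph index for CID c is stored big-endian in bytes 2c and 2c+1.
    Entries mapping to glyph 0 are skipped (an assumption of this sketch).
    """
    if len(stream) % 2:
        raise ValueError("CIDToGIDMap length must be even")
    mapping = {}
    for cid in range(len(stream) // 2):
        gid = (stream[2 * cid] << 8) | stream[2 * cid + 1]
        if gid != 0:
            mapping[cid] = gid
    return mapping

def to_ranges(mapping: dict) -> list:
    """Merge consecutive codes with consecutive targets into
    (first_code, last_code, first_target) tuples, the shape used by
    range entries in a CMap."""
    ranges = []
    for code in sorted(mapping):
        target = mapping[code]
        if ranges:
            first, last, first_target = ranges[-1]
            if code == last + 1 and target == first_target + (code - first):
                ranges[-1] = (first, code, first_target)
                continue
        ranges.append((code, code, target))
    return ranges

# Tiny made-up stream: codes 0..4 map to glyphs 0, 5, 6, 7, 20.
demo = bytes.fromhex("0000 0005 0006 0007 0014")
print(to_ranges(parse_cid_to_gid_map(demo)))
```

The demo stream is invented for illustration and is not taken from the attached PDF.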
TODO:
* If such a buggy CID-keyed font is combined with a
non-Identity CMap (e.g. RKSJ-H), the rearranged-font
feature for layering multiple CMaps is needed.
* This emulation cannot handle the manually tuned metrics in
/W and /W2. To support them, a converter between Adobe CID and
UCS2 character code that can be used from PostScript space is
needed, because the glyph index in /W and /W2 would be a UCS2 character
code and should be replaced by the Adobe CID. Scaling of the metrics
values may also be required.
Here is my review of the patch 3605.
1. A conversion of CIDToGIDMap into a font-specific CMap is tricky. Please explain why we can't use the identity CMap specified in the document together with the CIDToGIDMap defined in the document.
2. The modularity looks imperfect. Since OpenType has the same top-level structure as TrueType, its processing should go into gs_ttf.ps. It already defines the function putu16, which the author is looking for.
Please give a clear answer for (1) before starting any coding.
This problem has been fixed by the rev. 8646. See the bug 689763 for details. |