When trying to convert the attached PDF file, Ghostscript fails with messages that it can't find a font file. Adobe Reader reports that all fonts are embedded. These are the error messages of Ghostscript 8.60 on Windows:

C:\Programme\gs\gs8.60\bin>.\gswin32c.exe -sDEVICE=tiffg4 -o gs.tif -f A1.pdf
GPL Ghostscript 8.60 (2007-08-01)
Copyright (C) 2007 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Warning: the map file cidfmap was not found.
Processing pages 1 through 1.
Page 1
Can't find (or can't open) font file C:\Programme\gs\gs8.60\Resource/Font/KozGoPro-Bold.
Can't find (or can't open) font file KozGoPro-Bold.
Querying operating system for font files...
Didn't find this font on the system!
Substituting font Helvetica-Bold for KozGoPro-Bold.
Loading NimbusSanL-Bold font from C:\Programme\gs\fonts/n019004l.pfb... 2852768 1442918 11403784 10047291 3 done.
Error: /undefined in --get--
Operand stack:
   --dict:6/15(L)-- F3 66 --dict:6/6(L)-- --dict:6/6(L)-- KozGoPro-Bold --dict:11/12(ro)(G)-- --nostringval-- --dict:8/8(L)-- --dict:19/19(L)-- --dict:19/19(L)-- CIDFontName
Execution stack:
   %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1889 1 3 %oparray_pop 1888 1 3 %oparray_pop 1872 1 3 %oparray_pop --nostringval-- --nostringval-- 2 1 1 --nostringval-- %for_pos_int_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- false 1 %stopped_push --nostringval-- %loop_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- --nostringval-- --nostringval--
Dictionary stack:
   --dict:1146/1684(ro)(G)-- --dict:2/20(G)-- --dict:75/200(L)-- --dict:75/200(L)-- --dict:106/127(ro)(G)-- --dict:274/300(ro)(G)-- --dict:21/25(L)-- --dict:4/6(L)-- --dict:20/20(L)-- --dict:1/1(ro)(G)-- --dict:1/1(ro)(G)-- --dict:11/13(L)--
Current allocation mode is local
GPL Ghostscript 8.60: Unrecoverable error, exit code 1
Created attachment 3515 [details] PDF-File showing problem with embedded font
I've duplicated this problem with gs 8.60 and head (r8331).
Acrobat Reader 8.1.1 and Preview 4.0 (469) both open the file without error or warning. Furthermore, Acrobat Reader reports that KozGoPro-Bold is Embedded, TrueType (CID) with Identity-H encoding.
The PDF includes KozGoPro-Bold as CFF OpenType, but the resource type in the PDF object is declared as TrueType (CIDFontType2). In most conventional PDFs that include the Kozuka families, the Kozuka fonts are embedded as raw CFF and the PDF object is declared as CIDFontType0. This may be the root of the difference from known, working PDFs with an embedded CID-keyed or TrueType font.

>Can't find (or can't open) font file C:\Programme\gs\gs8.60\Resource/Font/KozGoPro-Bold.
>Can't find (or can't open) font file KozGoPro-Bold.

These error messages were printed by the addpdfcachedfont procedure in gs_cff.ps, which looks up KozGoPro-Bold as an 8-bit font resource (in the /Font category). ReadData in gs_cff.ps seems to load the font object correctly and registers KozGoPro-Bold in the /CIDFont category, so looking it up in /Font causes the error. The code path used for CIDFontType0 is what is expected here. I'm now trying to fix this.
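To illustrate the mismatch described above, here is a minimal Python sketch (not Ghostscript code) that classifies an embedded font stream by its first four bytes and compares that with the declared /Subtype. The function names `sfnt_flavor` and `subtype_mismatch` are made up for this example; the only facts relied on are that a CFF OpenType file starts with the tag "OTTO" and a TrueType-flavoured sfnt starts with 0x00010000 (or "true").

```python
def sfnt_flavor(data: bytes) -> str:
    """Classify a font stream by its 4-byte sfnt version tag."""
    if len(data) < 4:
        return "unknown"
    tag = data[:4]
    if tag == b"OTTO":
        # OpenType with CFF outlines
        return "CFF OpenType"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        # sfnt with TrueType (glyf/loca) outlines
        return "TrueType"
    return "unknown"


def subtype_mismatch(declared_subtype: str, data: bytes) -> bool:
    """True when a TrueType CIDFont declaration wraps CFF outlines."""
    return (declared_subtype == "CIDFontType2"
            and sfnt_flavor(data) == "CFF OpenType")


if __name__ == "__main__":
    # Hypothetical stream prefix, standing in for the embedded font data.
    print(subtype_mismatch("CIDFontType2", b"OTTO\x00\x0b"))
```

A check like this only detects the situation; it says nothing about how the glyph codes should then be interpreted.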
Created attachment 3530 [details] Deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

The original /readOTTOfont in pdf_font.ps assumes a non-CID-keyed font resource. This patch changes its behaviour to match /readCIDFontType0C.

With the patch for pdf_font.ps, loading the font resource itself works, but the PDF interpreter then tries to compose a Type0 font from a 2-component /FDepVector: one component has /FID and the other has no /FID. Such a dictionary is rejected as an invalid font dictionary. If I remove that component before executing /.completefont (this is what /.restructFDepVector does), rendering of the sample PDF finishes successfully (I have not yet checked whether the result is the same as with Adobe products). I guess the component without /FID is junk that should be removed at some stage. I'm now looking for the part that should remove it.
One text that uses the font KozGoPro-Bold is the text "Unterdecke Bürgerzentrum" in the lower left corner of the drawing. You can verify this by removing the font resource F3 from the Font resource dictionary object 13. Then Adobe Reader will no longer list this font and will display the text differently.
Thank you. My previous patch produces a wrong Encoding; it must be revised.
Created attachment 3532 [details] Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

>I guess the component without /FID is a junk which should be
>removed in some process. Now I'm looking for the part which
>should remove the junk.

Comparing /readCIDFontType0C with /readOTTOfont, I found that /readOTTOfont makes a duplicate font resource and should remove it itself. With this cleanup the previous patch becomes much simpler (no hook in gs_fonts.ps is required anymore).

But the wrong Encoding issue is not solved yet. I guessed that Acrobat Reader uses the cmap table in the embedded CFF OpenType, but it does not. Even if I hide the cmap in KozGoPro-Bold by renaming "cmap" to "pcma", the rendering result in Acrobat Reader is exactly the same.
The text objects in the sample PDF are coded in UCS2 (or UTF-16), I think. For example, "Unterdecke Bürgerzentrum" is rendered as follows:

/F3 100 Tf
(\000U\000n\000t\000e\000r\000d\000e\000c\000k\000e) Tj
1 0 0 -1 696 6805 Tm
(\000 ) Tj
1 0 0 -1 718 6805 Tm
(\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m) Tj

\000\374 = U+00FC, /udieresis. Apparently this code is incompatible with Identity-H for Adobe-Japan1: in Adobe-Japan1, the glyphs compatible with /udieresis are CID=219 (0x00DB) or 621 (0x026D).

As a result, I have to guess that Acrobat Reader combines /KozGoPro-Bold with the /UniJIS-UCS2-H (or /UniJIS-UTF16-H) CMap without notifying the end user. The sample PDF does not include any CMap and does not refer to any external CMap, and Acrobat Reader's document properties show the encoding as /Identity-H, but I guess that is not what is actually used.

So the solution may be to heuristically attach a Unicode CMap to the /KozGoPro-Bold CIDFont object, even though its encoding is declared as /Identity-H. But in which case? When /Subtype is set to /CIDFontType2 for CFF OpenType? Or when the name /KozGoPro-Bold is used instead of /KozGoPro-Bold-Identity-H? Further investigation is required.
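The reading of the string above can be checked with a small Python sketch. It is only an illustration: `pdf_literal_to_bytes` is a hypothetical helper that handles nothing but \ddd octal escapes and plain ASCII characters (enough for the strings quoted here, not a general PDF string parser), and the 2-byte grouping assumes the codes are read big-endian, as they are under Identity-H.

```python
def pdf_literal_to_bytes(body: str) -> bytes:
    """Decode the body of a PDF literal string (without the parentheses).

    Assumption: only octal escapes (\\ddd) and plain ASCII occur.
    """
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in "01234567":
            j = i + 1
            # An octal escape has one to three digits.
            while j < len(body) and j < i + 4 and body[j] in "01234567":
                j += 1
            out.append(int(body[i + 1:j], 8) & 0xFF)
            i = j
        else:
            out.append(ord(body[i]))
            i += 1
    return bytes(out)


def two_byte_codes(raw: bytes) -> list:
    """Split a byte string into big-endian 2-byte character codes."""
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw) - 1, 2)]


if __name__ == "__main__":
    raw = pdf_literal_to_bytes(
        r"\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m")
    print(raw.decode("utf-16-be"))   # the string read as UTF-16BE text
    print(two_byte_codes(raw)[1])    # the second code, read as a plain number
```

Read as UTF-16BE the bytes spell the expected word, while the same second code taken literally as a CID under Identity-H is 252, which is neither of the Adobe-Japan1 CIDs mentioned above.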
Created attachment 3533 [details] Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

With this patch, /readOTTOfont writes the extra information "/Subtype" (= /CIDFontType2) into the /CIDFont dict when an embedded CFF OpenType resource is declared as /CIDFontType2. If /buildType0font finds /Subtype (= /CIDFontType2) in a /FontType 9 dict and the requested CMap is /Identity-H, /buildType0font replaces the CMap with UniXXX-UCS2-H.

This patch fixes the encoding issue, but the kerning issue is not fixed yet. Possibly /addCIDmetrics should tune the metrics by UCS2 (or UTF16) code point instead of raw CID in this case.
Created attachment 3536 [details] Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

Fix: Load CFF OpenType which is declared as CIDFontType2.

DETAILS:

The current PDF parser expects an embedded CFF OpenType font to be a non-CID-keyed font (see /readOTTOfont). In fact, when Adobe Distiller embeds CFF OpenType for CJK scripts, it uses the CFF font format (see /readCIDFontType0C). It may also be possible to embed such a font in CFF OpenType font format. To handle such an object, this patch extends /readOTTOfont to treat it as a CID-keyed font object when its /Subtype is declared as a CID-keyed font (/CIDFontType0, /CIDFontType0C, /CIDFontType2).

Some PDF generators, e.g. PDFTron, have a bug that declares a CFF OpenType object as CIDFontType2 (= TrueType). The behaviour for such a buggy object is not described in the PDF Reference, but Adobe Reader seems to use it as a CIDFontType0 whose glyph index (CID) is not an Adobe CID but UCS2. The UCS2 mapping table is not extracted from the cmap table in the CFF OpenType object (even if the cmap table is hidden, the behaviour is the same); it may be imported from Adobe Reader's own resources. xpdf and poppler-based applications cannot handle such a font object.

To emulate Adobe Reader's behaviour, this patch takes the following steps.

1. To find such a buggy object, this patch inserts an extra /Subtype entry into the CID-keyed font dictionary in /readOTTOfont.
2. When such a buggy CID-keyed font is combined with some CMap and prepared for text rendering, the glyph indices will differ. A hook is inserted into /buildType0. If the CID-keyed font object is /CIDFontType0 (/FontType == 9) but /Subtype is known and is /CIDFontType2, and the requested CMap is Identity-H or Identity-V, /buildType0 replaces the CMap with the appropriate UCS2 CMap (for Adobe-Japan1, the CMap is replaced by UniJIS-UCS2-H, etc). In addition, /CDevProc is removed because it can cause wrong metrics.
TODO:
* Further investigation of Adobe Reader's implementation, especially its handling of surrogate pairs.
* When such a buggy CID-keyed font is combined with a non-Identity CMap (e.g. RKSJ-H), the rearranged-font feature for layering multiple CMaps is needed.
* This emulation cannot handle the manually tuned metrics in /W and /W2. Supporting them needs a converter between Adobe CID and UCS2 character code that is usable from PostScript space, because the glyph indices in /W and /W2 would be UCS2 character codes and should be replaced by Adobe CIDs. Scaling of the metrics values may also be required.

EXPECTED DIFFERENCES: None.
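The decision made by the hook in step 2 of the DETAILS can be sketched in Python as below. This is not the PostScript from the patch, only a restatement of its condition; `choose_cmap` and the `UCS2_CMAPS` table are names invented for this sketch. The source names only UniJIS-UCS2-H for Adobe-Japan1 ("etc"); the other three rows are assumed from the predefined CMap names in the PDF Reference.

```python
# Assumed mapping from CIDSystemInfo (Registry, Ordering) to the base name
# of a UCS2 CMap; only the Japan1 row is stated in the patch description.
UCS2_CMAPS = {
    ("Adobe", "Japan1"): "UniJIS-UCS2",
    ("Adobe", "GB1"): "UniGB-UCS2",
    ("Adobe", "CNS1"): "UniCNS-UCS2",
    ("Adobe", "Korea1"): "UniKS-UCS2",
}


def choose_cmap(font_type, subtype, registry, ordering, requested):
    """Return the CMap name to use for a CID-keyed font.

    font_type 9 is a CFF-based CIDFont (CIDFontType 0); subtype is the
    extra /Subtype entry recorded when the font was loaded, or None.
    """
    if font_type != 9 or subtype != "CIDFontType2":
        return requested                      # not the buggy combination
    if requested not in ("Identity-H", "Identity-V"):
        return requested                      # non-Identity CMaps are left alone
    base = UCS2_CMAPS.get((registry, ordering))
    if base is None:
        return requested                      # unknown character collection
    return base + "-" + requested[-1]         # keep the writing mode (H or V)


if __name__ == "__main__":
    print(choose_cmap(9, "CIDFontType2", "Adobe", "Japan1", "Identity-H"))
```

The non-Identity branch simply returns the requested CMap unchanged, which corresponds to the second TODO item being left open.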
Now under regression test.
Created attachment 3537 [details] Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

No regression was found in the regression test. In the previous attachment I had reordered the patch to correspond to the 2-step (throw & catch) strategy, which confused the patch command. I fixed that here (the content is exactly the same).
Please explain why we can't use CIDToGIDMap instead of doing tricks with replacing the CMap? I converted the PDF into EPS with Acrobat 8 and got interesting information about the CMap. Did you try the same? Adobe appears not to replace the CMap.
In my patch, I built KozGoPro-Bold with an Adobe-Japan1 CID interface from the embedded CFF OpenType and replaced the Identity CMap with a Unicode CMap, so that the passed Unicode string is recognized correctly. In your proposal to use CIDToGIDMap, the KozGoPro-Bold resource should have a Unicode CID interface (obtained by resolving CIDToGIDMap) instead of Adobe-Japan1 CID, and it should be combined with the Identity CMap. Right?

Does current Ghostscript have any functionality to remap the relationship between a CID and the offset into CharStrings in a CIDFontType 0 dictionary? I think CIDFontType 0 has no mechanism comparable to "CIDMap" in CIDFontType 2, so we would have to reorganize the long binary data in GlyphData (or split GlyphData and synthesize a huge GlyphDictionary). Right? I'm afraid that is very complicated work.

I have not yet tested how Acrobat 8 reads the sample PDF, but I'm sure Adobe won't replace the CMap as I did. I think Adobe makes a CIDFontType 2 dictionary even if the embedded stream is CFF OpenType (so CIDMap works), and postpones the distinction between sfnt OpenType and CFF OpenType to the phase that picks glyph data from the loca/glyf tables or the CFF table. To do the same in Ghostscript, a drastic rewrite of font resource management may be required (if we make a CIDFontType 0 dictionary from CFF OpenType immediately, such postponement is impossible). What do you think?

Anyway, hardwiring an external UniXXX-H CMap is not the best implementation; generating an appropriate CMap from CIDToGIDMap would be better. I will try to implement that.
1. "generating appropriate CMap from CIDToGIDMap would be better. I will try to implement such." Please don't spend time coding before creating the right design.
2. "In your proposal to use CIDToGIDMap, KozGoPro-Bold resource should have Unicode CID interface (obtained by resolving CIDToGIDMap) instead of Adobe-Japan1 CID and it should be combined with Identity CMap. Right?" Not right. The resource is a font which defines a mapping of char codes to glyphs within its own font technology. In the test case the technology is OpenType. The question is how to fit that technology into the PostScript world (because our PDF interpreter is coded in PostScript). There is nothing special here about Unicode. In the PostScript world the mapping goes through CMap*CIDMap ('*' means a composition of maps). So the natural question is whether CIDMap helps to get the right glyph mapping, or whether its analogue CIDToGIDMap (in the PDF world) does.
3. "Yet I've not tested how Acrobat 8 reads the sample PDF," Then please do.
4. "but I'm sure Adobe won't replace CMap as I did" : Right you are.
5. "I think Adobe makes CIDFontType 2 dictionary even if the embedded stream is CFF OpenType" : A wrong guess.
6. "I'm afraid it's very complicated work." Well, the entire Ghostscript is complicated work. Please let us know if you want simpler work.
7. "To do such in ghostscript, drastic rewriting of font resource management may be required" : PostScript resources are a kind of file system, which is not related to this case. What we need here is proper handling of OpenType. It has not been done before now, so the work may be big and complex. We assumed that you can do it.
8. "generating appropriate CMap from CIDToGIDMap would be better. I will try to implement such." : A good bet. Looking forward to your success.
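The composition CMap*CIDToGIDMap mentioned in item 2 can be sketched in Python as follows. It is an illustration of the mapping chain only, not of Ghostscript's implementation: the helper names are invented, the CMap is modelled as a plain function from character code to CID, and the only fact relied on is the PDF layout of a CIDToGIDMap stream (the glyph index for CID c is stored big-endian in bytes 2c and 2c+1).

```python
def gid_for_cid(cid_to_gid: bytes, cid: int) -> int:
    """Look up a glyph index in a PDF CIDToGIDMap stream."""
    off = 2 * cid
    if off + 2 > len(cid_to_gid):
        return 0                      # outside the table: treat as .notdef
    return int.from_bytes(cid_to_gid[off:off + 2], "big")


def glyphs_for_string(raw: bytes, cmap, cid_to_gid: bytes) -> list:
    """Apply the CMap, then CIDToGIDMap, to each 2-byte character code."""
    gids = []
    for i in range(0, len(raw) - 1, 2):
        code = int.from_bytes(raw[i:i + 2], "big")
        cid = cmap(code)              # first map: character code -> CID
        gids.append(gid_for_cid(cid_to_gid, cid))   # second map: CID -> GID
    return gids


def identity_cmap(code: int) -> int:
    """Identity-H/V: the CID is the character code itself."""
    return code


if __name__ == "__main__":
    # Hypothetical 256-entry table mapping CID 0xFC to glyph 300.
    table = bytearray(2 * 256)
    table[2 * 0xFC:2 * 0xFC + 2] = (300).to_bytes(2, "big")
    print(glyphs_for_string(b"\x00\xfc", identity_cmap, bytes(table)))
```

With an identity CMap the character codes in the content stream index the CIDToGIDMap directly, which is why no Unicode-specific CMap needs to appear in this chain.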
Created attachment 3605 [details] Re-Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

Here is the re-re-re-revised patch, which tries to remap by the content of the CIDToGIDMap stream. I think the procedures /.print_lsb16hex and /.lsb16str duplicate existing procedures that have different names. Please let me know the right way to use those in pdf_font.ps. Now under regression test.

Fix: Load CFF OpenType which is declared as CIDFontType2.

DETAILS:

The current PDF parser expects an embedded CFF OpenType font to be a non-CID-keyed font (see /readOTTOfont). In fact, when Adobe Distiller embeds CFF OpenType for CJK scripts, it uses the CFF font format (see /readCIDFontType0C). It may also be possible to embed such a font in CFF OpenType font format. To handle such an object, this patch extends /readOTTOfont to treat it as a CID-keyed font object when its /Subtype is declared as a CID-keyed font (/CIDFontType0, /CIDFontType0C, /CIDFontType2).

Some PDF generators, e.g. PDFTron, have a bug that declares a CFF OpenType object as CIDFontType2 (= TrueType) and inserts a CIDToGIDMap stream (it seems that the CIDToGIDMap stream is generated by expanding the cmap table in the CFF OpenType). The behaviour for such a buggy font resource is not described in the PDF Reference, but Adobe Reader seems to use it as a CIDFontType0 whose CID is not the bare Adobe CID but is remapped by the CIDToGIDMap stream.

To emulate Adobe Reader's behaviour, this patch takes the following steps.

1. To find such a font object, this patch inserts an extra /Subtype entry into the CID-keyed font dictionary in /readOTTOfont. When such a font object is loaded, the /readOTTOfont procedure defines 2 CIDFont-specific CMaps by expanding the CIDToGIDMap stream, e.g. KozGoPro-Bold-OTTO-H and KozGoPro-Bold-OTTO-V for the CFF CIDFontType2 KozGoPro-Bold including CIDToGIDMap.
2. When such a CID-keyed font is combined with some CMap and prepared for text rendering, the glyph indices will differ (base Adobe CID versus CID remapped by CIDToGIDMap). A hook is inserted into /buildType0. If the CID-keyed font object is /CIDFontType0 (/FontType == 9) but /Subtype is known and is /CIDFontType2, and the requested CMap is Identity-H or Identity-V, /buildType0 replaces the CMap with the CIDFont-specific CMap that was defined while loading the CIDFont resource. In addition, /CDevProc is removed because it can cause wrong metrics. If no CIDFont-specific CMap is defined, the UCS2 CMap matching the CIDSystemInfo is used as a fallback (UniJIS-UCS2-H for Adobe-Japan1, etc).

TODO:
* When such a buggy CID-keyed font is combined with a non-Identity CMap (e.g. RKSJ-H), the rearranged-font feature for layering multiple CMaps is needed.
* This emulation cannot handle the manually tuned metrics in /W and /W2. Supporting them needs a converter between Adobe CID and UCS2 character code that is usable from PostScript space, because the glyph indices in /W and /W2 would be UCS2 character codes and should be replaced by Adobe CIDs. Scaling of the metrics values may also be required.
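As a rough picture of what "expansion of the CIDToGIDMap stream" into a font-specific CMap could look like, here is a Python sketch that turns the stream into begincidchar/endcidchar blocks. It is not taken from the patch. The function name and the optional `gid_to_cid` dictionary are assumptions of this example: in a CID-keyed CFF the glyph index and the CID are related through the font's charset, and the sketch falls back to treating them as equal when no such table is supplied. Blocks are limited to 100 entries, following the CMap file format.

```python
def cidchar_blocks(cid_to_gid: bytes, gid_to_cid=None, block=100):
    """Expand a CIDToGIDMap stream into CMap 'cidchar' lines.

    cid_to_gid: stream with one big-endian 16-bit glyph index per code.
    gid_to_cid: optional dict (hypothetical, e.g. built from the CFF
                charset) translating a glyph index to the font's own CID.
    """
    entries = []
    for code in range(len(cid_to_gid) // 2):
        gid = int.from_bytes(cid_to_gid[2 * code:2 * code + 2], "big")
        if gid == 0:
            continue                  # unmapped codes fall through to notdef
        cid = gid if gid_to_cid is None else gid_to_cid.get(gid, gid)
        entries.append((code, cid))

    lines = []
    for start in range(0, len(entries), block):
        chunk = entries[start:start + block]
        lines.append("%d begincidchar" % len(chunk))
        lines.extend("<%04x> %d" % (code, cid) for code, cid in chunk)
        lines.append("endcidchar")
    return lines


if __name__ == "__main__":
    # Hypothetical 3-entry stream: code 0 unmapped, code 1 -> 5, code 2 -> 7.
    stream = b"\x00\x00\x00\x05\x00\x07"
    print("\n".join(cidchar_blocks(stream)))
```

A real CMap would also need its header, codespace range and CIDSystemInfo; those are left out here to keep the focus on the expansion itself.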
Here is my review of the patch 3605.

1. Converting CIDToGIDMap into a font-specific CMap is tricky. Please explain why we can't use the identity CMap specified in the document together with the CIDToGIDMap defined in the document.
2. The modularity looks imperfect. Since OpenType has the same top-level structure as TrueType, its processing should go into gs_ttf.ps. That file already defines the function putu16, which the author is looking for.

Please give a clear answer to (1) before starting any coding.
This problem has been fixed by the rev. 8646. See the bug 689763 for details.