Bug 689538

Summary: GhostScript can not handle an embedded TrueType CID-Font
Product: Ghostscript Reporter: artifex
Component: PDF Interpreter Assignee: leonardo <leonardo>
Status: NOTIFIED FIXED    
Severity: normal CC: leonardo, zfarberovich
Priority: P2    
Version: 8.60   
Hardware: PC   
OS: Windows XP   
Customer: 870 Word Size: ---
Attachments: Deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont
Re-Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

Description artifex 2007-10-31 07:37:40 UTC
When trying to convert the attached PDF file, GhostScript fails with messages
that it can't find a font file. Adobe Reader reports that all fonts are embedded.
These are the error messages of GhostScript 8.60 on Windows:

C:\Programme\gs\gs8.60\bin>.\gswin32c.exe -sDEVICE=tiffg4 -o gs.tif -f A1.pdf

GPL Ghostscript 8.60 (2007-08-01)
Copyright (C) 2007 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Warning: the map file cidfmap was not found.
Processing pages 1 through 1.
Page 1
Can't find (or can't open) font file C:\Programme\gs\gs8.60\Resource/Font/KozGoP
ro-Bold.
Can't find (or can't open) font file KozGoPro-Bold.
Querying operating system for font files...
Didn't find this font on the system!
Substituting font Helvetica-Bold for KozGoPro-Bold.
Loading NimbusSanL-Bold font from C:\Programme\gs\fonts/n019004l.pfb... 2852768
1442918 11403784 10047291 3 done.
Error: /undefined in --get--
Operand stack:
   --dict:6/15(L)--   F3   66   --dict:6/6(L)--   --dict:6/6(L)--   KozGoPro-Bol
d   --dict:11/12(ro)(G)--   --nostringval--   --dict:8/8(L)--   --dict:19/19(L)-
-   --dict:19/19(L)--   CIDFontName
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval-
-   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   fa
lse   1   %stopped_push   1889   1   3   %oparray_pop   1888   1   3   %oparray_
pop   1872   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1
1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--
  --nostringval--   --nostringval--   %array_continue   --nostringval--   false
  1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   --nos
tringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval
--   %array_continue   --nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1146/1684(ro)(G)--   --dict:2/20(G)--   --dict:75/200(L)--   --dict:75
/200(L)--   --dict:106/127(ro)(G)--   --dict:274/300(ro)(G)--   --dict:21/25(L)-
-   --dict:4/6(L)--   --dict:20/20(L)--   --dict:1/1(ro)(G)--   --dict:1/1(ro)(G
)--   --dict:11/13(L)--
Current allocation mode is local
GPL Ghostscript 8.60: Unrecoverable error, exit code 1
Comment 1 artifex 2007-10-31 07:39:19 UTC
Created attachment 3515 [details]
PDF-File showing problem with embedded font
Comment 2 Marcos H. Woehrmann 2007-10-31 11:08:40 UTC
I've duplicated this problem with gs 8.60 and head (r8331).
Comment 3 Marcos H. Woehrmann 2007-10-31 11:14:17 UTC
Acrobat Reader 8.1.1 and Preview 4.0 (469) both open the file without error or warning. Furthermore,
Acrobat Reader reports that KozGoPro-Bold is Embedded, TrueType (CID) with Identity-H encoding.
Comment 4 mpsuzuki 2007-11-03 03:36:19 UTC
The PDF includes KozGoPro-Bold as CFF OpenType, and the resource type
in PDF object is declared as TrueType (CIDFontType2). In most conventional
PDF including Kozuka families, Kozuka fonts are embedded as raw CFF, and
the PDF object is declared as CIDFontType0. This may be a root of the
difference from known/working PDF with embedded CID-keyed or TrueType
font.
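
The mismatch described above can be spotted from the first four bytes of the
embedded font stream. The following is only a minimal Python sketch, not code
from Ghostscript: the function name is hypothetical, and it assumes the font
stream has already been extracted and decompressed.

```python
def sfnt_flavor(font_bytes: bytes) -> str:
    """Classify an embedded font program by its 4-byte sfnt version tag.

    Hypothetical helper for illustration; it only inspects the header.
    """
    tag = font_bytes[:4]
    if tag == b"OTTO":
        # OpenType container whose outlines live in a CFF table
        return "CFF OpenType"
    if tag in (b"\x00\x01\x00\x00", b"true"):
        # sfnt container with TrueType (glyf/loca) outlines
        return "TrueType"
    if tag == b"ttcf":
        return "font collection"
    return "not an sfnt container"


def declared_subtype_is_consistent(subtype: str, font_bytes: bytes) -> bool:
    """Return False when /Subtype says CIDFontType2 but the data is CFF OpenType."""
    if subtype == "CIDFontType2":
        return sfnt_flavor(font_bytes) == "TrueType"
    return True
```

For the sample PDF, the expectation from this comment is that the check would
fail, since the stream is CFF OpenType while /Subtype is CIDFontType2.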

>Can't find (or can't open) font file C:\Programme\gs\gs8.60\Resource/Font/KozGoP
ro-Bold.
>Can't find (or can't open) font file KozGoPro-Bold.

These error messages were printed by the addpdfcachedfont procedure in gs_cff.ps,
which looks up KozGoPro-Bold as an 8-bit font resource (in the /Font category).
ReadData in gs_cff.ps seems to load the font object correctly and register
KozGoPro-Bold in the /CIDFont category, so looking it up in /Font causes the error.
The procedure used for CIDFontType0 is what should run here. I'm now trying to
fix this.
Comment 5 mpsuzuki 2007-11-05 23:56:57 UTC
Created attachment 3530 [details]
Deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont

The original /readOTTOfont in pdf_font.ps assumes a non-CID-keyed font
resource. This patch changes its behaviour to match /readCIDFontType0C.

With the patch for pdf_font.ps, loading the font resource itself works,
but the PDF interpreter then tries to compose a Type0 font whose
/FDepVector has 2 components: one with /FID and one without. Such a
dictionary is rejected as an invalid font dictionary. If I remove the
component without /FID before executing /.completefont (this is what
/.restructFDepVector does), rendering of the sample PDF finishes
successfully (I have not yet checked whether the result matches Adobe's
products).

I guess the component without /FID is a junk which should be
removed in some process. Now I'm looking for the part which
should remove the junk.
Comment 6 artifex 2007-11-06 00:39:56 UTC
One text that uses the font KozGoPro-Bold is the text "Unterdecke Bürgerzentrum"
in the lower left corner of the drawing. You can verify this by removing the
font resource F3 from the Font resource dictionary object 13. Then Adobe Reader
will no longer list this font and will display the text differently.
Comment 7 mpsuzuki 2007-11-06 00:51:26 UTC
Thank you. My previous patch produces the wrong Encoding.
It must be revised.
Comment 8 mpsuzuki 2007-11-06 12:23:55 UTC
Created attachment 3532 [details]
Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont  	 

>I guess the component without /FID is a junk which should be
>removed in some process. Now I'm looking for the part which
>should remove the junk.

From a comparison of /readCIDFontType0C and /readOTTOfont, I found
that /readOTTOfont creates a duplicated font resource, which it should
remove by itself. With this cleanup, the previous patch becomes much
simpler (no hook in gs_fonts.ps is required anymore).

But the wrong Encoding issue is not solved yet. I guessed that Acrobat
Reader uses the cmap table in the embedded CFF OpenType, but it does not.
Even if I hide the cmap table in KozGo-Bold by renaming "cmap" to "pcma",
the rendering result in Acrobat Reader is exactly the same.
Comment 9 mpsuzuki 2007-11-07 03:56:57 UTC
The text object in the sample PDF is encoded in UCS2 (or UTF-16), I think.
For example, "Unterdecke Bürgerzentrum" is rendered as follows:

/F3 100 Tf
(\000U\000n\000t\000e\000r\000d\000e\000c\000k\000e) Tj
1 0 0 -1 696 6805 Tm
(\000 ) Tj
1 0 0 -1 718 6805 Tm
(\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m) Tj

\000\374 = U+00FC, /udieresis.

Apparently, this code is incompatible with Identity-H for Adobe-Japan1.
In Adobe-Japan1, the glyphs compatible with /udieresis are CID=219 (0x00DB)
and 621 (0x026D). As a result, I have to guess that Acrobat Reader combines
/KozGoPro-Bold with the /UniJIS-UCS2-H (or /UniJIS-UTF16-H) CMap without
notifying the end user. The sample PDF does not include any CMap and does not
refer to any external CMap, and Acrobat Reader's document properties show its
encoding as /Identity-H, but I guess that is not what is actually used.

So the solution may be to heuristically attach a Unicode CMap to the
/KozGoPro-Bold CIDFont object, even though its encoding is declared as
/Identity-H. But in which case? When /Subtype is set to /CIDFontType2
for a CFF OpenType? Or when the name /KozGoPro-Bold is used instead
of /KozGoPro-Bold-Identity-H? Further investigation is required.
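
To make the observation about the string operands concrete, here is a small
Python sketch that decodes the Tj operands quoted above as UTF-16BE. The
helper name is hypothetical, and it handles only octal escapes and plain
ASCII, which is all the quoted strings contain; it is not a general PDF
string parser.

```python
import re


def pdf_literal_to_bytes(literal: str) -> bytes:
    """Convert the body of a PDF literal string (no parentheses) to bytes.

    Simplified for illustration: supports backslash-octal escapes and
    plain ASCII characters only.
    """
    out = bytearray()
    i = 0
    while i < len(literal):
        if literal[i] == "\\":
            match = re.match(r"[0-7]{1,3}", literal[i + 1:])
            if match is None:
                raise ValueError("only octal escapes are supported here")
            out.append(int(match.group(), 8) & 0xFF)
            i += 1 + len(match.group())
        else:
            out.append(ord(literal[i]))
            i += 1
    return bytes(out)


# Operand of the third Tj in the quoted content stream
raw = pdf_literal_to_bytes(
    r"\000B\000\374\000r\000g\000e\000r\000z\000e\000n\000t\000r\000u\000m"
)
text = raw.decode("utf-16-be")

# Under Identity-H, each 2-byte code is used directly as the CID
cids_under_identity = [
    int.from_bytes(raw[k:k + 2], "big") for k in range(0, len(raw), 2)
]
```

Here `text` decodes to the second word of the label, while the second entry
of `cids_under_identity` is 252 (0x00FC), which is neither of the
Adobe-Japan1 CIDs listed above for /udieresis.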

Comment 10 mpsuzuki 2007-11-07 06:26:24 UTC
Created attachment 3533 [details]
Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont 

With this patch, /readOTTOfont writes the extra entry
"/Subtype" (=/CIDFontType2) into the /CIDFont dict when an
embedded CFF OpenType resource is declared as /CIDFontType2.

If /buildType0font finds /Subtype (=/CIDFontType2) in a
/FontType 9 dict and the requested CMap is /Identity-H,
/buildType0font replaces the CMap with UniXXX-UCS2-H.

This patch fixes the encoding issue, but the kerning issue
is not fixed yet. Possibly /addCIDmetrics should look up metrics
by UCS2 (or UTF16) codepoint instead of raw CID in this case.
Comment 11 mpsuzuki 2007-11-09 18:41:43 UTC
Created attachment 3536 [details]
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont 

Fix: Load CFF OpenType which is declared as CIDFontType2.

DETAILS:
The current PDF parser expects an embedded CFF OpenType font to be a
non-CID-keyed font (see /readOTTOfont). In fact, when Adobe
Distiller embeds CFF OpenType for CJK scripts, it uses the CFF
font format (see /readCIDFontType0C). It may also be possible to
embed such a font in the CFF OpenType font format. To handle such an
object, this patch extends /readOTTOfont to treat it as a CID-keyed font
object when its /Subtype is declared as a CID-keyed font type
(/CIDFontType0, /CIDFontType0C, /CIDFontType2).

Some PDF generators, e.g. PDFTron, have a bug that declares a CFF
OpenType object as CIDFontType2 (= TrueType). The behaviour
for such a buggy object is not described in the PDF Reference, but
Adobe Reader seems to use it as a CIDFontType0 whose glyph
index (CID) is not an Adobe CID but UCS2. The UCS2 mapping table
is not extracted from the cmap table in the CFF OpenType object (even
if the cmap table in the CFF OpenType object is hidden, the behaviour
is the same); it may be imported from Adobe Reader's own resources.
xpdf and poppler-based applications cannot handle such a
font object.

To emulate Adobe Reader's behaviour, this patch takes the following
steps.

1. To find such buggy object, this patch inserts extra /Subtype
   entry into CID-keyed font dictionary in /readOTTOfont.

2. When such buggy CID-keyed font is combined with some CMap
   and prepared for the text rendering, the difference of glyph
   index will occur. A hook is inserted to /buildType0.
   If CID-keyed font object is /CIDFontType0 (/FontType == 9)
   but /Subtype is known and /CIDFontType2, and requested CMap
   is Identity-H or Identity-V, /buildType0 replaces the CMap by
   appropriate UCS2 CMap (for Adobe-Japan1, CMap is replaced by
   UniJIS-UCS2-H, etc). In addition, /CDevProc is removed because
   it can cause wrong metrics.
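
The decision in step 2 can be paraphrased in Python as follows. This is only
an illustrative sketch, not the patch (which is PostScript in pdf_font.ps).
The function name and argument layout are hypothetical, and the table
entries for orderings other than Japan1 are an assumption about what "etc"
covers, based on the standard UCS2 CMap names.

```python
# Registry ordering -> prefix of the matching UCS2 CMap.
# Only the Japan1 entry is stated in the description above;
# the other entries are assumed for illustration.
UCS2_CMAP_PREFIX = {
    "Japan1": "UniJIS",
    "GB1": "UniGB",
    "CNS1": "UniCNS",
    "Korea1": "UniKS",
}


def choose_cmap(font_type: int, declared_subtype: str,
                requested_cmap: str, ordering: str) -> str:
    """Return the CMap name to use when composing the Type0 font."""
    # Only CIDFontType0 dictionaries (FontType 9) that carry the extra
    # /Subtype /CIDFontType2 marker are treated as the buggy case.
    if font_type != 9 or declared_subtype != "CIDFontType2":
        return requested_cmap
    # Non-Identity CMaps are left alone (see the TODO list).
    if requested_cmap not in ("Identity-H", "Identity-V"):
        return requested_cmap
    prefix = UCS2_CMAP_PREFIX.get(ordering)
    if prefix is None:
        return requested_cmap
    # Keep the writing direction (H or V) of the requested CMap.
    return f"{prefix}-UCS2-{requested_cmap[-1]}"
```

For the sample PDF this would turn Identity-H into UniJIS-UCS2-H, while a
regular CIDFontType0 font without the marker keeps its requested CMap.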

TODO:
* Further investigation on Adobe Reader's implementation,
  especially the handling of surrogate pairs.

* In the case that such a buggy CID-keyed font is combined with a
  non-Identity CMap (e.g. RKSJ-H), the rearranged font feature,
  which layers multiple CMaps, is needed.

* This emulation cannot handle the manually tuned metrics in
  /W and /W2. To support it, the convertor of Adobe CID and
  UCS2 character code which can be used from PostScript space is
  needed, because glyph index in /W and /W2 would be UCS2 character
  code and should be replaced by Adobe CID. Scaling of metrics
  values can be required.

EXPECTED DIFFERENCES:
None.
Comment 12 mpsuzuki 2007-11-09 18:42:19 UTC
Now under regression test.
Comment 13 mpsuzuki 2007-11-10 04:47:39 UTC
Created attachment 3537 [details]
Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont 

No regression is found in regression test.

In the previous attachment, I reordered the patch to correspond
to the 2-step (throw & catch) strategy, which confused the patch
command. Here I fixed that (the content is exactly the same).
Comment 14 leonardo 2007-11-22 03:10:30 UTC
Please explain why we can't use CIDToGIDMap instead of doing tricks with replacing 
the CMap? 

I converted the PDF into EPS with Acrobat 8, and got interesting information 
about the CMap. Did you try the same? Adobe appears not to replace the CMap.
Comment 15 mpsuzuki 2007-11-22 04:08:05 UTC
In my patch, I made KozGoPro-Bold with Adobe-Japan1 CID interface from
embedded CFF OpenType, and replace Identity CMap by Unicode CMap, to
recognize the passed Unicode string correctly.
In your proposal to use CIDToGIDMap, KozGoPro-Bold resource should have
Unicode CID interface (obtained by resolving CIDToGIDMap) instead of
Adobe-Japan1 CID and it should be combined with Identity CMap. Right?

Does current ghostscript have any functionality to remap the relationship
between CID and offset to CharStrings in CIDFontType 0 dictionary?
I think CIDFontType 0 has no mechanism comparable to "CIDMap" in
CIDFontType 2, so we have to reorganize the long binary data in GlyphData
(or split GlyphData and synthesize the huge sized GlyphDictionary).
Right? I'm afraid it's very complicated work.

Yet I've not tested how Acrobat 8 reads the sample PDF, but I'm sure
Adobe won't replace CMap as I did. I think Adobe makes CIDFontType 2
dictionary even if the embedded stream is CFF OpenType (so CIDMap works),
and postpones the discrimination of sfnt OpenType and CFF OpenType
to the phase picking a glyph data from loca/glyf table or CFF table.
To do such in ghostscript, drastic rewriting of font resource management
may be required (if we make CIDFontType 0 dictionary from CFF OpenType
immediately, such postponement is impossible). What do you think?

Anyway, hardwiring of external UniXXX-H CMap is not the best
implementation, generating appropriate CMap from CIDToGIDMap
would be better. I will try to implement such.
Comment 16 leonardo 2007-11-22 14:05:54 UTC
1.  "generating appropriate CMap from CIDToGIDMap would be better. I will try 
to implement such."

Please don't spend time for coding before creating a right design.

2. "In your proposal to use CIDToGIDMap, KozGoPro-Bold resource should have
Unicode CID interface (obtained by resolving CIDToGIDMap) instead of
Adobe-Japan1 CID and it should be combined with Identity CMap. Right?"

Not right. The resource is a font which defines a mapping of char codes to 
glyphs within its own font technology. In the test case the technology is Open 
Type. The question is how to fit that technology into the Postscript world (because 
our PDF interpreter is coded in Postscript). There is nothing special here 
about Unicode. In the Postscript world the mapping goes with CMap*CIDMap ('*' 
means a superposition of maps). So the natural question is whether CIDMap helps 
with the right glyph mapping, or whether its analogue CIDToGIDMap (in the PDF 
world) does.

3. "Yet I've not tested how Acrobat 8 reads the sample PDF," Then please do.

4. "but I'm sure Adobe won't replace CMap as I did" : Right you are.

5. "I think Adobe makes CIDFontType 2 dictionary even if the embedded stream is 
CFF OpenType" : A wrong guess.

6. "I'm afraid it's very complicated work." Well entire Ghostscript is a 
complicated work. Please let us know if you want a simpler work.

7. "To do such in ghostscript, drastic rewriting of font resource management
may be required" : Postscript resources are a kind of file system, which is not 
related to this case. What we need here is proper handling of OpenType. It 
was not done before now, so the work may be big and complex. We assumed that 
you can do it.

8. "generating appropriate CMap from CIDToGIDMap would be better. I will try to 
implement such." : Right bet. Looking ahead for your success.
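
The "CMap*CIDMap" superposition from point 2 can be pictured as plain
function composition. The Python below is a toy sketch with made-up glyph
indices, not Ghostscript code; all names in it are hypothetical.

```python
from typing import Callable

CodeMap = Callable[[int], int]


def compose(cmap: CodeMap, cid_map: CodeMap) -> CodeMap:
    """Build char code -> glyph index by applying cmap, then cid_map."""
    def lookup(code: int) -> int:
        cid = cmap(code)       # CMap: character code -> CID
        return cid_map(cid)    # CIDMap / CIDToGIDMap: CID -> glyph index
    return lookup


def identity_cmap(code: int) -> int:
    """Identity-H behaves like this for 2-byte codes: CID equals the code."""
    return code


# Made-up CID -> glyph index table, for illustration only
toy_cid_to_gid = {0x0042: 35, 0x00FC: 180}


def toy_cid_map(cid: int) -> int:
    # Glyph index 0 stands for the missing glyph
    return toy_cid_to_gid.get(cid, 0)


glyph_for_code = compose(identity_cmap, toy_cid_map)
```

With an identity CMap, all of the real work happens in the second map, which
is why the question is whether CIDToGIDMap alone can select the right glyph.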

Comment 17 mpsuzuki 2007-12-05 00:02:19 UTC
Created attachment 3605 [details]
Re-Re-Re-Revised patch to deal embedded CFF OpenType as CIDFont resource when its /Subtype declares as CIDFont 

Here is the re-re-re-revised patch, which tries to remap by
the content of the CIDToGIDMap stream.
I think the procedures /.print_lsb16hex and /.lsb16str
already exist under different names. Please let
me know the right way to use them in pdf_font.ps.
Now under regression test.


Fix: Load CFF OpenType which is declared as CIDFontType2.

DETAILS:
The current PDF parser expects an embedded CFF OpenType font to be a
non-CID-keyed font (see /readOTTOfont). In fact, when Adobe
Distiller embeds CFF OpenType for CJK scripts, it uses the CFF
font format (see /readCIDFontType0C). It may also be possible to
embed such a font in the CFF OpenType font format. To handle such an
object, this patch extends /readOTTOfont to treat it as a CID-keyed font
object when its /Subtype is declared as a CID-keyed font type
(/CIDFontType0, /CIDFontType0C, /CIDFontType2).

Some PDF generators, e.g. PDFTron, have a bug that declares a CFF
OpenType object as CIDFontType2 (= TrueType) and inserts a
CIDToGIDMap stream (it seems that the CIDToGIDMap stream is
generated by expanding the cmap table in the CFF OpenType).
The behaviour for such a buggy font resource is not described
in the PDF Reference, but Adobe Reader seems to use it as a
CIDFontType0 whose CID is not the bare Adobe CID but one remapped
by the CIDToGIDMap stream.

To emulate Adobe Reader's behaviour, this patch takes the following
steps.

1. To find such font object, this patch inserts extra /Subtype
   entry into CID-keyed font dictionary in /readOTTOfont.
   When such font object is loaded, /readOTTOfont procedure
   defines 2 CIDFont-specific CMap by expansion of CIDToGIDMap
   stream, e.g. KozGoPro-Bold-OTTO-H and KozGoPro-Bold-OTTO-V
   for CFF CIDFontType2 KozGoPro-Bold including CIDToGIDMap.

2. When such CID-keyed font is combined with some CMap and
   prepared for the text rendering, the difference of glyph
   index (between base Adobe CID versus CID remapped by
   CIDToGIDMap) will occur. A hook is inserted to /buildType0.
   If CID-keyed font object is /CIDFontType0 (/FontType == 9)
   but /Subtype is known and /CIDFontType2, and requested CMap
   is Identity-H or Identity-V, /buildType0 replaces the CMap by
   CIDFont-specific CMap which is defined in loading of CIDFont
   resource. In addition, /CDevProc is removed because it can
   cause wrong metrics.

   If CIDFont-specific CMap is not defined, UCS2 CMap matching
   with CIDSystemInfo is used as fallback (UniJIS-UCS2-H for
   Adobe-Japan1, etc).
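
For reference, a CIDToGIDMap stream is a flat table of 2-byte big-endian
glyph indices, where the entry for CID c sits at byte offset 2*c. The Python
sketch below only shows how such a stream can be expanded into a table; it
is not the patch, the function names are hypothetical, and it assumes the
stream has already been decompressed.

```python
import struct


def parse_cid_to_gid_map(stream: bytes) -> list[int]:
    """Expand a decoded CIDToGIDMap stream into a list indexed by CID."""
    if len(stream) % 2 != 0:
        raise ValueError("CIDToGIDMap length must be a multiple of 2")
    count = len(stream) // 2
    # ">" selects big-endian, "H" is an unsigned 16-bit integer
    return list(struct.unpack(f">{count}H", stream))


def nonzero_entries(table: list[int]) -> dict[int, int]:
    """Keep only CIDs that map to a real glyph (glyph index 0 is .notdef)."""
    return {cid: gid for cid, gid in enumerate(table) if gid != 0}
```

A font-specific CMap like the ones described in step 1 would be generated
from a table of this kind; how the glyph indices are then related to the
CIDs of the CFF font is left out of this sketch.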


TODO:
* In the case that such a buggy CID-keyed font is combined with a
  non-Identity CMap (e.g. RKSJ-H), the rearranged font feature,
  which layers multiple CMaps, is needed.

* This emulation cannot handle the manually tuned metrics in
  /W and /W2. To support it, the convertor of Adobe CID and
  UCS2 character code which can be used from PostScript space is
  needed, because glyph index in /W and /W2 would be UCS2 character
  code and should be replaced by Adobe CID. Scaling of metrics
  values can be required.
Comment 18 leonardo 2007-12-07 23:53:53 UTC
Here is my review of the patch 3605.

1. Converting CIDToGIDMap into a font-specific CMap is tricky. Please 
explain why we can't use the identity CMap specified in the document together 
with the CIDToGIDMap defined in the document.

2. The modularity looks imperfect. Since OpenType has the same top-level structure as 
TrueType, its processing should go into gs_ttf.ps. It already defines the function 
putu16, which the author is looking for.

Please give a clear answer for (1) before starting any coding.
Comment 19 Alex Cherepanov 2008-05-26 22:34:41 UTC
This problem has been fixed by the rev. 8646.
See the bug 689763 for details.