Bug 688515 - Need method to map CIDFonts to TrueType fonts when Ordering is Identity
Summary: Need method to map CIDFonts to TrueType fonts when Ordering is Identity
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: 8.53
Hardware: All All
Importance: P1 critical
Assignee: Ken Sharp
Duplicates: 688813 689499 689623 690483
Reported: 2006-01-27 09:54 UTC by Ray Johnston
Modified: 2016-06-23 10:48 UTC

Customer: 870, 770, 580, 384
Word Size: ---


Attachments
HiraMaruPro.pdf (43.16 KB, application/pdf)
2008-10-20 11:51 UTC, Marcos H. Woehrmann

Description Ray Johnston 2006-01-27 09:54:40 UTC
The CIDSystemInfo is an indirect object. This wasn't handled.
Patch will be committed shortly.

There is another problem after this is fixed -- undefined in findresource
searching for Adobe-Identity
Comment 1 Ray Johnston 2006-01-27 09:56:33 UTC
Created attachment 1956
solidworks.pdf
Comment 2 Ray Johnston 2006-02-01 09:43:15 UTC
The font uses CenturyGothic but doesn't embed it. 
Comment 3 Ralph Giles 2006-04-24 13:39:44 UTC
Created attachment 2172
solidworks_emb.ps

Attaching source file with the font embedded.
Comment 4 Ralph Giles 2006-04-24 13:44:21 UTC
The customer reports solidworks_emb.ps works when sent to a printer, but we
still show .notdef boxes for the embedded font.

HEAD displays for me (with .notdef box glyphs) on the x11 device. However,
converting to a file fails:

bin/gs -Ilib -I../fonts -dSAFER -sDEVICE=png16m -o solidworks_emb.png solidworks_emb.ps
[...]
Error: /ioerror in --image--

bin/gs -Ilib -I../fonts -dSAFER -sDEVICE=pdfwrite -o solidworks_emb.pdf solidworks_emb.ps
[...]
AFPL Ghostscript CVS PRE-RELEASE 8.54: Missing glyph CID=22 in the font
BAAAAA+CenturyGothic . The output PDF may fail with some viewers.
Error: /stackunderflow in --exch--
Comment 5 Alex Cherepanov 2006-07-08 06:36:45 UTC
The font file is likely to be invalid. 

Distiller 5 can convert solidworks_emb.ps but the PDF file shows the boxes
on both Ghostscript and Acrobat Reader 5.

Ghostscript cannot convert the file to PDF failing in zfcid0.c:189 with 
/rangecheck . The error is trapped by the sample file, but Ghostscript doesn't
restore the operand stack, which causes "/stackunderflow in --exch--".
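
The failure mode described above can be illustrated with a toy model in Python. This is only a sketch of the general idea, not Ghostscript's code: the operator `failing_op`, the `restore` flag and the list-based stack are all hypothetical. An operator consumes its operands and then fails; if the trapped error leaves the stack short, a later exch underflows.

```python
class StackUnderflow(Exception):
    """Raised when an operator needs more operands than the stack holds."""


def exch(stack):
    # Swap the top two operands, as the PostScript exch operator does.
    if len(stack) < 2:
        raise StackUnderflow("exch")
    stack[-1], stack[-2] = stack[-2], stack[-1]


def failing_op(stack):
    # Hypothetical operator: pops its two operands, then fails.
    stack.pop()
    stack.pop()
    raise ValueError("rangecheck")


def run_trapped(stack, restore):
    # Trap the error the way the sample file does, then continue with exch.
    saved = list(stack)
    try:
        failing_op(stack)
    except ValueError:
        if restore:
            stack[:] = saved  # put the consumed operands back
    exch(stack)
    return stack


print(run_trapped(["a", "b"], restore=True))
```

With `restore=False` the same call raises `StackUnderflow`, which mirrors the "/stackunderflow in --exch--" seen after the trapped /rangecheck.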

I can not reproduce "/ioerror in --image--" in vv. 8.53, 8.54, or current CVS.

Comment 6 Alex Cherepanov 2007-04-11 06:23:42 UTC
The file is invalid -  it shows boxes on 3 different
level 3 PostScript interpreters. The file loads the font incrementally
but it adds glyphs to a different font than it uses to draw
the glyphs.

When the printer doesn't have a native composefont operator, the file takes
a different execution path and shows the glyphs correctly.

The file can be fixed by defining pdf_has_composefont? as
  /pdf_has_composefont? false def
instead of the current definition
  /pdf_has_composefont? systemdict /composefont known def

It is also possible to develop an IdiomSet resource to patch the faulty
font handling. Since the file doesn't work on recent Adobe interpreters,
files of this kind are not expected to be common in the wild. Feel free to
reopen the bug if you want IdiomSet development.
Comment 7 Ray Johnston 2007-04-11 09:08:57 UTC
Comment on attachment 2172
solidworks_emb.ps

The customer's problem is with the PDF file, not the PS file that Ralph
created.
Comment 8 Ray Johnston 2007-04-11 09:27:35 UTC
The customer needs to be able to process the PDF (solidworks.pdf).

On a Windows system, Adobe finds and uses the CenturyGothic font in
/Windows/Fonts/GOTHIC.TTF

Since the font is referenced as a CIDFont, what is needed is a way to
use this font by mapping /CenturyGothic to this TTF, but since the
Ordering in the FontDescriptor is Identity, the following gives an error:
/CenturyGothic << /FileType /TrueType /Path (C:/windows/fonts/gothic.ttf) /CSI
[(Identity) 1] >> ;
The error message is:
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .

The font is referenced in the PDF by:
%Resolving compressed object: [15 0]
<< /FontName /CenturyGothic /StemV 68 /Type /FontDescriptor /FontWeight 400
/FontStretch /Normal /XHeight 531 /FontFamily (Century Gothic) /Ascent 1060
/CapHeight 718 /ItalicAngle 0 /FontBBox [-169 -307 1152 1060] /Descent -307
/Flags 4 >>
%Resolving compressed object: [16 0]
<< /Supplement 0 /Registry (Adobe) /Ordering (Identity) >>
%Resolving compressed object: [17 0]
<< /W [3 [277] 5 [309 720] 11 [369 369] 14 [606 277 332 277 437 554 554 554 554
554 554 554 554 554 554 277] 36 [740 574 813 744 536 485 872 683 226] 46 [591
462 919 740 869 592 871 607 498 426 655 702 960 609 592]] /CIDSystemInfo {16 0
resolveR} /Type /Font /DW 1000 /BaseFont /CenturyGothic /Subtype /CIDFontType2
/FontDescriptor {15 0 resolveR} >>
%Resolving compressed object: [18 0]
[{17 0 resolveR}]
%Resolving compressed object: [19 0]
<< /Type /Font /BaseFont /CenturyGothic /Encoding /Identity-H /Subtype /Type0
/DescendantFonts {18 0 resolveR} >>
%Resolving compressed object: [20 0]
[{17 0 resolveR}]
%Resolving compressed object: [21 0]
<< /Type /Font /BaseFont /CenturyGothic /Encoding /Identity-H /Subtype /Type0
/DescendantFonts {20 0 resolveR} >>

What is needed is to be able to properly display this PDF with some kind of
mapping to the font on the system (as Adobe does).
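
For reading the dump above: the /W array in object 17 uses the PDF CIDFont width syntax, where `c [w1 w2 ...]` assigns widths to consecutive CIDs starting at c, and CIDs not listed fall back to /DW. A minimal Python sketch of that expansion follows; the function name `expand_w` is made up for illustration, and the data is copied from the object dump.

```python
def expand_w(w):
    """Expand a CIDFont /W array into a {cid: width} dictionary."""
    widths = {}
    i = 0
    while i < len(w):
        first, nxt = w[i], w[i + 1]
        if isinstance(nxt, list):
            # Form: c [w1 w2 ... wn] gives widths for CIDs c, c+1, ...
            for offset, width in enumerate(nxt):
                widths[first + offset] = width
            i += 2
        else:
            # Form: c_first c_last w gives one width for a CID range.
            for cid in range(first, nxt + 1):
                widths[cid] = w[i + 2]
            i += 3
    return widths


# /W and /DW as they appear in object 17 of solidworks.pdf.
W = [3, [277], 5, [309, 720], 11, [369, 369],
     14, [606, 277, 332, 277, 437, 554, 554, 554, 554, 554,
          554, 554, 554, 554, 554, 277],
     36, [740, 574, 813, 744, 536, 485, 872, 683, 226],
     46, [591, 462, 919, 740, 869, 592, 871, 607, 498, 426,
          655, 702, 960, 609, 592]]
DW = 1000

widths = expand_w(W)
print(widths[3], widths[36], widths.get(4, DW))
```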
Comment 9 Ray Johnston 2007-04-23 09:30:45 UTC
Add customer 770 and change description to better match the problem
that both customers are having (refer to comment #8).

Fixing this will require coordination with Igor about the CIDfont mapping.
Comment 10 Ray Johnston 2007-04-23 09:33:16 UTC
Created attachment 2914
PDF_with_Missing_CIDFont.pdf

sample that references CenturyGothic.

Gothic.ttf and GothicB.ttf are available upon request, but should exist
on Windows systems.
Comment 11 Alex Cherepanov 2008-05-07 19:48:21 UTC
*** Bug 689499 has been marked as a duplicate of this bug. ***
Comment 12 Alex Cherepanov 2008-05-07 19:51:18 UTC
*** Bug 688813 has been marked as a duplicate of this bug. ***
Comment 13 Marcos H. Woehrmann 2008-09-26 11:32:26 UTC
Created attachment 4434
fax.pdf

Another example of a file that we can't read.  Acrobat opens the file but
complains of a missing font; evince opens and displays the file correctly.
Comment 14 Ray Johnston 2008-10-02 10:14:38 UTC
Making this a P1 bug since it keeps coming up and affects so many customers.
Comment 15 Marcos H. Woehrmann 2008-10-20 11:51:30 UTC
Created attachment 4521
HiraMaruPro.pdf

I believe this is another file that has the same problem.  This file does not
need to be marked private.
Comment 16 leonardo 2008-10-24 00:01:47 UTC
The description reads: "undefined in findresource searching for
Adobe-Identity". This error happens because the user didn't provide a default
CID font, so the message is not really relevant here. For a better result the
user should provide a cidfmap like this:

/Arial << /FileType /TrueType /Path (C:/Windows/Fonts/times.ttf)
/SubfontID 0 /CSI [(Unicode) 1] >> ;
/CenturyGothic << /FileType /TrueType /Path (C:/Windows/Fonts/GOTHIC.TTF)
/SubfontID 0 /CSI [(Unicode) 1] >> ;

With this map the documents are interpreted without error, but the wrong
glyphs are rendered, whereas Adobe renders the right ones. I can't resolve
this puzzle yet. Running with -dPDFDEBUG -dTTFDEBUG I see it uses Unicode
cmap 3.1, but the text appears re-encoded, so we may need to choose another
cmap, 1.0, which is present in the font.


Comment 17 leonardo 2008-10-24 00:11:33 UTC
To force the choice of cmap 3.1 I inserted

       (1.0) /Identity  % hack

into Resource/Init/xlatmap as the first line of the table for TrueType, and
changed Ordering to Custom in cidfmap. It then fails with "Can't
build /Custom.Custom /CIDDecoding resource". This message is correct, since we
don't provide such a decoding. IMO we need to generate it automatically: any
decoding x.x should be created as an identity map.
Comment 18 leonardo 2008-10-24 00:23:40 UTC
To pass it through I inserted 

        /Custom.Custom   [ /Identity-UTF16-H ] % hack

into the .CMapChooser table in gs_ciddc.ps. The document is interpreted without
error but still renders wrong glyphs.

The document fax.pdf contains an interesting ToUnicode CMap. It can likely be
used to choose the right glyphs, but I have never seen such a use of ToUnicode
in the Adobe specs. I recall we had a similar bug, which I assigned to Ken a
year ago. We need to discuss this problem with him.
Comment 19 leonardo 2008-10-24 00:28:07 UTC
Thus all the tricks with Custom are for experimenting only. They should not go
to production now. I reverted all local changes, since the current code gives
the same result with the cidfmap from Comment #16.
Comment 20 Ken Sharp 2008-10-24 00:29:28 UTC
Do you recall the bug number (or subject) for the previous issue with ToUnicode?
I can't find it on a cursory search and don't recall the problem...
Comment 21 leonardo 2008-10-24 00:54:21 UTC
I can't unpack the other files (besides fax.pdf) with pdfinflat; I'll open
another bug about it. Because of that I can't tell what ToUnicode they contain.
Running pdfwrite I see it runs .processToUnicode, but unfortunately it doesn't
do any debug printing.
Comment 22 leonardo 2008-10-24 00:57:21 UTC
The bug mentioned in Comment #18 is bug 689623, currently assigned to Alex. I
reread it and it gives no useful information, but both bugs (this one and that
one) definitely point to the same problem. I would like Ken to participate in
this more actively because it's a P1 problem.
Comment 23 leonardo 2008-10-24 01:23:52 UTC
Thus I can now see only one way to render these documents:
apply ToUnicode to convert CIDs to glyph indices. Note the CMap must still be
used, at least to know the size of the char codes (2 bytes). I have no idea how
else it can work, because ToUnicode looks like the *single* place where the
necessary info may come from. Ken?
Comment 24 Ken Sharp 2008-10-24 01:30:57 UTC
Leo, I think you are broadly correct. I looked at #689623 and recall the issues
now. 

When the ToUnicode table is present we can use that, but if it's not present, we
may need to use the 3,1 CMAP from the TrueType font (when present, the ToUnicode
CMap seems to duplicate this).

I don't see anything obvious in the specification, but it's reasonably clear that
Acrobat is doing this. I'll try to modify the 3,1 CMAP subtable to prove it
categorically later today (working on something else right now).

I'll report back again later.
Comment 25 Ken Sharp 2008-10-24 06:43:53 UTC
For this issue I've looked at three different files with different cases:

1) Fax.pdf; this file does not contain the required fonts Arial and Arial,Bold
   These are used as Type 2 CIDFonts in the document with Identity-H CMaps and
   apparently identical ToUnicode CMaps.

2) solidworks.pdf; I reduced this file by removing most of the text and all of
   the images. The file references one font, CenturyGothic, as a Type 2 CIDFont
   using an Identity-H CMap. However, it does not contain a ToUnicode CMap.

3) HiraMaruPro.pdf; this references several fonts, but actually uses an embedded
   subset 'FRVAOB+HiraMaruPro-W4' a Type 0 (Adobe format) CIDFont using an
   Identity-H CMap. Again this document does not have a ToUnicode CMap.


I'm unsure why HiraMaruPro.pdf is a problem, since the font is embedded and is
not a TrueType font. However, this seems unrelated to the TrueType problems,
so I'm going to pass on this one for now. Perhaps Alex or Leonardo know what's
up with this one. I get a 'rangecheck in --string--' error with this, which is
a different error to the other files.


I tested Ray's earlier statement (comment #8)  that Acrobat is using the
century.ttf font as a substitute for the font in solidworks.pdf by deleting the
font from my system and opening the file. Acrobat gives an error and warns that
some glyphs may not display correctly, then shows all the glyphs as bullets.
Putting the font back allows Acrobat to display the file correctly.

So it seems certain Acrobat is using this font. Now, the CMAP for this font is
fairly normal. The first glyph (0) is the .notdef, the second and third are
unused and the space glyph is at glyph ID 3, exclamation at GID 4, double quote
at GID 5 and so on.

The text in the solidworks.pdf file is supposed to read 'DC RATING' which has 
the ASCII (hex) values:


44 43 20 52 41 54 49 4E 47

The PDF file actually contains a hex string:

<002700260003003500240037002C0031002A>

That is:

0027 0026 0003 0035 0024 0037 002C 0031 002A

So obviously it isn't using ASCII values.


On closer inspection, this is actually the TrueType glyph IDs of the required 
glyph sequence. I used ttfdump to get the character codes according to the
embedded 3,1 CMAP subtable from the font, which maps as follows:

		Which Means:
		   1. Char 0020 -> Index 3
...
		      Char 0041 -> Index 36 (0x24)
		      Char 0042 -> Index 37 (0x25)
		      Char 0043 -> Index 38 (0x26)
		      Char 0044 -> Index 39 (0x27)
		      Char 0045 -> Index 40 (0x28)
		      Char 0046 -> Index 41 (0x29)
		      Char 0047 -> Index 42 (0x2a)
		      Char 0048 -> Index 43 (0x2b)
		      Char 0049 -> Index 44 (0x2c)
		      Char 004A -> Index 45 (0x2d)
		      Char 004B -> Index 46 (0x2e)
		      Char 004C -> Index 47 (0x2f)
		      Char 004D -> Index 48 (0x30)
		      Char 004E -> Index 49 (0x31)
		      Char 004F -> Index 50 (0x32)
		      Char 0050 -> Index 51 (0x33)
		      Char 0051 -> Index 52 (0x34)
		      Char 0052 -> Index 53 (0x35)
		      Char 0053 -> Index 54 (0x36)
		      Char 0054 -> Index 55 (0x37)
		      Char 0055 -> Index 56 (0x38)

If we treat the PDF sequence as if they were glyph IDs, and use the CMAP in
reverse, to find out which character IDs map to those glyph IDs we see that:

GID     CID     ASCII
---------------------
0x27    0x44    D
0x26    0x43    C
0x03    0x20    'space'
0x35    0x52    R
0x24    0x41    A
0x37    0x54    T
0x2C    0x49    I
0x31    0x4E    N
0x2A    0x47    G
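
The reverse lookup in the table above can be reproduced with a short Python sketch. It assumes only the cmap entries quoted from the ttfdump excerpt (the rest of the font's cmap is omitted), and the names `CMAP_3_1`, `GID_TO_CHAR` and `decode_identity_h` are made up for illustration.

```python
# Partial 3,1 cmap for the font, taken only from the ttfdump excerpt above:
# character code -> glyph index. Entries not shown there are omitted.
CMAP_3_1 = {0x0020: 3}
CMAP_3_1.update({0x0041 + i: 36 + i for i in range(21)})  # 0x41..0x55 -> 36..56

# Use the cmap "in reverse": glyph index -> character code.
GID_TO_CHAR = {gid: char for char, gid in CMAP_3_1.items()}


def decode_identity_h(hex_string):
    """Split an Identity-H string into 2-byte codes and treat each as a GID."""
    raw = bytes.fromhex(hex_string)
    gids = [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    return "".join(chr(GID_TO_CHAR[gid]) for gid in gids)


text = decode_identity_h("002700260003003500240037002C0031002A")
print(text)  # DC RATING
```

The reversal is only there to check the hypothesis; a renderer that treats the codes as GIDs would not need the cmap at all.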


So, it seems that when Acrobat gets a Type 2 CIDFont, for Identity mappings at
least, the 'text' values are direct TrueType GIDs (glyph IDs) and should be used
to index the glyph data (the loca/glyf tables) directly. What happens for us at
the moment seems to be that we use the CMAP (or some other mapping) to convert
the PDF glyph sequence from CID to GID, and then use the resulting GID to index
the glyph data.

As a sanity check I modified the font, removing the POST table and altering
both the 1,0 and 3,1 CMAP subtables. The font shows different glyphs under
Windows, but Acrobat displays the correct text, showing that it is not using
the CMAP subtables or the POST table, and must therefore be using the CIDs as
GIDs directly.


It's fairly obvious (I think) that this is exactly the same problem as Bug
#689623, just as Leonardo already stated (comment #18 and #22). If I apply the
same 'use CID as GID' to the file and text in that report, I get the correct
result. In fact my own investigation came very close to the correct conclusion;
I simply failed to check what happens when the font CMAP subtable is altered.
I've done that this time, and it gives the missing piece of information.


Please note, I believe this only applies when the font is a CIDFont, and may
well only apply when the CMap is an Identity. I have no idea what will happen if
it isn't.

I realise I may not have explained this well, please ask if anything is not clear...
Comment 26 Marcos H. Woehrmann 2008-12-29 15:52:24 UTC
The file HiraMaruPro.pdf has been moved to a separate bug: 690214.
Comment 27 Ralph Giles 2009-06-02 11:49:14 UTC
Looking at the error message from Ray's comment #8, "Can't build
/Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .", I do see that
there's no Identity.Unicode key in the .CMapChooser dictionary. Adding that, or
harmonizing whatever is setting that, is a place to start.
Comment 28 Ken Sharp 2009-06-30 05:17:46 UTC
*** Bug 689623 has been marked as a duplicate of this bug. ***
Comment 29 Ken Sharp 2009-07-01 01:53:36 UTC
*** Bug 690483 has been marked as a duplicate of this bug. ***
Comment 30 Ken Sharp 2009-07-03 02:45:23 UTC
This issue is addressed in revision 9834, patch here:

http://ghostscript.com/pipermail/gs-cvs/2009-July/009513.html

This works with TrueType fonts which have a Unicode CMAP table, and relies on
the addition of fonts in cidfmap with an Identity Ordering.

The cidfmap entries for the various issues relying on this:

688515
-------
For solidworks.pdf & PDF_with_Missing_CIDFont.pdf
/CenturyGothic << /CSI [(Identity) 0] /Path (c:/tests/688515/gothic.ttf)
/SubfontID 0 /FileType /TrueType >> ;

For fax.pdf
/Arial << /CSI [(Identity) 0] /Path (c:/windows/fonts/arial.ttf) /SubfontID 0
/FileType /TrueType >> ;

689499
-------
/CourierNew << /CSI [(Identity) 0] /Path (c:/windows/fonts/cour.ttf) /SubfontID
0 /FileType /TrueType >> ;

688813
-------
I don't have a copy of the font 'MSSansSerif', so I created an entry using Arial
as a substitute. Naturally the output is incorrect since the GIDs for Arial will
be utterly different. However I believe that using the correct font will work
properly.

/MSSansSerif << /CSI [(Identity) 0] /Path (c:/windows/fonts/arial.ttf)
/SubfontID 0 /FileType /TrueType >> ;

689623
-------
904259_Faktura_23082.pdf & 120107_PO.pdf
/Arial << /CSI [(Identity) 0] /Path (c:/windows/fonts/arial.ttf) /SubfontID 0
/FileType /TrueType >> ;

3661749.PDF
/TimesNewRomanPSMT << /CSI [(Identity) 0] /Path (c:/windows/fonts/times.ttf)
/SubfontID 0 /FileType /TrueType >> ;

/TimesNewRomanPS-BoldMT << /CSI [(Identity) 0] /Path
(c:/windows/fonts/timesbd.ttf) /SubfontID 0 /FileType /TrueType >> ;

64_384_6_180_5a_682822_361223.pdf
/Arial << /CSI [(Identity) 0] /Path (c:/windows/fonts/arial.ttf) /SubfontID 0
/FileType /TrueType >> ;

/CenturyGothic << /CSI [(Identity) 0] /Path (c:/tests/688515/gothic.ttf)
/SubfontID 0 /FileType /TrueType >> ;

690483
-------
/RotisSerif << /CSI [(Unicode) 0] /Path (c:/tests/688515/rg.ttf) /SubfontID 0
/FileType /TrueType >> ;

/RotisSansSerif << /CSI [(Unicode) 0] /Path (c:/tests/688515/rb.ttf) /SubfontID
0 /FileType /TrueType >> ;
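
Since all of the entries above follow one pattern, they could be generated rather than typed by hand. The following Python helper is a hypothetical convenience, not part of Ghostscript; it only formats the keys already shown in this comment (/FileType, /Path, /SubfontID, /CSI) and assumes the path uses forward slashes and contains no parentheses or backslashes, which would need escaping in a PostScript string.

```python
def cidfmap_entry(name, path, ordering="Identity", supplement=0, subfont_id=0):
    """Format one cidfmap line mapping a CIDFont name to a TrueType file."""
    return (
        f"/{name} << /FileType /TrueType /Path ({path}) "
        f"/SubfontID {subfont_id} /CSI [({ordering}) {supplement}] >> ;"
    )


# Font names and paths as used in the entries above.
fonts = {
    "Arial": "c:/windows/fonts/arial.ttf",
    "CourierNew": "c:/windows/fonts/cour.ttf",
}
lines = [cidfmap_entry(name, path) for name, path in fonts.items()]
print("\n".join(lines))
```

The key order within the dictionary differs from the hand-written entries, which should not matter because a PostScript dictionary is unordered.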

Comment 31 Marcos H. Woehrmann 2011-09-18 21:47:12 UTC
Changing customer bugs that have been resolved more than a year ago to closed.