689623 – Issue with CID font mapping

Status:	NOTIFIED DUPLICATE of bug 688515

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	Text
Version:	master
Hardware:	PC Linux

Importance:	P2 normal
Assignee:	Ken Sharp

URL:
Keywords:

Depends on:
Blocks:

Reported:	2007-12-20 12:43 UTC by Marcos H. Woehrmann
Modified:	2011-09-18 21:47 UTC
CC List:	3 users

See Also:
Customer:	580 384 670
Word Size:	---

Attachments
reduced-uncompressed.pdf (18.81 KB, application/pdf) 2008-05-23 07:16 UTC, Ken Sharp

Description Marcos H. Woehrmann 2007-12-20 12:43:16 UTC

The customer reported that the attached PDF file cannot be read by Ghostscript
8.60, instead generating an error:

  Substituting CID font resource/Adobe-Identity for /Arial.
  Error: /undefinedresource in findresource

I installed a copy of arial.ttf onto my computer and added an entry to cidfmap:

/Arial << /FileType /TrueType /Path
(/home/marcos/Desktop/artifex/leadtools/arial.ttf) /SubfontID 0
/CSI [(Unicode) 0] >> ;

This removes the error but does not result in the correct characters being
displayed.  Apple Preview and Adobe Acrobat on my iMac and evince on my Linux
box all display the file the same way, so I'm assuming they are correct.

Looking at the font properties in Acrobat, it appears that the encoding should
be Identity-H and, if I'm understanding gs_ciddc.ps correctly, Unicode.Unicode
is using Identity-UTF16-H instead.  However, modifying gs_ciddc.ps doesn't
improve the results.

Using gs head (r8452) doesn't change anything.

Comment 1 Marcos H. Woehrmann 2007-12-20 12:44:28 UTC

Leonardo suggests:

This issue is likely another undocumented Adobe feature.
The document uses a nonstandard encoding like this:

<0003> <0020>
<0004> <0021>
<0005> <0022>
<0006> <0023>
<0007> <0024>
<0008> <0025>

And so on. It is listed in the ToUnicode CMap in the document.
But Adobe never defined that ToUnicode is used for rendering,
and I believe it is not.
When I modify ToUnicode, Adobe renders it the same.

Here is a text from the document and its encoding:

 M   o   n   t   a   g   e   v   e   j       6
<00300052005100570044004A004800590048004D00030019>Tj
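To make the mapping above concrete, here is a minimal Python sketch that splits the Tj hex string into 2-byte codes and shifts each one by the offset implied by the listed pairs (<0003> to <0020>, and so on). The helper names are hypothetical, and applying one constant offset to every code is an assumption based only on the pairs quoted here; the real ToUnicode CMap should be consulted for anything outside that range.

```python
def decode_hex_string(hex_text):
    """Split a PDF hex string such as <00300052> into 2-byte codes."""
    digits = hex_text.strip().strip("<>")
    return [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]


# Offset implied by the pairs listed above (<0003> -> <0020>, ...).
# Assumption: the same offset holds for every code used in this string.
OFFSET = 0x0020 - 0x0003


def apply_offset(codes):
    """Shift each 2-byte code by OFFSET and return the resulting text."""
    return "".join(chr(code + OFFSET) for code in codes)


codes = decode_hex_string("<00300052005100570044004A004800590048004D00030019>")
text = apply_offset(codes)
print(text)
```

With the string from the document this yields the street address annotated above the hex codes, which is consistent with the codes being shifted by 0x1D relative to Unicode.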

The font you attached is from Windows XP.
There is nothing special in it.
Here are the encodings it defines:

--CMAP-offset=0000D1C4------------------------
 nVersion=0  nTables=3
 nPlatformId=0  nSpecificId=3 (Unichar)  pos=0000D1E0  nFormat=0004
 nPlatformId=1  nSpecificId=0 (Macintosh Roman)  pos=0000DD04  nFormat=0000
 nPlatformId=3  nSpecificId=1 (Windows Unicode)  pos=0000DE0A  nFormat=0004

I guess it interprets character codes as glyph indices.
Need to check for sure.
I tried to insert /CIDToGIDMap/Identity
but it doesn't help for Ghostscript.

I think we should open an enhancement in bugzilla,
assign it to Toshiya and put this comment of mine there.
Likely /CIDToGIDMap/Identity should be used
in this case, but we need to figure out in what circumstances
it has to be used.

I noticed that Acrobat Reader 4 and 5 render it the same way Ghostscript does.
Sorry, I don't have 6 installed.
7 and 8 render the document with the right text.
So Adobe's behavior likely changed recently.

Comment 2 Marcos H. Woehrmann 2007-12-20 12:44:43 UTC

Created attachment 3651
904259_Faktura_23082.pdf

Comment 3 Marcos H. Woehrmann 2007-12-20 12:45:02 UTC

Created attachment 3652
arial.ttf

Comment 4 Ken Sharp 2008-05-23 02:35:10 UTC

I've had a look at this, and I have to admit to being somewhat baffled. I
reduced the document to a single word 'Faktura', which has the string:

<00290044004E0057005800550044>Tj

The CIDFont says it has an Identity-H Encoding, so treating the 2-byte CIDs as
if they were ASCII we get ')DNWXUD', which is what GS displays. As Leonardo
says, the font does contain a ToUnicode CMap, which maps the glyphs to

<00460061006B0074007500720061>

Again, converting to ASCII gives 'Faktura'. So it would seem Acrobat is using
the ToUnicode CMap. However, I then removed the ToUnicode CMap, and Acrobat
*still* displays the expected text.

It can't be using ToUnicode, because it's not there...
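The two readings of the 'Faktura' string can be compared side by side with a short Python sketch. It builds a code-to-Unicode table by pairing the two hex strings quoted in this comment position by position; the helper name is hypothetical, and the table covers only the seven codes shown here, not the full ToUnicode CMap from the file.

```python
def codes_from_hex(hex_text):
    """Split a PDF hex string into a list of 2-byte integer codes."""
    digits = hex_text.strip().strip("<>")
    return [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]


# The string shown by the Tj operator and the values the ToUnicode CMap
# assigns to the same glyphs, both copied from this comment.
shown = codes_from_hex("<00290044004E0057005800550044>")
unicode_values = codes_from_hex("<00460061006B0074007500720061>")

# Reading 1: take each 2-byte code at face value (what GS displays).
as_ascii = "".join(chr(code) for code in shown)

# Reading 2: look each code up in a table built from the paired strings.
to_unicode = dict(zip(shown, unicode_values))
as_mapped = "".join(chr(to_unicode[code]) for code in shown)

print(repr(as_ascii), repr(as_mapped))
```

The first reading reproduces what GS displays and the second reproduces the expected word, which is why the ToUnicode CMap looked like the obvious explanation before it was removed.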

The only thing I can think of is that Acrobat is using the font's own TrueType
CMAP table. I do note that the glyph positions in the CMAP tables in the font
correspond to the ToUnicode values in the ToUnicode CMap. That is, GID 3 maps to
CID 0x20, and so on.

Since I'm completely in the dark with respect to how GS maps a TrueType font to
a CIDFont, I'm unable to decide if this is helpful or not....

For what it's worth, Jaws renders this the same as GS.

Comment 5 Marcos H. Woehrmann 2008-05-23 07:08:35 UTC

Please upload the test file without the ToUnicode CMap referred to in comment
#4; I'd like to confirm that evince can open it correctly.  Since evince is open
source it should be possible to see what it does to deal with the mysterious
mapping issue.

Comment 6 Ken Sharp 2008-05-23 07:16:52 UTC

Created attachment 4043
reduced-uncompressed.pdf

As requested, reduced file (now only contains the word 'Faktura'), no ToUnicode
CMap. Still displays correctly in Acrobat 7&8.

I suspect that Acrobat is using one of the TrueType CMAP subtables, treating
the 2-byte codes in the font as a CID and using (probably) the 3,1 Unicode CMAP
subtable in the font to convert to a GID.
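One way to picture how the font's own table could supply the missing information is sketched below in Python. It treats each 2-byte code as a glyph index and inverts a Unicode-to-GID table to find the character that glyph stands for. The table is a toy stand-in holding only pairs quoted in this thread; it was not read from arial.ttf, the function names are hypothetical, and this is not a description of what Acrobat actually does.

```python
# Toy stand-in for the font's (3,1) subtable: Unicode code point -> GID.
# Assumption: only pairs quoted in this bug are listed; a real table would be
# parsed from the font file.
TOY_CMAP_3_1 = {
    0x0020: 0x0003, 0x0021: 0x0004, 0x0022: 0x0005,
    0x0023: 0x0006, 0x0024: 0x0007, 0x0025: 0x0008,
    0x0046: 0x0029, 0x0061: 0x0044, 0x006B: 0x004E,
    0x0074: 0x0057, 0x0075: 0x0058, 0x0072: 0x0055,
}


def invert_cmap(cmap):
    """Build GID -> Unicode, keeping the lowest code point for each GID."""
    gid_to_unicode = {}
    for code_point, gid in sorted(cmap.items()):
        gid_to_unicode.setdefault(gid, code_point)
    return gid_to_unicode


def text_for_gids(gids, cmap):
    """Return the text whose glyphs have the given indices (U+FFFD if unknown)."""
    inverse = invert_cmap(cmap)
    return "".join(chr(inverse[g]) if g in inverse else "\ufffd" for g in gids)


# 2-byte codes from the reduced file, read as glyph indices.
faktura_codes = [0x0029, 0x0044, 0x004E, 0x0057, 0x0058, 0x0055, 0x0044]
recovered = text_for_gids(faktura_codes, TOY_CMAP_3_1)
print(recovered)
```

Under this reading no ToUnicode CMap is needed to arrive at the expected word, which would be consistent with the reduced file still displaying correctly.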

I do note that the ToUnicode CMap in the original file duplicates the entries
in the 3,1 CMAP subtable.

There is some information in the TT spec about using TrueType tables, but I
thought this applied only to simple fonts, not CIDFonts.

Comment 7 Marcos H. Woehrmann 2008-05-23 07:35:35 UTC

evince displays the reduced-uncompressed.pdf correctly.

Comment 8 leonardo 2008-05-23 23:48:49 UTC

I think Alex should now take it to check whether we can change the glyph
mapping in lib/gs_ttf.ps. Maybe such unusual mapping should be optional.
Assigning to Alex.

Comment 9 Marcos H. Woehrmann 2008-06-11 12:55:55 UTC

Created attachment 4090
64_384_6_180_5a_682822_361223.pdf

I believe this file has the same problem as the original; Ghostscript displays:


  Substituting CID font resource/Adobe-Identity for /CenturyGothic.
  Error: /undefinedresource in findresource

when opening it.

Comment 10 Marcos H. Woehrmann 2008-07-09 07:07:10 UTC

Created attachment 4195
3661749.PDF

Another file, from a different customer, exhibiting the same symptoms.

Comment 11 Marcos H. Woehrmann 2008-07-09 08:26:57 UTC

This bug is related to bug 689956.

Comment 12 Marcos H. Woehrmann 2008-08-19 11:50:09 UTC

Created attachment 4291
120107_PO.pdf

Another file, from a different customer, that shows the same error:

Substituting CID font resource/Adobe-Identity for /Arial.
Error: /undefinedresource in findresource

Comment 13 Rowland Gosling 2008-10-07 14:43:26 UTC

We just solved our problem with Arial. It turns out GS doesn't like the
Semi-Bold attribute in Microsoft SQL Server Reporting Services. We set the font
to plain 'Arial' and everything works swimmingly.

I suspect a number of reports I'm seeing may have a similar problem.

Comment 14 Alex Cherepanov 2008-12-16 10:04:54 UTC

The observation that the bug depends on the selected fonts may help
to narrow down the problem.
We also need to look into the approach suggested in comment #8.

Comment 15 Ken Sharp 2009-06-30 05:17:46 UTC


*** This bug has been marked as a duplicate of 688515 ***

Comment 16 Marcos H. Woehrmann 2011-09-18 21:47:30 UTC

Changing customer bugs that were resolved more than a year ago to closed.