The customer reported that the attached PDF file cannot be read by Ghostscript 8.60, which instead generates an error:

  Substituting CID font resource/Adobe-Identity for /Arial.
  Error: /undefinedresource in findresource

I installed a copy of arial.ttf on my computer and added an entry to cidfmap:

  /Arial << /FileType /TrueType /Path (/home/marcos/Desktop/artifex/leadtools/arial.ttf) /SubfontID 0 /CSI [(Unicode) 0] >> ;

This removes the error but does not result in the correct characters being displayed. Apple Preview and Adobe Acrobat on my iMac and evince on my Linux box all display the file the same way, so I'm assuming they are correct.

Looking at the properties in Acrobat, it appears that the encoding should be Identity-H, and if I'm understanding gs_ciddc.ps correctly, Unicode.Unicode is using Identity-UTF16-H instead. However, modifying gs_ciddc.ps doesn't improve the results. Using gs head (r8452) doesn't change anything.
Leonardo suggests:

This issue is likely another undocumented Adobe feature. The document uses a nonstandard encoding like this:

  <0003> <0020>
  <0004> <0021>
  <0005> <0022>
  <0006> <0023>
  <0007> <0024>
  <0008> <0025>

And so on. It is listed in the ToUnicode CMap in the document. But Adobe never specified that ToUnicode is used for rendering, and I believe it is not: when I modify ToUnicode, Adobe renders the document the same.

Here is some text from the document and its encoding:

  M o n t a g e v e j 6
  <00300052005100570044004A004800590048004D00030019>Tj

The font you attached is from Windows XP. There is nothing special in it. Here are the encodings it defines:

  --CMAP-offset=0000D1C4------------------------
  nVersion=0 nTables=3
  nPlatformId=0 nSpecificId=3 (Unichar) pos=0000D1E0 nFormat=0004
  nPlatformId=1 nSpecificId=0 (Macintosh Roman) pos=0000DD04 nFormat=0000
  nPlatformId=3 nSpecificId=1 (Windows Unicode) pos=0000DE0A nFormat=0004

I guess Adobe interprets the character codes as glyph indices. This needs to be checked to be sure. I tried inserting /CIDToGIDMap/Identity, but it doesn't help for Ghostscript.

I think we should open an enhancement in bugzilla, assign it to Toshiya and put this comment there. Likely /CIDToGIDMap/Identity should be used in this case, but we need to figure out in what circumstances it has to be used.

I noticed that Acrobat Reader 4 and 5 render it the same way as Ghostscript. Sorry, I don't have 6 installed. 7 and 8 render the document with the right text. So Adobe's behavior likely changed recently.
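The code-to-Unicode pairs quoted above can be checked against the sample Tj string with a short script. This is only a minimal Python sketch: it assumes the constant offset seen in the six listed pairs continues for the rest of the ToUnicode CMap (the "and so on"), which the comment does not spell out. The names `LISTED_PAIRS`, `split_codes` and `decode_with_offset` are made up for illustration and are not part of Ghostscript.

```python
# Pairs quoted in the comment: 2-byte character code -> Unicode value.
LISTED_PAIRS = {
    0x0003: 0x0020, 0x0004: 0x0021, 0x0005: 0x0022,
    0x0006: 0x0023, 0x0007: 0x0024, 0x0008: 0x0025,
}


def split_codes(hex_string):
    """Split the contents of a PDF hex string into 2-byte character codes."""
    return [int(hex_string[i:i + 4], 16) for i in range(0, len(hex_string), 4)]


# All six listed pairs differ by the same amount.
_offsets = {uni - code for code, uni in LISTED_PAIRS.items()}
assert len(_offsets) == 1
OFFSET = _offsets.pop()


def decode_with_offset(hex_string):
    """Decode codes to text.

    ASSUMPTION: the offset from the listed pairs also holds for codes
    that are not among the six pairs quoted in the comment.
    """
    return "".join(chr(code + OFFSET) for code in split_codes(hex_string))


if __name__ == "__main__":
    sample = "00300052005100570044004A004800590048004D00030019"
    print(hex(OFFSET), repr(decode_with_offset(sample)))
```

Under that assumption the sample string decodes to the text quoted next to it, which is consistent with the codes being something other than Unicode values, but it does not by itself show which table Acrobat consults.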
Created attachment 3651 [details] 904259_Faktura_23082.pdf
Created attachment 3652 [details] arial.ttf
I've had a look at this, and I have to admit to being somewhat baffled. I reduced the document to a single word, 'Faktura', which has the string:

  <00290044004E0057005800550044>Tj

The CIDFont says it has an Identity-H Encoding, so treating the 2-byte CIDs as if they were ASCII we get ')DNWXUD', which is what GS displays.

As Leonardo says, the font does contain a ToUnicode CMap, which maps the glyphs to:

  <00460061006B0074007500720061>

Again, converting to ASCII gives 'Faktura'. So it would seem Acrobat is using the ToUnicode CMap.

However, I then removed the ToUnicode CMap, and Acrobat *still* displays the expected text. It can't be using ToUnicode, because it's not there...

The only thing I can think of is that Acrobat is using the font's own TrueType CMAP table. I do note that the glyph positions in the CMAP tables in the font correspond to the ToUnicode values in the ToUnicode CMap. That is, GID 3 maps to CID 0x20 and so on.

Since I'm completely in the dark with respect to how GS maps a TrueType font to a CIDFont, I'm unable to decide if this is helpful or not....

For what it's worth, Jaws renders this the same as GS.
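The two readings of the reduced string can be compared side by side. A minimal Python sketch, using only the two hex strings quoted in this comment; the variable and function names are illustrative, and decoding as UTF-16BE is just a convenient way to read each 2-byte value as a character, not a claim about what any viewer does internally.

```python
def read_two_byte_values(hex_string):
    """Read a hex string as a sequence of 2-byte big-endian character values."""
    return bytes.fromhex(hex_string).decode("utf-16-be")


SHOW_STRING = "00290044004E0057005800550044"       # operand of Tj
TOUNICODE_VALUES = "00460061006B0074007500720061"  # what ToUnicode maps it to

# Reading the codes directly, as GS effectively does here.
as_codes = read_two_byte_values(SHOW_STRING)
# Reading the ToUnicode targets.
as_unicode = read_two_byte_values(TOUNICODE_VALUES)

# Per-character difference between the Unicode value and the code.
deltas = [ord(u) - ord(c) for c, u in zip(as_codes, as_unicode)]

if __name__ == "__main__":
    print(repr(as_codes), repr(as_unicode), [hex(d) for d in deltas])
```

Every code in this word differs from its Unicode value by the same amount, and that amount matches the "GID 3 maps to 0x20" correspondence noted above (0x20 minus 3).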
Please upload the test file without the ToUnicode CMap referred to in comment #4; I'd like to confirm that evince can open it correctly. Since evince is open source it should be possible to see what it does to deal with the mysterious mapping issue.
Created attachment 4043 [details] reduced-uncompressed.pdf

As requested, here is the reduced file (it now contains only the word 'Faktura') with no ToUnicode CMap. It still displays correctly in Acrobat 7 and 8.

I suspect that Acrobat is using one of the TrueType CMAP subtables, treating the 2-byte codes in the font as a CID and using (probably) the 3,1 Unicode CMAP subtable in the font to convert to a GID. I do note that the ToUnicode CMap in the original file duplicates the entries in the 3,1 CMAP subtable.

There is some information in the TT spec about using TrueType tables, but I thought this applied only to simple fonts, not CIDFonts.
evince displays reduced-uncompressed.pdf correctly.
I think Alex should now take this and check whether we can change the glyph mapping in lib/gs_ttf.ps. Maybe such an unusual mapping should be optional. Assigning to Alex.
Created attachment 4090 [details] 64_384_6_180_5a_682822_361223.pdf

I believe this file has the same problem as the original; when opening it, Ghostscript displays:

  Substituting CID font resource/Adobe-Identity for /CenturyGothic.
  Error: /undefinedresource in findresource
Created attachment 4195 [details] 3661749.PDF

Another file, from a different customer, exhibiting the same symptoms.
This bug is related to bug 689956.
Created attachment 4291 [details] 120107_PO.pdf

Another file, from a different customer, that shows the same error:

  Substituting CID font resource/Adobe-Identity for /Arial.
  Error: /undefinedresource in findresource
We just solved our problem with Arial.y. Turns out GS doesn't like the Semi-Bold attribute in Microsoft SQL Server Reporting Services. We set the font to plain 'Arial' and everything works swimmingly. I suspect a number of reports I'm seeing may have a similar problem.
The observation that the bug depends on the selected fonts may help to narrow down the problem. We also need to look into the approach suggested in comment #8.
*** This bug has been marked as a duplicate of 688515 ***
Changing customer bugs that have been resolved more than a year ago to closed.