Bug 689754 - Substituting .notdef for "german umlauts" when generating PDFA
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.60
Hardware: PC Windows XP
Importance: P2 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-03-18 05:23 UTC by artifex
Modified: 2008-12-19 08:31 UTC

See Also:
Customer: 870
Word Size: ---


Attachments
Substituting .notdef for german umlauts (242.04 KB, application/postscript)
2008-03-18 05:25 UTC, artifex

Description artifex 2008-03-18 05:23:53 UTC
I converted the attached PostScript file to PDF/A with the following command:
 
gs -dNOPAUSE -dBATCH -sOutputFile=umlaut.pdf -sDEVICE=pdfwrite -dPDFA umlaut.ps

When interpreting the result, Ghostscript shows messages like this:
Substituting .notdef for odieresis in the font Calibri 
Substituting .notdef for udieresis in the font Calibri
....

The German umlauts are missing!

In the function pdf_convert_true_font ( gdevpdtf.c ) there is an optimization at
line 1010. This optimization converts fonts only if the encoding is different. If
this optimization is removed, the problem is solved.
Comment 1 artifex 2008-03-18 05:25:17 UTC
Created attachment 3872
Substituting .notdef for german umlauts
Comment 2 Ken Sharp 2008-03-20 04:04:00 UTC
Something funny related to 'symbolic' TrueType fonts here. If we don't set PDF/A
output then the font is emitted with flags indicating it's a 'symbolic' font, and
the PDF file works.

This 'hack' (the comment describes it as such) in the code is specifically disabled
for PDF/A output, so the font is emitted with flags indicating a 'Roman' font
instead. In this case, the font does not work properly, even though it is
compatible with a WinAnsiEncoding.

Weird. I'll need to go and read up on the meaning of the symbolic flag again.

Disabling the Encoding check causes the font to be emitted as a CIDFont with an
Identity-H CMap, which is why that works. However it does create a larger output
file. Must check the recommendations in the PDF/A specification as well.
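
For readers unfamiliar with the flags mentioned above, here is a minimal sketch of how the /Flags integer in a PDF FontDescriptor can be decoded. It is not taken from the Ghostscript source. The bit positions are those of the PDF Reference font flags table, where Symbolic is bit 3 and Nonsymbolic (the 'Roman' case above) is bit 6. The function names decode_font_flags and is_consistent are hypothetical, chosen only for this illustration.

```python
# Bit positions follow the PDF Reference font descriptor flags table.
# Bit 1 is the least significant bit.
FONT_FLAGS = {
    1: "FixedPitch",
    2: "Serif",
    3: "Symbolic",
    4: "Script",
    6: "Nonsymbolic",
    7: "Italic",
    17: "AllCap",
    18: "SmallCap",
    19: "ForceBold",
}


def decode_font_flags(flags):
    """Return the names of the flag bits set in a /Flags integer."""
    return [name for bit, name in sorted(FONT_FLAGS.items())
            if flags & (1 << (bit - 1))]


def is_consistent(flags):
    """Symbolic and Nonsymbolic are mutually exclusive: exactly one should be set."""
    symbolic = bool(flags & (1 << 2))      # bit 3
    nonsymbolic = bool(flags & (1 << 5))   # bit 6
    return symbolic != nonsymbolic


if __name__ == "__main__":
    for value in (4, 32):
        print(value, decode_font_flags(value), is_consistent(value))
```

A descriptor with /Flags 4 decodes as symbolic and one with /Flags 32 as nonsymbolic; which of the two a viewer sees affects how it looks up glyphs in the embedded TrueType font.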
Comment 3 Ken Sharp 2008-03-21 08:21:17 UTC
Although the FontDescriptor flags are emitted as 'Roman' rather than 'symbolic'
for PDF/A, the cmap table in the TrueType font is still set to 3,0 which is
Windows, symbolic.

This seems to be the basic problem: if I alter this to 3,1 (Windows, Unicode),
the file works correctly. Altering the FontDescriptor flags to symbolic to match
the font works as well. Of course emitting the font as a CIDFont (customer
suggestion) works too, though the output is larger.
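
The 3,0 and 3,1 values above are the platform ID and encoding ID of a cmap subtable. As a rough sketch of how to inspect them, the following reads the table directory of a standalone TrueType file and lists those pairs. It is an illustration only, not Ghostscript code: the function names are hypothetical, it assumes a plain .ttf (not a .ttc collection), and a font embedded in a PDF would first have to be extracted from its stream.

```python
import struct
import sys


def list_cmap_encodings(font_bytes):
    """Return (platformID, encodingID) pairs from a TrueType font's cmap table."""
    # Offset table: sfnt version (4 bytes), then numTables (uint16) at offset 4.
    num_tables = struct.unpack_from(">H", font_bytes, 4)[0]
    # Table records start at offset 12; each is tag, checksum, offset, length.
    for i in range(num_tables):
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sLLL", font_bytes, 12 + 16 * i)
        if tag == b"cmap":
            break
    else:
        raise ValueError("no cmap table found")
    # cmap header: version (uint16), number of encoding records (uint16).
    _version, num_subtables = struct.unpack_from(">HH", font_bytes, offset)
    pairs = []
    for i in range(num_subtables):
        # Encoding record: platformID, encodingID, subtable offset.
        platform_id, encoding_id, _sub_offset = struct.unpack_from(
            ">HHL", font_bytes, offset + 4 + 8 * i)
        pairs.append((platform_id, encoding_id))
    return pairs


def describe(pair):
    """Give a short label for the pairs discussed in this bug."""
    names = {
        (3, 0): "Windows, Symbol",
        (3, 1): "Windows, Unicode BMP",
        (1, 0): "Macintosh, Roman",
    }
    return names.get(pair, "other")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for pair in list_cmap_encodings(f.read()):
            print(pair, describe(pair))
```

Run against an extracted font file, a (3, 0) entry alongside nonsymbolic FontDescriptor flags would correspond to the mismatch described in this comment.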

So I now have three possible ways to fix this ;-) I've asked Leonardo for an
opinion.
Comment 4 Ken Sharp 2008-04-04 01:57:54 UTC
This patch:

http://ghostscript.com/pipermail/gs-cvs/2008-April/008208.html

resolves the problem.