Bug 689597 - PDF created with NoEmbed has wrong BaseFont name
Summary: PDF created with NoEmbed has wrong BaseFont name
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: All All
Importance: P2 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-12-06 09:26 UTC by Ray Johnston
Modified: 2008-12-19 08:31 UTC

See Also:
Customer: 1
Word Size: ---


Description Ray Johnston 2007-12-06 09:26:00 UTC
The attached PS file has embedded subset TTFs. The customer wants to be able to
create PDFs without the fonts embedded so that the PDF file size will be smaller.

When Ghostscript converts this to PDF normally (-dEmbedAllFonts=true), the
OrigFontName gets used as the BaseFont when the subset font is put into the PDF
(as expected).

When the distillerparam -dEmbedAllFonts=false is used, the PDF is much smaller
(as desired) but the font names are things like "F0" that Adobe Acrobat Reader
can't substitute for, so just 'dots' are displayed.

In the PDF with EmbedAllFonts=true we see:
%Resolving: [8 0]
<<
/BaseFont /RABYKY+TimesNewRoman /FontDescriptor 9 0 R
/Type /Font /FirstChar 1 /LastChar 51 /Widths [
722 500 500 444 500 278 556 722 667 611 556 611 333 250 556 500 500 250 500 500
722 611 667 722 889 667 722 250 500 500 500 500 722 333 722 722 500 444 500 833
722 500 444 333 389 500 278 500 278 444 500 ]
/Encoding 23 0 R
/Subtype /TrueType >>
endobj
%Resolving: [9 0]
<<
/Type /FontDescriptor /FontName /RABYKY+TimesNewRoman /FontBBox [-12 -195 868 694 ]
/Flags 4 /Ascent 694 /CapHeight 694 /Descent -195 /ItalicAngle 0 /StemV 130
/MissingWidth 777 /FontFile2 19 0 R
>>
-------------------------------------------------------------------
In the above case, the name is being picked up from the OrigFontName key in
the FontInfo dictionary created by the PS:
AddFontInfoBegin
AddFontInfo
/OrigFontType /TrueType def 
/OrigFontName <54696D6573204E657720526F6D616E> def
/OrigFontStyle () def
/FSType 0 def
AddFontInfoEnd
-------------------------------------------------------------------------
With EmbedAllFonts=false we see:
%Resolving: [8 0]
<<
/BaseFont /F0 /FontDescriptor 9 0 R
/Type /Font /FirstChar 1 /LastChar 51 /Widths [
722 500 500 444 500 278 556 722 667 611 556 611 333 250 556 500 500 250 500 500
722 611 667 722 889 667 722 250 500 500 500 500 722 333 722 722 500 444 500 833
722 500 444 333 389 500 278 500 278 444 500 ]
/Encoding 19 0 R
/Subtype /TrueType >>
endobj
%Resolving: [9 0]
<<
/Type /FontDescriptor /FontName /F0 /FontBBox [-12 -195 868 694 ]
/Flags 4 /Ascent 694 /CapHeight 694 /Descent -195 /ItalicAngle 0 /StemV 130
/MissingWidth 777 >>
endobj
-------------------------------------------------------------

What is needed is to use the OrigFontName as the /BaseFont value similarly to
the way it is used when embedding the subset.
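
For illustration, the hex string in the FontInfo entry above decodes to the family name, and removing the spaces gives the name that follows the subset prefix in the embedded case. This is only a minimal Python sketch: the helper name basefont_from_orig_font_name is hypothetical, and the space-stripping rule is an assumption inferred from the two dumps above, not taken from the pdfwrite source.

```python
def basefont_from_orig_font_name(hex_string: str) -> str:
    """Decode a PostScript hex string such as <5469...> and drop the spaces."""
    # Strip the PostScript hex-string delimiters < and >.
    raw = bytes.fromhex(hex_string.strip().strip("<>"))
    name = raw.decode("latin-1")
    # Assumption: spaces are removed, as the names in the dumps contain none.
    return name.replace(" ", "")


orig = "<54696D6573204E657720526F6D616E>"
print(bytes.fromhex(orig.strip("<>")).decode("latin-1"))  # Times New Roman
print(basefont_from_orig_font_name(orig))                 # TimesNewRoman
```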
Comment 1 Ray Johnston 2007-12-06 09:28:25 UTC
Created attachment 3614 [details]
test.ps

The two cases can be generated from this file using:

gswin32c -sDEVICE=pdfwrite -o fontsembedded.pdf -dEmbedAllFonts=true test.ps

   and

gswin32c -sDEVICE=pdfwrite -o fontsNOTembedded.pdf -dEmbedAllFonts=false test.ps

Then just look at the two PDFs with Acrobat Reader.
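
As a quick check that does not need a viewer, the /BaseFont names in the two output files can be listed and compared. The sketch below is an assumption-laden shortcut, not a PDF parser: it scans the raw bytes with a regular expression, so it only works when the font dictionaries are written uncompressed, as in the dumps in the description. The function name basefont_names is hypothetical.

```python
import re

# Matches "/BaseFont /Name" and captures the name up to the next delimiter.
BASEFONT_RE = re.compile(rb"/BaseFont\s*/([^\s/<>\[\]()]+)")


def basefont_names(pdf_bytes: bytes) -> list[str]:
    """Return the distinct /BaseFont names found in uncompressed PDF objects."""
    names = {m.group(1).decode("latin-1") for m in BASEFONT_RE.finditer(pdf_bytes)}
    return sorted(names)


sample = b"<< /BaseFont /F0 /FontDescriptor 9 0 R /Type /Font >>"
print(basefont_names(sample))  # ['F0']
```

For the real files, pass the result of open("fontsNOTembedded.pdf", "rb").read() instead of the sample bytes.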
Comment 2 Ray Johnston 2007-12-11 02:03:12 UTC
Fixing bug priority to customer level. Sorry for the confusion, Ken.

Note that when in doubt, check Joann's customer list info. This customer
is under FULL support.
Comment 3 Ken Sharp 2007-12-11 02:11:14 UTC
Not a problem Ray, I did check the list, which is why I queried the priority,
didn't seem high enough to me ;-)

Since I'm awaiting input from yourself and Ralph on the JBIG2 issue, I've
started in on this one anyway.

Comment 4 Ken Sharp 2007-12-11 03:57:47 UTC
Hmmm. I'm afraid there's a great deal more to this than the font name.

The font which is downloaded in the job has a custom, non-standard, encoding.
Simply using the original font name will cause Acrobat to substitute a font
available on the host PC. In my case TimesNewRomanPSMT and friends.

However, the display is garbage. This seems to be because we are using a custom
encoding, because the original job used a custom encoding for the downloaded
font. So we end up using glyphs starting from encoding position 0 of the
/Encoding array.

We *do* write an Encoding for the font, and it contains a /Differences
array. It looks like Acrobat is ignoring that, and simply displaying whatever
would be present in a WinAnsiEncoding. Or, possibly, has no idea what to do with
an Encoding array applied to a native TrueType font from disk, since TrueType
fonts don't have glyph names...


For example, here's the first line of text from the GS PDF file:

[()2.14672()-1.83244()9.07697()-3.43849()-1.83244()277.833]TJ

(The odd characters are 0x01, 0x02, 0x03, 0x04, 0x05 and 0x06 in ASCII)

From Acrobat:

[(Adv)9.9(e)-2.6(nt)]TJ

And the original PostScript:

2415 682 M <010203040506>[66 46 45 41 46  0]xS 

Notice that the indices 01, 02, 03, 04, 05 and 06 have magically transformed in
the Acrobat output to the ASCII values A, d, v, e, n and t.
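
To make the mapping concrete, here is a minimal Python sketch of how a /Differences array turns character codes into glyph names. The array contents are hypothetical, chosen only to be consistent with the "Advent" example above; the real array in the GS output is not reproduced in this report.

```python
def parse_differences(differences: list) -> dict[int, str]:
    """Expand a PDF /Differences array into a code -> glyph name mapping."""
    mapping = {}
    code = 0
    for item in differences:
        if isinstance(item, int):
            code = item           # an integer resets the current code
        else:
            mapping[code] = item  # a name is assigned, then the code advances
            code += 1
    return mapping


# Hypothetical array, consistent with the "Advent" example above.
differences = [1, "A", "d", "v", "e", "n", "t"]
encoding = parse_differences(differences)

text = bytes([1, 2, 3, 4, 5, 6])
print("".join(encoding[b] for b in text))  # Advent
```

A viewer that honours the Encoding can recover the glyph names this way; one that ignores it sees only the codes 1 to 6.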

It's not obvious to me how Acrobat has managed that. My guess is that it's
something to do with the glyph name supplied to AddT42Char, eg:

/TTE35D7008t00 AddT42Char

Since this is from an Adobe PostScript driver, I guess Adobe have some means of
working back from that hex value to a glyph name. They then presumably decide
whether the total glyphs in the font can be represented in a standard encoding,
and if so use it. I note that the Distiller produced PDF file uses /WinAnsiEncoding.

So I'm not sure what to do about this. I could modify pdfwrite to use the
original font name fairly easily. Probably I should do that, as at least missing
fonts with sensible encodings would work as expected.

However, it will simply replace the customer's problem with a different one. One
which is arguably not a bug (identifying the glyphs and remapping to a standard
encoding isn't anywhere in the spec) but a feature, and which will be quite hard
to implement, I suspect, especially since there is no documentation on what the
argument to AddT42Char means.

CC'ed Igor on this, as I'd like his (more experienced) opinion.

Comment 5 Ken Sharp 2007-12-11 03:58:59 UTC
Oops, didn't mean to close the bug, sorry folks.
Comment 6 Ken Sharp 2007-12-11 04:09:12 UTC
OK, I did some more looking and it seems that the glyph names *are* present and
correct, in the font's Encoding.

So 'all' we would need to do is check to see that all the named glyphs are
present in a standard Encoding, emit the font with a standard Encoding, and
remap all the text strings to use the standard Encoding rather than the custom one. 

I'm not sure this is possible, because we emit the strings before accumulating
all the glyphs. So we would need to modify and emit the strings using a standard
encoding before we know whether use of a standard encoding is possible.

I guess we could remap all the glyphs we can, and map the glyphs that don't fit
into unused portions of the encoding as we go. If we run out of unassigned
places we could then use up standard positions. This would mean that 'some' text
would work, but I'm not sure this is worth the effort.
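
The incremental scheme in the previous paragraph could look roughly like the Python sketch below. Everything in it is hypothetical: the class name, the tiny "standard" table (ASCII letters only, where the glyph name equals the character) and the choice of codes 1 to 31 as the unassigned slots. It is meant to show the order of the fallbacks, not how pdfwrite would implement them.

```python
import string


class IncrementalRemapper:
    """Assign output codes to glyph names one at a time, as text is emitted."""

    def __init__(self, standard: dict[str, int], free_codes: list[int]):
        self.standard = standard            # glyph name -> standard code
        self.free_codes = list(free_codes)  # codes the standard leaves unused
        self.assigned: dict[str, int] = {}

    def code_for(self, glyph_name: str) -> int:
        if glyph_name in self.assigned:
            return self.assigned[glyph_name]
        used = set(self.assigned.values())
        if glyph_name in self.standard and self.standard[glyph_name] not in used:
            code = self.standard[glyph_name]   # 1. standard position
        elif self.free_codes:
            code = self.free_codes.pop(0)      # 2. unassigned position
        else:
            # 3. last resort: take a standard position that is still unused
            spare = [c for c in self.standard.values() if c not in used]
            if not spare:
                raise ValueError("no code left for " + glyph_name)
            code = spare[0]
        self.assigned[glyph_name] = code
        return code

    def is_standard(self) -> bool:
        """True if every glyph so far landed on its standard position."""
        return all(self.standard.get(n) == c for n, c in self.assigned.items())


standard = {ch: ord(ch) for ch in string.ascii_letters}
remap = IncrementalRemapper(standard, free_codes=list(range(1, 32)))
codes = bytes(remap.code_for(n) for n in ["A", "d", "v", "e", "n", "t"])
print(codes, remap.is_standard())
```

The weakness noted above is visible here: is_standard() can only be answered after the last glyph has been seen, by which time the strings have already been emitted.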

Is this a bug fix or a feature request ? 

Comment 7 Ray Johnston 2007-12-11 10:16:49 UTC
Since we don't do as well as Adobe, it is probably a bug.

There is a workaround of embedding the fonts.
Comment 8 Ken Sharp 2007-12-11 12:09:27 UTC
Umm, I'd be inclined to call it a feature myself. The PDF file produced has the
same information as the original PostScript, the glyphs are encoded the same way.

We are doing what's been requested, the fact that the font has been re-encoded
in a way that makes it invalid for a simple substitution isn't our fault ;-)

In fact, Acrobat seems to remap the font and text to WinAnsi 
Indeed, Ghostscript can even substitute the missing fonts and print the result
'correctly'. It seems that the fact that the font is flagged as symbolic is the
reason Acrobat ignores the encoding. But if I remove that, I get a different
error, I believe because we are using unusual numbers in the Encoding array.

I do agree that not including the original font name is a bug, that makes
substitution impossible even for a font which has not been re-encoded.

I suggest that I address that issue here, and open a new tracking number with a
feature request to remap fonts to a standard Encoding. I think this should be
controlled with a switch.

Opinions anyone ?

Comment 9 leonardo 2007-12-12 02:46:24 UTC
Long ago we decided to do no re-encoding, due to multiple problems with
third-party fonts. One example is when standard glyph names are used with
non-standard glyphs. Note that in this example a correct rendering isn't
possible without embedding the font, or when the embedded font is substituted
by the viewer. Note that Adobe does not guarantee a correct result when using
third-party fonts or documents.

I think the right way for the test case is to make Adobe use the encoding we
generate. The problem with the Symbolic flag is a serious one. Long ago we
wanted to change to setting a correct Symbolic flag, but we had no time for a
deep investigation. I suggest Ken spends some time investigating how to set
the right Symbolic flag, and how to process it correctly in our PDF
interpreter. I strongly suspect that our PDF interpreter is bug-to-bug
compatible with our writer regarding the Symbolic flag.

Another suspicious thing is which cmap we include in the embedded font.
Ken, please check the following: if we replace char codes with glyph names
using the Encoding generated by Ghostscript, and then replace glyph names with
glyphs according to the PDF specification, will we get the right text?
Comment 10 Ken Sharp 2007-12-12 07:45:34 UTC
Leo, thanks for your comments. I could see that the absence of re-encoding was
deliberate, but it is something which Distiller does. I guess we will be
criticised if we don't also do this.

However it is quite risky as you rightly say! One good reason to make it a
switch, if we implement it. I am also coming to a fairly firm belief that this
(re-encoding) should be treated as an enhancement request. It does seem that not
using the original font name should be a bug though, do you think ?


The symbolic flag; according to the spec, if this flag is set the font contains
glyphs outside the Adobe Latin 1 character set. Since the font contains glyphs
starting at Encoding position 1, this is clearly the case, and it is quite
correct that this flag is set by pdfwrite, in this instance at least.

The only way we could realistically *not* set the flag would be to re-encode the
font. Clearing the flag (in a binary editor) causes Acrobat to complain that the
font has invalid flags, which is true.

It appears that Ghostscript doesn't care what the symbolic setting is and uses
the Encoding anyway. Acrobat won't use the Encoding if the font is flagged
Symbolic, and will complain if the font is not flagged Symbolic, but uses
non-Latin 1 character encodings.

There doesn't seem to be any way, therefore, to make Acrobat use the Encoding
without re-encoding the font. If we are going to re-encode the font then we may
as well do it the same as Distiller does.


The font doesn't include a cmap, because it's a Type 42 (i.e. /Subtype /TrueType).
NB Ghostscript does complain about the PDF file if the fonts are not embedded,
pointing out that fonts of type /TrueType should be embedded.


I'm not sure what the last thing you are asking me to check is. I do intend to
meddle with the glyph names in the PostScript, and see what Distiller does if
one of them is not a standard glyph name. 

[later] 
Hmm, interesting. That generates two fonts called TimesNewRoman. One a TrueType
with a WinAnsi encoding, the other a type 2 CID (CIDFont with type 42 outlines)
with a (2-byte) Identity-H encoding...

Not only that, but Distiller embeds the two fonts, even though I've set
EmbedAllFonts to false, and specifically excluded the fonts in question from
embedding by adding to the never embed list.

To me it looks like Distiller re-encodes the fonts to Ansi whenever it can, and
if it can't (for TrueType fonts, anyway) it embeds the font regardless of the
settings of the embed flags.


If the decision has already been taken not to do re-encoding of fonts, then I
think there is no way to achieve what the customer wants, which is a file which
contains no fonts, but which will display correctly in Acrobat, starting with a
PostScript file which contains re-encoded subset TrueType fonts.


Again, opinions sought. Specifically should we do the re-encoding work, and is
it (my contention) an enhancement not a bug fix.

Comment 11 leonardo 2007-12-12 13:33:14 UTC
Ken,

> I guess we will be criticised if we don't also do this.

There were multiple attempts to creticise that in the last 4 years, but all of
them were dismissed by fixing minor bugs. I think this bug is one of those. If
I were the owner, I would consider this way only as the very last resort, after
all other attempts fail.

> Since the font contains glyphs starting at Encoding position 1, this is
> clearly the case

Does the encoding position 1 define a glyph name from outside the Latin 1 glyph
names? I believe that char codes are not important; only glyphs matter for
Symbolic.

> Clearing the flag (in a binary editor) causes Acrobat to complain

There are 2 flags : Symbolic and Non Symbolic. If both are unset, Adobe claims 
invalid font. When you reset one, you must set the other.

> Acrobat won't use the Encoding if the font is flagged Symbolic,

Ahh, it may depend on whether the font is embedded or not, and on the available
cmap subtables. Ghostscript maps through Encoding, then through 'post', then
through cmap. Adobe (I am guessing here) may skip 'post' and use the Adobe Glyph
List instead, or something else. They document that they do use heuristics, but
never said how the heuristics work. To get out of this bog, first of all I would
check whether the Non Symbolic flag works for this font.

> The font doesn't include a cmap, because its a type 42

I meant the 'cmap' subtable of True Type format. It must be included when a 
True Type font is embedded into PDF.

> pointing out that fonts of type /TrueType should be embedded.

Right I implemented this warning following the PDF spec. Here it does inform us 
that we're in the undocumented bog.

> Not only that, but Distiller embeds the two fonts

Another undocumented thing is how NeverEmbed must control CID fonts. You said
the second font is CID, so...

> To me it looks like Distiller re-encodes the fonts to Ansi whenever it can

Well, I recommend taking a Type 1 font with Standard Encoding, exchanging 2 glyph
names in CharStrings and in Encoding, then distilling with NeverEmbed. I think
Adobe will give a wrong rendering. I'm saying this because the predicate "it
can reencode" isn't algorithmically decidable. So we're again in the
undocumented heuristic bog.
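
The experiment can be modelled in a few lines of Python to show why a name-based substitution goes wrong. This is a toy model under stated assumptions: outlines are represented by descriptive strings, and the "substitute" font is a hypothetical host font whose glyph names are honest. It says nothing about what Distiller or Acrobat actually do.

```python
# Outlines are stood in for by strings describing what the glyph looks like.
charstrings = {"A": "shape of A", "B": "shape of B"}
encoding = {65: "A", 66: "B"}

# Exchange the two names in both CharStrings and Encoding.
charstrings["A"], charstrings["B"] = charstrings["B"], charstrings["A"]
encoding[65], encoding[66] = encoding[66], encoding[65]


def render(codes, enc, glyphs):
    """Map codes to glyph names via enc, then names to outlines via glyphs."""
    return [glyphs[enc[c]] for c in codes]


# With the modified font itself, code 65 still draws the shape of A.
embedded = render([65, 66], encoding, charstrings)

# A substitute font found by name has honest glyph names.
substitute = {"A": "shape of A", "B": "shape of B"}
substituted = render([65, 66], encoding, substitute)

print(embedded)     # ['shape of A', 'shape of B']
print(substituted)  # ['shape of B', 'shape of A']
```

The glyph names alone look perfectly standard in both cases, so nothing in the names tells a converter that re-encoding or substitution is unsafe.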

> then I think there is no way to achieve what the customer wants

Try Non Symbolic flag.
Comment 12 leonardo 2007-12-12 13:36:07 UTC
s/creticise/criticize
Comment 13 leonardo 2007-12-12 13:44:50 UTC
BTW, if Adobe ignores Encoding with Symbolic, then a symbolic font must be
embedded if it contains a non-standard glyph (regardless of what
'non-standard' means exactly). Then either the customer's font is NonSymbolic
(all glyphs are standard) or the customer's case is an incorrect usage of a PDF
converter.
Comment 14 Ken Sharp 2007-12-13 01:12:50 UTC
> Does the encoding position 1 define a glyph name from outside the Latin 1 glyph
> names? I believe that char codes are not important; only glyphs matter for
> Symbolic.

Re-reading the spec, I think you are correct, it only mentions the glyph names,
not their encoded positions. In this case the font only contains glyphs which
are in the latin 1 character set (but their Encoding is different).

So I guess the font should be Nonsymbolic.


> There are 2 flags : Symbolic and Non Symbolic. If both are unset, Adobe claims 
> invalid font. When you reset one, you must set the other.

Yes, I tried it both ways and got the same result: "The font /F0 contains bad
/Flags". I've no idea why, I'm afraid. AH! I just checked the Distiller output,
and I made a mistake. Bit 5 is unused, the Nonsymbolic bit is actually bit 6. If
I set the /Flags to 32 instead of 16, the PDF output from GS appears 'correct'
in Acrobat.
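
The bit arithmetic behind that mistake, as a small Python sketch. The bit positions are those of the font descriptor flags in the PDF reference (bits are numbered from 1, so bit n has the value 1 << (n - 1)); the higher bits (AllCap, SmallCap, ForceBold) are left out for brevity, and the helper name check_flags is hypothetical.

```python
from enum import IntFlag


class FontFlags(IntFlag):
    FIXED_PITCH = 1 << 0   # bit 1
    SERIF = 1 << 1         # bit 2
    SYMBOLIC = 1 << 2      # bit 3, value 4
    SCRIPT = 1 << 3        # bit 4
    NONSYMBOLIC = 1 << 5   # bit 6, value 32 (bit 5, value 16, is not defined)
    ITALIC = 1 << 6        # bit 7


def check_flags(value: int) -> str:
    """Classify a /Flags value by its Symbolic and Nonsymbolic bits."""
    symbolic = bool(value & FontFlags.SYMBOLIC)
    nonsymbolic = bool(value & FontFlags.NONSYMBOLIC)
    if symbolic == nonsymbolic:
        return "invalid: exactly one of Symbolic/Nonsymbolic must be set"
    return "symbolic" if symbolic else "nonsymbolic"


for value in (4, 16, 32):
    print(value, check_flags(value))
```

The value 4 is what pdfwrite wrote in the dumps above, 16 is the mistaken edit (neither bit set, hence the complaint), and 32 is the value that made the output display correctly.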

At the moment it's substituting the wrong fonts, because I haven't implemented
the fix for font names, with the result that it's using Adobe Sans MM to
substitute. Since I haven't set the italic flag the italics have all gone, but
all the glyphs are correct, including the modified encoding. So, good news!


OK, I think I see the way forward (thanks to Leo here) as: Fix the fontname, for
which I have a trivially simple patch, and then work on getting the Symbolic
flag correct when writing out the Font Descriptor.

This should address the specific customer issue. 

As a possible additional feature, I could opt to ignore the setting of the embed
flag, if the font does turn out to be 'Symbolic', and go ahead and embed the
font anyway. It's not exactly true to the description of the flag, but it is
pretty much what Distiller does, so I think it's reasonable for us to do so too.

Any dissenting opinions ? Should I *not* attempt to embed the font under these
conditions ?
Comment 15 leonardo 2007-12-13 11:21:01 UTC
I read Comment #14 and I think I can't add anything useful now.
Comment 16 Ray Johnston 2007-12-15 10:42:00 UTC
I think that always embedding the font (Subset) when it includes a glyph
outside of the Latin 1 set (implying that we need to flag the font as
Symbolic) is a reasonable fallback.
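
Put as a rule, the fallback amounts to the following Python sketch. It is hypothetical throughout: the function name classify, the argument names, and in particular standard_latin, which here is a heavily truncated stand-in for the real standard Latin glyph set. It describes the proposed policy, not what any patch attached to this bug does.

```python
SYMBOLIC = 4       # /Flags value for bit 3
NONSYMBOLIC = 32   # /Flags value for bit 6


def classify(glyph_names, standard_latin, embed_requested):
    """Return (flag, embed) for a font, following the fallback described above."""
    symbolic = any(name not in standard_latin for name in glyph_names)
    flag = SYMBOLIC if symbolic else NONSYMBOLIC
    # Fallback: a symbolic font is embedded even when embedding was not requested.
    embed = embed_requested or symbolic
    return flag, embed


# Hypothetical, heavily truncated stand-in for the standard Latin glyph set.
standard_latin = {"A", "d", "v", "e", "n", "t", "space"}

print(classify(["A", "d", "v"], standard_latin, embed_requested=False))   # (32, False)
print(classify(["A", "TTE0001"], standard_latin, embed_requested=False))  # (4, True)
```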

The rest sounds like good news indeed.
Comment 17 Ken Sharp 2007-12-16 12:25:41 UTC
Embedding the font seems to happen already (somewhat to my surprise), it seems
to be one of the features which was disabled. Or perhaps not, I haven't had
enough tests I'm certain of to be absolutely sure.

I'd like to test it further, but in the interests of getting a fix to the
customer quickly I intend to commit what I have now (after review). If it looks
like the embedding is not quite as expected I'll open a new enhancement tracker
for it.

Comment 18 Ken Sharp 2007-12-17 03:26:06 UTC
Added a new tracker (#689616) to track the proposed feature of embedding fonts
which are found to be symbolic. The patch (following shortly) does not address
the issue of embedding symbolic fonts. It turns out that this happens only for
the Symbol font when the proposed patch is implemented.
Comment 19 Ken Sharp 2008-01-02 05:19:56 UTC
This patch http://ghostscript.com/pipermail/gs-cvs/2007-December/008032.html
fixes the naming issue, and correctly assigns the non-symbolic flag to the font.

This patch http://ghostscript.com/pipermail/gs-cvs/2008-January/008057.html
completes this issue, simply removing some (now) redundant code and arguments.