The attached PS file has embedded subset TTFs. The customer wants to be able to create PDFs without the fonts embedded so that the PDF file size will be smaller. When Ghostscript converts this to PDF normally (-dEmbedAllFonts=true), the OrigFontName is used as the BaseFont when the subset font is put into the PDF (as expected). With the distiller param -dEmbedAllFonts=false, the PDF is much smaller (as desired), but the font names are things like "F0", which Adobe Acrobat Reader can't substitute for, so just 'dots' are displayed.

In the PDFs with EmbedAllFonts=true we see:

%Resolving: [8 0]
<< /BaseFont /RABYKY+TimesNewRoman /FontDescriptor 9 0 R /Type /Font /FirstChar 1 /LastChar 51 /Widths [ 722 500 500 444 500 278 556 722 667 611 556 611 333 250 556 500 500 250 500 500 722 611 667 722 889 667 722 250 500 500 500 500 722 333 722 722 500 444 500 833 722 500 444 333 389 500 278 500 278 444 500 ] /Encoding 23 0 R /Subtype /TrueType >>
endobj
%Resolving: [9 0]
<< /Type /FontDescriptor /FontName /RABYKY+TimesNewRoman /FontBBox [-12 -195 868 694 ] /Flags 4 /Ascent 694 /CapHeight 694 /Descent -195 /ItalicAngle 0 /StemV 130 /MissingWidth 777 /FontFile2 19 0 R >>

-------------------------------------------------------------------

In the above case, the name is being picked up from the OrigFontName key in the FontInfo directory created by the PS:

AddFontInfoBegin
AddFontInfo
/OrigFontType /TrueType def
/OrigFontName <54696D6573204E657720526F6D616E> def
/OrigFontStyle () def
/FSType 0 def
AddFontInfoEnd

-------------------------------------------------------------------------

With EmbedAllFonts=false we see:

%Resolving: [8 0]
<< /BaseFont /F0 /FontDescriptor 9 0 R /Type /Font /FirstChar 1 /LastChar 51 /Widths [ 722 500 500 444 500 278 556 722 667 611 556 611 333 250 556 500 500 250 500 500 722 611 667 722 889 667 722 250 500 500 500 500 722 333 722 722 500 444 500 833 722 500 444 333 389 500 278 500 278 444 500 ] /Encoding 19 0 R /Subtype /TrueType >>
endobj
%Resolving: [9 0]
<< /Type /FontDescriptor /FontName /F0 /FontBBox [-12 -195 868 694 ] /Flags 4 /Ascent 694 /CapHeight 694 /Descent -195 /ItalicAngle 0 /StemV 130 /MissingWidth 777 >>
endobj

-------------------------------------------------------------

What is needed is to use the OrigFontName as the /BaseFont value, similarly to the way it is used when embedding the subset.
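To see the difference at a glance, the /BaseFont values can be pulled out of the two output files mechanically. This is a minimal sketch of my own (the helper name is mine, and it assumes the object dictionaries are stored uncompressed, as in the dumps above), not anything from pdfwrite itself:

```python
import re

def basefont_names(pdf_bytes):
    """Collect every /BaseFont value found in uncompressed object
    dictionaries; a subsetted font shows up with a subset prefix,
    e.g. /RABYKY+TimesNewRoman, while the broken case shows /F0."""
    return re.findall(rb"/BaseFont\s*/([^\s/>\[\]]+)", pdf_bytes)

# Against the dictionaries quoted above, the embedded output yields
# b"RABYKY+TimesNewRoman" while the unembedded one yields only b"F0".
sample = b"<< /BaseFont /F0 /FontDescriptor 9 0 R /Type /Font >>"
print(basefont_names(sample))
```

Running it over the two PDFs makes the naming regression obvious without opening Acrobat.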
Created attachment 3614 [details]
test.ps

The two different cases can be generated from this file using:

gswin32c -sDEVICE=pdfwrite -o fontsembedded.pdf -dEmbedAllFonts=true test.ps

and

gswin32c -sDEVICE=pdfwrite -o fontsNOTembedded.pdf -dEmbedAllFonts=false test.ps

Then just look at the two PDFs with Acrobat Reader.
Fixing the bug priority to customer level. Sorry for the confusion, Ken. Note that when in doubt, check Joann's customer list info. This customer is under FULL support.
Not a problem, Ray. I did check the list, which is why I queried the priority; it didn't seem high enough to me ;-) Since I'm awaiting input from you and Ralph on the JBIG2 issue, I've started in on this one anyway.
Hmmm. I'm afraid there's a great deal more to this than the font name.

The font which is downloaded in the job has a custom, non-standard encoding. Simply using the original font name will cause Acrobat to substitute a font available on the host PC; in my case TimesNewRomanPSMT and friends. However, the display is garbage. This seems to be because we are using a custom encoding, because the original job used a custom encoding for the downloaded font. So we end up using glyphs starting from encoding position 0 of the /Encoding array.

We *do* embed an Encoding in the FontDescriptor, and it contains a /Differences array. It looks like Acrobat is ignoring that and simply displaying whatever would be present in a WinAnsiEncoding. Or, possibly, it has no idea what to do with an Encoding array applied to a native TrueType font from disk, since TrueType fonts don't have glyph names...

For example, here's the first line of text from the GS PDF file:

[()2.14672()-1.83244()9.07697()-3.43849()-1.83244()277.833]TJ

(The odd characters are 0x01, 0x02, 0x03, 0x04, 0x05 and 0x06 in ASCII.)

From Acrobat:

[(Adv)9.9(e)-2.6(nt)]TJ

And the original PostScript:

2415 682 M <010203040506>[66 46 45 41 46 0]xS

Notice that the indices 01, 02, 03, 04, 05 and 06 have magically transformed in the Acrobat output into the ASCII values A, d, v, e, n and t. It's not obvious to me how Acrobat has managed that. My guess is that it's something to do with the glyph name supplied to AddT42Char, e.g.:

/TTE35D7008t00 AddT42Char

Since this is from an Adobe PostScript driver, I guess Adobe has some means of working back from that hex value to a glyph name. They then presumably decide whether the total glyphs in the font can be represented in a standard encoding, and if so use it. I note that the Distiller-produced PDF file uses /WinAnsiEncoding.

So I'm not sure what to do about this. I could modify pdfwrite to use the original font name fairly easily.
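The remapping Acrobat appears to perform can be illustrated with a sketch. The /Differences-style table below is hypothetical (the glyph names are inferred from the rendered word "Advent"; nothing in the job names them directly), and the helper is mine, not pdfwrite code:

```python
# Hypothetical /Differences-style table for the subset font: the job
# encodes glyphs from position 1 upward, so code 0x01 maps to 'A',
# 0x02 to 'd', and so on (names inferred from the rendered text).
differences = {1: "A", 2: "d", 3: "v", 4: "e", 5: "n", 6: "t"}

def remap_to_ascii(codes, diffs):
    """Remap custom-encoded char codes to the ASCII string a viewer
    would show after substituting a standard-encoded host font."""
    return "".join(diffs[c] for c in codes)

print(remap_to_ascii(b"\x01\x02\x03\x04\x05\x06", differences))  # Advent
```

Whatever heuristic Acrobat uses, its output is consistent with applying exactly this kind of table to the <010203040506> string.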
Probably I should do that, as at least missing fonts with sensible encodings would then work as expected. However, it will simply replace the customer's problem with a different one: one which is arguably not a bug (identifying the glyphs and remapping them to a standard encoding isn't anywhere in the spec) but a feature, and which will be quite hard to implement, I suspect, especially since there is no documentation on what the argument to AddT42Char means. CC'ed Igor on this, as I'd like his (more experienced) opinion.
Oops, didn't mean to close the bug, sorry folks.
OK, I did some more looking, and it seems that the glyph names *are* present and correct in the font's Encoding. So 'all' we would need to do is check that all the named glyphs are present in a standard Encoding, emit the font with a standard Encoding, and remap all the text strings to use the standard Encoding rather than the custom one.

I'm not sure this is possible, because we emit the strings before accumulating all the glyphs. So we would need to modify and emit the strings using a standard encoding before we know whether use of a standard encoding is possible. I guess we could remap all the glyphs we can, and map the glyphs that don't fit into unused portions of the encoding as we go. If we run out of unassigned places we could then use up standard positions. This would mean that 'some' text would work, but I'm not sure this is worth the effort.

Is this a bug fix or a feature request?
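The feasibility check described above can be sketched in a few lines. This is an illustration under my own assumptions (the tiny WIN_ANSI excerpt and the function name are invented for the example; a real implementation would use the full WinAnsiEncoding table and live inside pdfwrite):

```python
# Tiny excerpt of WinAnsiEncoding, glyph name -> code point, just for
# illustration; a real check would use the full table.
WIN_ANSI = {"A": 65, "d": 100, "v": 118, "e": 101, "n": 110, "t": 116}

def standard_positions(custom_encoding, standard=WIN_ANSI):
    """Map each custom char code to its standard-encoding code, or
    return None if any glyph has no standard slot (the case where we
    would have to keep the custom encoding or embed the font)."""
    try:
        return {code: standard[name] for code, name in custom_encoding.items()}
    except KeyError:
        return None

print(standard_positions({1: "A", 2: "d"}))          # {1: 65, 2: 100}
print(standard_positions({1: "A", 2: "oddglyph"}))   # None
```

The ordering problem noted above remains: pdfwrite emits text strings before it has seen every glyph, so this check can only be run with certainty after the fact.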
Since we don't do as well as Adobe, it is probably a bug. There is a workaround of embedding the fonts.
Umm, I'd be inclined to call it a feature myself. The PDF file produced has the same information as the original PostScript; the glyphs are encoded the same way. We are doing what's been requested, and the fact that the font has been re-encoded in a way that makes it invalid for a simple substitution isn't our fault ;-) In fact, Acrobat seems to remap the font and text to WinAnsi. Indeed, Ghostscript can even substitute the missing fonts and print the result 'correctly'.

It seems that the fact that the font is flagged as symbolic is the reason Acrobat ignores the encoding. But if I remove that, I get a different error, I believe because we are using unusual numbers in the Encoding array.

I do agree that not including the original font name is a bug; that makes substitution impossible even for a font which has not been re-encoded. I suggest that I address that issue here, and open a new tracking number with a feature request to remap fonts to a standard Encoding. I think this should be controlled with a switch. Opinions, anyone?
A long time ago we decided to do no re-encoding, due to multiple problems with third-party fonts. One example is when standard glyph names are used for non-standard glyphs. Note that in this example a correct rendering isn't possible without embedding the font, or when the embedded font is substituted by the viewer. Note also that Adobe does not guarantee a correct result when using third-party fonts or documents. I think the right approach for the test case is to make Adobe use the encoding we generate.

The problem with the Symbolic flag is terrible. A long time ago we wanted to change to setting the correct Symbolic flag, but we had no time for a deep investigation. I suggest Ken spend some time investigating how to set the correct Symbolic flag, and how to process it correctly in our PDF interpreter. I strongly suspect that our PDF interpreter is bug-to-bug compatible with our writer regarding the Symbolic flag.

Another suspicious thing is which cmap we include in the embedded font. Ken, please check the following: if we replace char codes with glyph names via the Encoding generated by Ghostscript, and then replace glyph names with glyphs per the PDF specification, will we get the right text?
Leo, thanks for your comments. I could see that the absence of re-encoding was deliberate, but it is something which Distiller does, and I guess we will be criticised if we don't also do it. However, it is quite risky, as you rightly say! That's one good reason to make it a switch, if we implement it. I am also coming to a fairly firm belief that this (re-encoding) should be treated as an enhancement request. It does seem that not using the original font name should be a bug though, don't you think?

The Symbolic flag: according to the spec, if this flag is set the font contains glyphs outside the Adobe Latin 1 character set. Since the font contains glyphs starting at Encoding position 1, this is clearly the case, and it is quite correct that this flag is set by pdfwrite, in this instance at least. The only way we could realistically *not* set the flag would be to re-encode the font. Clearing the flag (in a binary editor) causes Acrobat to complain that the font has invalid flags, which is true.

It appears that Ghostscript doesn't care what the Symbolic setting is and uses the Encoding anyway. Acrobat won't use the Encoding if the font is flagged Symbolic, and will complain if the font is not flagged Symbolic but uses non-Latin 1 character encodings. There doesn't seem to be any way, therefore, to make Acrobat use the Encoding without re-encoding the font. If we are going to re-encode the font then we may as well do it the same way Distiller does.

The font doesn't include a cmap, because it's a type 42 (ie /SubType /TrueType). NB Ghostscript does complain about the PDF file if the fonts are not embedded, pointing out that fonts of type /TrueType should be embedded.

I'm not sure what the last thing you are asking me to check is. I do intend to meddle with the glyph names in the PostScript, and see what Distiller does if one of them is not a standard glyph name.

[later] Hmm, interesting. That generates two fonts called TimesNewRoman.
One is a TrueType with a WinAnsi encoding, the other a type 2 CID font (CIDFont with type 42 outlines) with a (2-byte) Identity-H encoding... Not only that, but Distiller embeds the two fonts, even though I've set EmbedAllFonts to false and specifically excluded the fonts in question from embedding by adding them to the NeverEmbed list.

To me it looks like Distiller re-encodes the fonts to Ansi whenever it can, and if it can't (for TrueType fonts, anyway) it embeds the font regardless of the settings of the embed flags.

If the decision has already been taken not to do re-encoding of fonts, then I think there is no way to achieve what the customer wants, which is a file which contains no fonts but which will display correctly in Acrobat, starting with a PostScript file which contains re-encoded subset TrueType fonts.

Again, opinions sought. Specifically, should we do the re-encoding work, and is it (as I contend) an enhancement rather than a bug fix?
Ken,

> I guess we will be criticised if we don't also do this.

There were multiple attempts to criticise that in the last 4 years, but all of them were dismissed by fixing minor bugs. I think this bug is such a case. If I were the owner, I would consider this approach only as the very last resort, after all other attempts fail.

> Since the font contains glyphs starting at Encoding position 1, this is clearly the case

Does encoding position 1 define a glyph name from outside the Latin 1 glyph names? I believe that char codes are not important; only glyphs matter for Symbolic.

> Clearing the flag (in a binary editor) causes Acrobat to complain

There are 2 flags: Symbolic and Nonsymbolic. If both are unset, Adobe claims an invalid font. When you reset one, you must set the other.

> Acrobat won't use the Encoding if the font is flagged Symbolic,

Ah, it may depend on whether the font is embedded or not, and on the available cmap subtables. Ghostscript maps through the Encoding, then through 'post', then through cmap. Adobe (I guess here) may skip 'post' and use the Adobe Glyph List instead, or something else. They document that they use heuristics, but have never said how the heuristics work. To get out of this bog, first of all I would check whether the Nonsymbolic flag works for this font.

> The font doesn't include a cmap, because its a type 42

I meant the 'cmap' subtable of the TrueType format. It must be included when a TrueType font is embedded into PDF.

> pointing out that fonts of type /TrueType should be embedded.

Right, I implemented this warning following the PDF spec. Here it does inform us that we're in an undocumented bog.

> Not only that, but Distiller embeds the two fonts

Another undocumented area is how NeverEmbed must control CID fonts. You said the second font is CID, so...

> To me it looks like Distiller re-encodes the fonts to Ansi whenever it can

Well, I recommend taking a Type 1 font with StandardEncoding, exchanging 2 glyph names in the CharStrings and in the Encoding, then distilling with NeverEmbed. I think Adobe will give a wrong rendering. I'm saying this because the predicate "it can re-encode" isn't algorithmically soluble. So we're again in the undocumented heuristics bog.

> then I think there is no way to achieve what the customer wants

Try the Nonsymbolic flag.
BTW, if Adobe ignores the Encoding for Symbolic fonts, then a symbolic font must be embedded if it contains a non-standard glyph (regardless of what 'non-standard' means exactly). Then either the customer's font is Nonsymbolic (all glyphs are standard), or the customer's case is an incorrect usage of a PDF converter.
> Does encoding position 1 define a glyph name from outside the Latin 1 glyph
> names? I believe that char codes are not important; only glyphs matter for
> Symbolic.

Re-reading the spec, I think you are correct; it only mentions the glyph names, not their encoded positions. In this case the font only contains glyphs which are in the Latin 1 character set (but their Encoding is different). So I guess the font should be Nonsymbolic.

> There are 2 flags: Symbolic and Nonsymbolic. If both are unset, Adobe claims
> an invalid font. When you reset one, you must set the other.

Yes, I tried it both ways and got the same result: "The font /F0 contains bad /Flags". I had no idea why, I'm afraid. AH! I just checked the Distiller output, and I made a mistake. Bit 5 is unused; the Nonsymbolic bit is actually bit 6. If I set the /Flags to 32 instead of 16, the PDF output from GS appears 'correct' in Acrobat. At the moment it's substituting the wrong fonts, because I haven't implemented the fix for font names, with the result that it's using Adobe Sans MM to substitute. Since I haven't set the italic flag the italics have all gone, but all the glyphs are correct, including the modified encoding. So, good news!

OK, I think I see the way forward (thanks to Leo here): fix the font name, for which I have a trivially simple patch, and then work on getting the Symbolic flag correct when writing out the FontDescriptor. This should address the specific customer issue.

As a possible additional feature, I could opt to ignore the setting of the embed flag if the font does turn out to be 'Symbolic', and go ahead and embed the font anyway. It's not exactly true to the description of the flag, but it is pretty much what Distiller does, so I think it's reasonable for us to do so too. Any dissenting opinions? Should I *not* attempt to embed the font under these conditions?
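For the record, the /Flags arithmetic above can be written down explicitly. This is a sketch of my own (the helper name is invented; the bit values are the ones from the PDF spec's FontDescriptor flags table):

```python
# FontDescriptor /Flags, 1-based bit numbers per the PDF spec:
# bit 3 (value 4) = Symbolic, bit 6 (value 32) = Nonsymbolic.
# Bit 5 (value 16) is reserved, which is why /Flags 16 drew the
# "bad /Flags" complaint while /Flags 32 did not.
SYMBOLIC = 1 << 2      # 4
NONSYMBOLIC = 1 << 5   # 32

def flags_valid(flags):
    """Exactly one of Symbolic and Nonsymbolic must be set."""
    return bool(flags & SYMBOLIC) != bool(flags & NONSYMBOLIC)

for f in (4, 16, 32, 36):
    print(f, flags_valid(f))   # 4 True, 16 False, 32 True, 36 False
```

So the original dump's /Flags 4 (Symbolic) was self-consistent; the mistake was only in which bit to move to when clearing it.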
I read Comment #14 and I think I can't add anything useful now.
I think that always embedding the font (Subset) when it includes a glyph outside of the Latin 1 set (implying that we need to flag the font as Symbolic) is a reasonable fallback. The rest sounds like good news indeed.
Embedding the font seems to happen already (somewhat to my surprise); it seems to be one of the features which was disabled. Or perhaps not; I haven't had enough tests I'm certain of to be absolutely sure. I'd like to test it further, but in the interests of getting a fix to the customer quickly I intend to commit what I have now (after review). If it looks like the embedding is not quite as expected, I'll open a new enhancement tracker for it.
Added a new tracker (#689616) to track the proposed feature of embedding fonts which are found to be symbolic. The patch (following shortly) does not address the issue of embedding symbolic fonts. It turns out that this happens only for the Symbol font when the proposed patch is implemented.
This patch:

http://ghostscript.com/pipermail/gs-cvs/2007-December/008032.html

fixes the naming issue, and correctly assigns the Nonsymbolic flag to the font.

This patch:

http://ghostscript.com/pipermail/gs-cvs/2008-January/008057.html

completes this issue, simply removing some (now) redundant code and arguments.