The problem concerns the Cyrillic capital letter "Short I" (U+0419) in Arial, Courier New, and Times New Roman, and the small letter "Be" (U+0431) in DejaVu Sans. PostScript documents containing these letters in the fonts mentioned lose them after PostScript-to-PDF conversion. In particular, this leads to corrupted printouts of OpenOffice.org documents printed via the CUPS print manager. The sequence is as follows: OpenOffice.org sends a PostScript document to CUPS, which applies a PostScript-to-PDF conversion using the Ghostscript utilities, and that is where the letters in question are lost.

Attached are a PostScript file containing only the "Short I" letter in Arial, and the PDF file produced from it by Ghostscript, with the letter lost. The input PostScript file was produced by OpenOffice.org. The conversion was done with:

gs -sDEVICE=pdfwrite -sOutputFile=ShortI.pdf ShortI.ps

Environment:
Ghostscript: 8.64.dfsg.1-0ubuntu8
OpenOffice.org: OOO310m19, Build 9420
Fonts: installed by the distribution; the latest version of DejaVu Sans was also tried
CUPS: 1.3.9-17ubuntu3.4. People have tried a couple of CUPS versions; generally, the problem starts with the CUPS version that began using the ps2pdf conversion provided by Ghostscript
OS: Kubuntu 9.04

Here are some bug reports on the issue:
https://bugs.launchpad.net/bugs/449255
http://www.openoffice.org/issues/show_bug.cgi?id=106833

There are some very similar issues with other characters and fonts that may be caused by this problem:
http://www.openoffice.org/issues/show_bug.cgi?id=104050
https://bugs.launchpad.net/openoffice/+bug/376953
http://www.openoffice.org/issues/show_bug.cgi?id=105631

P.S. How can I send the attachment?
Created attachment 5692 [details] There is input postscript file
Created attachment 5693 [details] The output pdf file
I am unclear on how you are testing this. The file you have supplied 'The output file.pdf' displays a single glyph on Acrobat Reader 4 and 5, and Acrobat Professional 7 and 9. Converting the file 'There is input postscript file.ps' to PDF using Acrobat Distiller 9 produces a PDF file with a single identical glyph. In short, I cannot see a problem with the PDF file produced by pdfwrite.
Thank you for the reply. You're right, Adobe Reader shows the glyph correctly. I used another PDF renderer that didn't show it. So the PostScript file from OpenOffice.org seems to be valid. Unfortunately this doesn't solve the problem: the "Short I" doesn't print correctly. Maybe the bug is in the further processing, which finishes with a PDF-to-PostScript conversion whose result is then sent to the printer. I tried to model this process, omitting the intermediate transformations and leaving only the PostScript-to-PDF-to-PostScript chain, by:

gs -sDEVICE=pswrite -sOutputFile=ShortI.pdf-gs.ps ShortI.pdf

No glyph for "Short I" was shown in my renderer for ShortI.pdf-gs.ps. To verify it against Adobe Reader I converted the current result, ShortI.pdf-gs.ps, back to PDF:

gs -sDEVICE=pdfwrite -sOutputFile=ShortI.pdf-gs.pdf ShortI.pdf-gs.ps

Adobe Reader didn't show the "Short I" in the resulting PDF, ShortI.pdf-gs.pdf, either. What is special about the PDF initially created by Ghostscript from the input PostScript file, so that it cannot be converted back into PostScript?
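For reference, the whole modelled chain can be run as the following commands (a sketch based on the commands above; the -dBATCH -dNOPAUSE flags are additions so that gs exits instead of waiting at its interactive prompt, and the file names are just the ones used in this report):

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=ShortI.pdf ShortI.ps
# PS -> PDF, the original conversion from the report
gs -dBATCH -dNOPAUSE -sDEVICE=pswrite -sOutputFile=ShortI.pdf-gs.ps ShortI.pdf
# PDF -> PS, modelling the back-conversion in the CUPS pipeline
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=ShortI.pdf-gs.pdf ShortI.pdf-gs.ps
# PS -> PDF again, only to verify the result in Adobe Reader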
Created attachment 5695 [details] Output after postscript-pdf-postscript
Somehow my previous message got garbled. In short, the conversion chain "PostScript, then PDF, then PostScript" done using Ghostscript does not seem to produce the "Short I" in the final PostScript. It is not shown in that PostScript by my system's renderer, nor by the gs command-line renderer (on screen), nor by Adobe Reader in the PDF cooked from this PostScript by Ghostscript.
Since the problem occurs with rendering, and the PDF displays correctly in Acrobat, this is probably not a PDF writer problem. It could be a font issue or it could be a problem with the PDF interpreter. Assigning to Alex initially to see if he can tell which component is the problem.
PDF files generated by both Ghostscript and Distiller 5 render correctly with the -dRENDERTTNOTDEF flag.
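For example, rendering the attached PDF to an image with the flag set (a sketch; the output file name is hypothetical, and -dBATCH -dNOPAUSE are added so gs runs non-interactively):

gs -dBATCH -dNOPAUSE -dRENDERTTNOTDEF -sDEVICE=png16m -r150 -sOutputFile=ShortI.png ShortI.pdf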
What's happening here is that the font is indeed being defined in such a way that the glyph being used is a .notdef glyph:

/Encoding 256 array def
0 1 255 {Encoding exch /.notdef put} for
Encoding 0 /glyph3 put
...
/CharStrings 2 dict dup begin
/.notdef 0 def
/glyph1 1 def
/glyph2 2 def
/glyph3 3 def
...
(ArialMTFID33HGSet2) cvn findfont 100 -100 matrix scale makefont setfont
<01> show

The show operation uses the character code 0x01, while the Encoding is set up with position 0 being /glyph3 and all other positions being /.notdef. So as a result we render the /.notdef glyph.

The default behaviour for Ghostscript is to render TrueType /.notdef glyphs when the input is PostScript, and *not* to render TrueType /.notdef glyphs when the input is PDF. That is why this works when you run the original PostScript, but doesn't work when you run the PDF file.

We know from the original work on this issue (see bug #689757) that the rules Acrobat uses on whether to render a /.notdef or not are incomprehensible. In particular, we know that making a font symbolic does not force display. I mention this because I had thought that the font being symbolic was why Acrobat displayed this one.

I believe the simple answer is that if you want to render PDF where real glyphs are encoded as the /.notdef glyph, you will have to set -dRENDERTTNOTDEF, as Alex noted above. This will *also* render /.notdef glyphs in files where the /.notdef glyph is defined as a hollow rectangle, which would give rise to the 'hollow boxes' complaint; that is why this switch defaults to disabled.

In case anyone wants to push back upstream with this: the file was created by OpenOffice 3.1, and the subset font was created using "SunTypeTools-TT 1.0 gelf". According to the comments, the original font was:

%%Creator: SunTypeTools-TT 1.0 gelf
%- Font subset generated from a source font file: '/usr/share/fonts/truetype/msttcorefonts/Arial.ttf'
%- Original font name: ArialMT
%- Original font family: Arial
%- Original font sub-family: Regular

Obviously I don't know what the original font looked like, and I haven't decoded the sfnts array, but I very much doubt that it had a glyph called /.notdef which was a real Cyrillic character.
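To illustrate the encoding fallthrough described above, here is a minimal, self-contained PostScript sketch. It uses a hypothetical Type 3 font (DemoFont, with a made-up /square glyph) rather than the TrueType/Type 42 font in the actual file, so -dRENDERTTNOTDEF does not apply to it; it only shows how every character code not explicitly encoded falls through to the glyph named /.notdef:

%!PS
/DemoFont 7 dict begin
  /FontType 3 def
  /FontMatrix [0.001 0 0 0.001 0 0] def
  /FontBBox [0 0 600 600] def
  /Encoding 256 array def
  0 1 255 {Encoding exch /.notdef put} for   % every code maps to /.notdef ...
  Encoding 0 /square put                     % ... except code 0
  /BuildChar {
    exch begin
    Encoding exch get                        % character code -> glyph name
    /square eq
      {600 0 setcharwidth 100 100 400 400 rectfill}   % the real glyph
      {600 0 setcharwidth 10 setlinewidth
       100 100 400 400 rectstroke}                    % /.notdef: hollow box
    ifelse
    end
  } def
  currentdict
end definefont pop
/DemoFont findfont 100 scalefont setfont
100 400 moveto <00 01> show   % code 00 draws /square, code 01 falls through to /.notdef
showpage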
Thank you for such a detailed reply. I tried to model the CUPS pipeline again, using gs -dRENDERTTNOTDEF -sDEVICE=pswrite for the final conversion. The result rendered the "Short I" with every renderer I have, even with the one (Okular) that doesn't render the letter in the input PDF file. As I see it, there is no bug in Ghostscript here; the behaviour can be chosen: either you get the "Short I" but may also get hollow rectangles, or you get no "Short I" and no chance of rectangles either.

The only specific thing I noticed about the original font, Arial.ttf, is that the "Short I" (U+0419) is represented as a combination of two other glyphs: the base U+0418 and the breve U+0306. They are specified as components of the "Short I". The font has a ".notdef" glyph that looks like the famous hollow rectangle, and my font tool complains about it: "Glyph 1295 is called ".notdef", a singularly inept choice of name (only glyph 0 may be called .notdef)". The other problematic letter, "Be" (U+0431) in DejaVu Sans, has no components defined, so that is not the pattern. However, font and PostScript internals are not areas where I'm too strong.

Would I be wrong to conclude from your description that the "Short I" is represented in the original PostScript in a non-standard way, and thus may be rendered with some settings/renderers but not with others? In other words, is this the reason why people are getting the "Short I" and other such encoded letters dropped from printouts in some configurations of the CUPS pipeline but not in others (on older systems)? And could there be a technical, PostScript-related reason that forces letters to be represented in such a way?

Thank you again for the detailed description.
>The only specific thing I noticed about the original font, Arial.ttf, is that
>the "Short I" (U+0419) is represented as a combination of two other glyphs:
>the base U+0418 and the breve U+0306. They are specified as components of the
>"Short I".

I assumed the original font was fine; the font being used in this case is a subset font containing only the glyphs needed for the document. It appears to have been created by the SunTypeTools-TT program.

>The font has a ".notdef" glyph that looks like the famous hollow rectangle,
>and my font tool complains about it: "Glyph 1295 is called ".notdef", a
>singularly inept choice of name (only glyph 0 may be called .notdef)"

Yes, this is the problem. PostScript uses glyph names, TrueType uses numeric IDs (GIDs). In both cases the font technology defines a glyph to be used when the requested glyph is not present: PostScript calls this glyph /.notdef, TrueType defines it as GID 0. Of course we need a way to map PostScript glyph names to TrueType GIDs, and what happens here is that the PostScript glyph named /.notdef is not assigned to GID 0. In fact the glyph named /.notdef is not a fallback glyph at all, it's a real glyph, and that's where the problem arises.

>Would I be wrong to conclude from your description that the "Short I" is
>represented in the original PostScript in a non-standard way, and thus may be
>rendered with some settings/renderers but not with others?

It's not so much non-standard as completely mad, see my comments above :-) PostScript being a flexible programming language, it's technically possible and theoretically legal, but it's not sensible.

In PostScript we always render the /.notdef glyph, because that's the way the specification is written and mostly everyone sticks to the spec. In PDF, however, although the spec is written so that the /.notdef glyph should be rendered, Adobe Acrobat 'sometimes' (and I haven't been able to work out a rule for this) doesn't render the /.notdef but instead leaves a gap equivalent to its width. This leads to complaints about 'hollow squares' or 'boxes'. Of course these are technically correctly rendered, but Acrobat doesn't display them, so we are seen as incorrect. This is what the RENDERTTNOTDEF flag is for; it defaults to 'don't render' because that gives better equivalence with Acrobat, but in this case it means a real glyph doesn't get drawn, because it has been given the name /.notdef.

>is this the reason why people are getting the "Short I" and other such
>encoded letters dropped from printouts in some configurations of the CUPS
>pipeline but not in others (on older systems)?

The RENDERTTNOTDEF flag is relatively new; anyone running an older version of Ghostscript won't have the flag, and in that case the behaviour is the same for PostScript as for PDF: the /.notdef glyph *is* rendered. So older systems will work 'correctly'.

>And could there be a technical, PostScript-related reason that forces letters
>to be represented in such a way?

There is no good reason for the glyph to be named /.notdef, and this is the source of all the problems. In fact there are good reasons *not* to name a glyph /.notdef: it is confusing at the very least, and if we tried to use a glyph which was missing in the original font we would get the 'Short I' glyph instead of the usual hollow square. That, of course, is very difficult to spot when proofing, which is the point of TT fonts using a more or less instantly recognisable 'error' glyph.
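The difference in default behaviour described above can be seen directly with commands like these (a sketch using the file names from this report; -dBATCH -dNOPAUSE are additions so gs runs non-interactively, and png16m is just a convenient raster device for comparing the output):

gs -dBATCH -dNOPAUSE -sDEVICE=png16m -sOutputFile=ps-in.png ShortI.ps
# glyph appears: for PostScript input, TrueType /.notdef glyphs are rendered

gs -dBATCH -dNOPAUSE -sDEVICE=png16m -sOutputFile=pdf-in.png ShortI.pdf
# glyph missing: for PDF input, TrueType /.notdef glyphs are skipped by default

gs -dBATCH -dNOPAUSE -dRENDERTTNOTDEF -sDEVICE=png16m -sOutputFile=pdf-in-flag.png ShortI.pdf
# glyph appears again: the flag forces /.notdef glyphs to be drawn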
(PostScript fonts often use a 'space' for /.notdef, that is, no marks are made)
Hello everyone! I tried to use the -dRENDERTTNOTDEF option while converting a PS file obtained from OpenOffice 3.1 to a PDF file (using ps2pdf), and my Okular did not show the Russian letter "Short I". I use Ubuntu 9.10 x86_64; the version of Ghostscript is 8.70. Maybe such behaviour is due to the Debian team's patching of Ghostscript? I tried to compile from the original source, but without success. The problem is solved when I use the ps2ps utility and then ps2pdf (without the -dRENDERTTNOTDEF flag).
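For anyone trying the same workaround, the sequence is just the following (a sketch with hypothetical file names; ps2ps and ps2pdf are the wrapper scripts shipped with Ghostscript):

ps2ps input.ps flattened.ps
# normalises the PostScript by running it through Ghostscript first
ps2pdf flattened.ps output.pdf
# then converts the re-distilled file to PDF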
To: Ken Sharp
Thank you for the comprehensive explanation.