Summary: | PDF using Edwardian Script ITC font displays garbled text using "text extract" | ||
---|---|---|---|
Product: | Artifex GSview | Reporter: | John Beale <beale> |
Component: | General | Assignee: | Russell Lang <gsview> |
Status: | NOTIFIED INVALID | ||
Severity: | normal | ||
Priority: | P4 | ||
Version: | unspecified | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Customer: | Word Size: | --- | |
Attachments: | Postscript file which exhibits bug when converted to PDF (in ZIP file) |
Description
John Beale
2009-04-21 10:22:31 UTC
The font in question is a TrueType font embedded as a subset without a ToUnicode CMap, and using a custom encoding. For example, /Y (capital Y) is encoded at position 1. In addition, the glyph names in the encoding are not what one would expect: I would expect to see /F, /i, /r, /s, /t and so on, but instead I see /Y, /bar, /Udieresis, /aacute, etc. So there is no Unicode information, and the encoding is non-standard. In this case Acrobat falls back to translating the glyph names into their ASCII equivalents (when possible). Using the Encoding to map the character codes to glyph names gives /Y /bar /Udieresis /aacute /agrave /space /aacute /t and so on, which matches what you get when you copy and paste. It's impossible to tell from the PDF file why it was created this way; one would have to guess that it was created from a PostScript file which had re-encoded the font like this, so the PDF file had to be made the same way. I don't see a bug here. Possibly (given that the PDF file was created by GS 8.63) there is a bug in pdfwrite which caused the encoding oddness, but that can't be determined without seeing the PostScript file.
Created attachment 4961
Postscript file which exhibits bug when converted to PDF (in ZIP file)
Attached PS file (in ZIP) displays the bug after conversion to PDF. File generated
by MS Office Word 2003 printing to MS Publisher Imagesetter (with the
Printer > Advanced PostScript option "Optimize for portability").
Have confirmed the behavior is due to inadequate PS file generation: the same document with the same font, exported from OpenOffice 3 using "Export to PDF", works 100% OK.
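The text-extraction fallback described in the analysis (no ToUnicode CMap, so character code to Encoding glyph name to text) can be sketched in Python. This is a minimal illustration under stated assumptions, not Acrobat's or Ghostscript's actual implementation: the function names are hypothetical, the glyph table is a small subset of the Adobe Glyph List, and only the assignment of /Y to code 1 comes from the report. The other code-to-name assignments are invented for illustration.

```python
from typing import Dict, Optional

# Small subset of the Adobe Glyph List (glyph name -> Unicode text).
# A real extractor would consult the full list; only the names needed here are included.
GLYPH_LIST: Dict[str, str] = {
    "Y": "\u0059",
    "bar": "\u007C",
    "Udieresis": "\u00DC",
    "aacute": "\u00E1",
    "agrave": "\u00E0",
    "space": "\u0020",
    "t": "\u0074",
}


def glyph_name_to_text(name: str) -> str:
    """Map a glyph name to text, returning U+FFFD when the name is unknown."""
    if name in GLYPH_LIST:
        return GLYPH_LIST[name]
    # Names of the form uniXXXX carry the code point directly.
    if name.startswith("uni") and len(name) == 7:
        try:
            return chr(int(name[3:], 16))
        except ValueError:
            pass
    return "\uFFFD"


def extract_text(codes: bytes, encoding: Dict[int, str],
                 to_unicode: Optional[Dict[int, str]] = None) -> str:
    """Hypothetical extractor: prefer a ToUnicode map, else fall back to glyph names."""
    out = []
    for code in codes:
        # 1. A ToUnicode mapping, when present, takes priority.
        if to_unicode is not None and code in to_unicode:
            out.append(to_unicode[code])
            continue
        # 2. Otherwise: character code -> glyph name (via Encoding) -> text.
        name = encoding.get(code)
        out.append(glyph_name_to_text(name) if name else "\uFFFD")
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical custom encoding; only code 1 -> /Y is taken from the report.
    encoding = {1: "Y", 2: "bar", 3: "Udieresis", 4: "aacute",
                5: "agrave", 6: "space", 7: "t"}
    shown = b"\x01\x02\x03\x04\x05\x06\x04\x07"
    print(extract_text(shown, encoding))  # prints "Y|Üáà át"
```

Because the glyph names themselves are "wrong" (the glyph named /Y presumably draws something else), the fallback faithfully reproduces the garbled copy-and-paste text. Only a ToUnicode map, or correctly named glyphs, would let an extractor recover the intended characters.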