This may be a bad font problem, or something in gsview or Ghostscript. A simple text PDF using the "Edwardian Script ITC" font, generated by Ghostscript 8.63, displays properly in gsview32.exe (version 4.9 2007-11-18) and also in Adobe Reader 8.1.4. The displayed text reads "First sample sentence. Second attempt." Bug: using the "text extract" function on this PDF in gsview, we obtain the following text: "Y|Üáà átÅÑÄx áxÇàxÇvxA fxvÉÇw tààxÅÑàA" I can't really blame gsview, because I get exactly the same string using text-copy in Adobe Reader, and the same garbled text is displayed when I import the PDF into Inkscape 0.46+devel r21167, built Apr 17 2009. You can download the specific PDF I'm talking about here: http://launchpadlibrarian.net/25800877/test3.pdf This may not be a bug in Ghostscript/gsview, but I'd love to know why this happens and whether there is a workaround.
The font in question is a TrueType font embedded as a subset without a ToUnicode CMap, and it uses a custom encoding. For example, /Y (capital Y) is encoded at position 1. In addition, the glyph names in the encoding are not what one would expect: I would expect to see /F, /i, /r, /s, /t and so on. Instead I see /Y, /bar, /Udieresis, /aacute etc. So there is no Unicode information, and the encoding is non-standard. In this case Acrobat falls back to translating the glyph names into their ASCII equivalents (when possible). Using the Encoding to map from the character codes to the glyph names, we get /Y /bar /Udieresis /aacute /agrave /space /aacute /t and so on, which matches what you get when you copy and paste. It's impossible to tell from the PDF file why the file was created this way. My guess is that it was created from a PostScript file which had re-encoded the font like this, so that the PDF file had to be made the same way. I don't see a bug here. Possibly (given that the PDF file was created by GS 8.63) there is a bug in pdfwrite which caused the encoding oddness, but that can't be determined without seeing the PostScript file.
Created attachment 4961: PostScript file which exhibits the bug when converted to PDF (in ZIP file). The attached PS file (in ZIP) shows the bug after conversion to PDF. The file was generated by MS Office Word 2003 printing to MS Publisher Imagesetter (with the printer > advanced PS option "optimize for portability").
Have confirmed that the behavior is due to inadequate PS file generation. The same document with the same font, generated in Open Office 3 using "Export to PDF", works 100% OK.
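For PDFs that already exist in this form, one possible workaround is to repair the extracted text afterwards with a substitution table built from a known sample. The Python sketch below uses the two strings from the original report. The helper names `build_table` and `repair` are hypothetical, and the table only covers characters that occur in the sample, so this is a stopgap under those assumptions rather than a general fix.

```python
# Sample pair taken from the original report.
SHOWN = "First sample sentence. Second attempt."
EXTRACTED = "Y|Üáà átÅÑÄx áxÇàxÇvxA fxvÉÇw tààxÅÑàA"


def build_table(extracted, shown):
    """Build an extracted-char -> displayed-char table from an aligned pair."""
    if len(extracted) != len(shown):
        raise ValueError("samples must align character for character")
    table = {}
    for bad, good in zip(extracted, shown):
        # Refuse to continue if one extracted character maps to two targets.
        if table.setdefault(bad, good) != good:
            raise ValueError(f"inconsistent mapping for {bad!r}")
    return table


def repair(text, table):
    """Apply the table; characters not in the table pass through unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


TABLE = build_table(EXTRACTED, SHOWN)
print(repair(EXTRACTED, TABLE))
```

Characters that never appear in the sample are left as they are, so a longer sample (ideally a full alphabet set in the same font and produced by the same workflow) would be needed for real use. Regenerating the PDF from a better source, as confirmed above, remains the cleaner route.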