Created attachment 9110 [details]
result of ps2pdf

This is a follow-up to bugs reported here:

1. lilypond: http://code.google.com/p/lilypond/issues/detail?id=2985
2. evince: https://mail.gnome.org/archives/evince-list/2012-November/msg00018.html

After some discussion, it seems that the issue is that ps2pdf, which lilypond uses, encodes the PDF metadata incorrectly. Here is a very short example to reproduce it.

Let sss.ps be an ASCII file containing:

showpage
[ /Title (Document title) /Author (\241 \242) /DOCINFO pdfmark

Notice that the author field contains non-ASCII characters, 0xA1 and 0xA2. If you convert sss.ps to sss.pdf with the following command (equivalent to ps2pdf):

$ ./gs-906-linux_x86_64 -P- -dSAFER -dCompatibilityLevel=1.4 -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=sss.pdf -P- -dSAFER -dCompatibilityLevel=1.4 -c .setpdfwrite -f sss.ps

you get an error when the file is opened in evince (version 3.4.0 with poppler/cairo 0.18.4 and libxml 2.7.8 on Fedora 17):

Entity: line 10: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0xA1 0x20 0xA2 0x3C
fault'>Document title</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>

The Unicode code points of the two characters are 0xA1 and 0xA2, as written in the PDF document, but their UTF-8 representations are 0xC2 0xA1 and 0xC2 0xA2, so the leading 0xC2 byte is missing from each character.
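The byte arithmetic in the report can be checked with a small Python sketch. It is only an illustration and not part of Ghostscript or evince; the names `raw` and `is_valid_utf8` are hypothetical, and Latin-1 is used here simply because it maps each byte value to the code point with the same number.

```python
# Illustrative check only; all names here are hypothetical.

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The /Author string (\241 \242) is octal for bytes 0xA1, 0x20, 0xA2.
raw = bytes([0o241, 0x20, 0o242])

# Written verbatim into the XML metadata, these bytes are not valid UTF-8,
# which is what the libxml parser error reports.
print(is_valid_utf8(raw))

# Treating each byte as the code point of the same value and then
# encoding as UTF-8 yields the two-byte sequences mentioned above.
encoded = raw.decode("latin-1").encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in encoded))
```

The second print shows each non-ASCII character preceded by the 0xC2 byte that the report says is missing.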
(In reply to comment #0)
> Created an attachment (id=9110) [details]
> result of ps2pdf

I'd much prefer that you post the file before conversion; I'm quite capable of running Ghostscript to see what the output looks like.
Created attachment 9111 [details] input ps file
Technically, the 'correct' approach is to define a PDFDSCEncoding which maps the non-ASCII values. However, this is non-trivial and counter-intuitive.

I've made changes so that, in the absence of a PDFDSCEncoding, we will assume that any non-UTF-16BE string is using PDFDocEncoding. We then convert that to UTF-16BE and on to UTF-8. This should resolve the problem.

See commit: a3d00daf5f9abb1209cb750a95e23bc6951c1c63
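The fallback described in this comment can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the commit: the function names are hypothetical, a leading FE FF byte order mark is assumed to be how a UTF-16BE string is recognised, and the PDFDocEncoding mapping is deliberately partial, covering only printable ASCII and bytes 0xA1 to 0xFF (excluding 0xAD), where PDFDocEncoding coincides with Latin-1.

```python
import codecs

def pdfdoc_byte_to_char(byte: int) -> str:
    """Map one PDFDocEncoding byte to a character (partial table, sketch only)."""
    in_ascii = 0x20 <= byte <= 0x7E
    in_latin1_part = 0xA1 <= byte <= 0xFF and byte != 0xAD
    if in_ascii or in_latin1_part:
        return chr(byte)
    raise ValueError(f"byte 0x{byte:02X} is not covered by this sketch")

def docinfo_to_utf8(raw: bytes) -> bytes:
    """Convert a DOCINFO string to UTF-8 for the XML metadata."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # Already UTF-16BE: drop the byte order mark and keep the payload.
        utf16 = raw[len(codecs.BOM_UTF16_BE):]
    else:
        # Assume PDFDocEncoding and convert it to UTF-16BE first.
        text = "".join(pdfdoc_byte_to_char(b) for b in raw)
        utf16 = text.encode("utf-16-be")
    # Second step: UTF-16BE on to UTF-8.
    return utf16.decode("utf-16-be").encode("utf-8")

result = docinfo_to_utf8(b"\xa1 \xa2")
print(" ".join(f"0x{b:02X}" for b in result))
```

Going through UTF-16BE explicitly is redundant in Python, but it mirrors the two conversion steps named in the comment.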
Thanks for quick fix.
I have built the modified gs, and I no longer get the error in evince. I still have a question: why are 0xA1 and 0xA2 in the .ps encoded as 0xC2 0xA3 and 0xC2 0xA4 in the XML part of the .pdf, and not as 0xC2 0xA1 and 0xC2 0xA2? For a reason I do not understand, pdfinfo interprets it the same, but can you explain?
(In reply to comment #5)
> I have built the modified gs. I do not have the error in evince anymore. I
> still have a question: why 0xA1 and 0xA2 in .ps are encoded 0xC2 0xA3 and 0xC2
> 0xA4 in the xml part of the .pdf and not 0xC2 0xA1 and 0xC2 0xA2? For a reason I
> do not understand pdfinfo interprets it the same but can you explain?

Hmm, I'd have to check. That would suggest that I messed up the lookup table which converts PDFDocEncoding into XML. I'll look at it again.
Yes, you were quite correct, I'd missed an entry in the lookup table quite near the beginning. There's a fix here:

3a4439baee68c440da7164daf55de04a4d48609a

I believe that fixes it, but it's unfortunately easy to miss entries when creating these kinds of tables.
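One way to catch this kind of slip is to compare a positional table against a range whose expected values are known. The Python sketch below is hypothetical and does not reproduce the Ghostscript table or the exact defect; `find_mismatches`, `good`, and `buggy` are invented names, and the check relies on the assumption that bytes 0xA1 to 0xFF (excluding 0xAD) of PDFDocEncoding map to the code points of the same value.

```python
def find_mismatches(table, first=0xA1, last=0xFF, skip=(0xAD,)):
    """Return (byte, got, expected) for entries that differ from the expected value."""
    bad = []
    for byte in range(first, last + 1):
        if byte in skip:
            continue
        # A table that is too short is reported as None rather than raising.
        got = table[byte] if byte < len(table) else None
        if got != byte:
            bad.append((byte, got, byte))
    return bad

# Hypothetical tables: an identity table, and a copy with one entry
# dropped near the start, which shifts every later entry by one slot.
good = list(range(256))
buggy = good[:0x18] + good[0x19:]

print(find_mismatches(good))
print(find_mismatches(buggy)[0])
```

In the `buggy` table a single dropped entry makes every later lookup wrong, which is why one omission near the beginning is enough to corrupt bytes such as 0xA1 and 0xA2.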
that works now, thanks.