Created attachment 23944: input.pdf file

Hello again,

I have a very simple .tex file, which I compile with `lualatex input.tex`:

```
\documentclass{article}
\begin{document}
ff, fi, fl, ffi, ffl
\end{document}
```

If I open the resulting PDF with evince or Adobe Reader and copy/paste the text, I get "ff, fi, fl, ffi, ffl" as expected.

---

Now, if I run the output file through gs 10.00.0:

```
gs -sDEVICE=pdfwrite -o out-new.pdf -f input.pdf
```

and copy/paste, I get "昀昀, 昀椀, 昀氀, 昀케, 昀툀".

---

However, if I use:

```
gs -sDEVICE=pdfwrite -dNEWPDF=false -o out-old.pdf -f input.pdf
```

and copy/paste, I get "ff, fi, fl, ffi, ffl" again.

---

I am not sure whether this is a problem with gs or with LuaLaTeX (my guess is the latter). However, since there is a difference between the old and new PDF interpreters, I thought it warranted a bug report. I haven't been able to test 10.01.1 yet, so I apologize in case this has already been fixed.
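This is a hypothesis about where the garbled strings could come from, not a diagnosis. ToUnicode CMap destinations are UTF-16BE, so an "ff" ligature glyph would normally map to the 4-byte value `<00660066>`. If those same bytes are read with the wrong byte order, the result is U+6600 U+6600, which renders as "昀昀". The same swap turns "fi" and "fl" into "昀椀" and "昀氀", which matches the report. It does not reproduce the "ffi" and "ffl" output, and that fits with those 6-byte mappings being a separate problem. The destination values in `LIGATURES` are assumed to be what LuaLaTeX writes; the sketch only demonstrates the byte-order effect and is not based on Ghostscript's code.

```python
# Hypothetical illustration: decode UTF-16BE ToUnicode destinations with the
# correct and the wrong byte order, to compare against the text reported
# after running the file through pdfwrite.

# Assumed destination strings for the ligature glyphs (UTF-16BE, as required
# for ToUnicode CMaps). The exact values LuaLaTeX writes are an assumption.
LIGATURES = {
    "ff": "00660066",
    "fi": "00660069",
    "fl": "0066006C",
    "ffi": "006600660069",
    "ffl": "00660066006C",
}


def decode(hex_dst: str, encoding: str) -> str:
    """Decode a ToUnicode destination hex string with the given UTF-16 codec."""
    return bytes.fromhex(hex_dst).decode(encoding)


if __name__ == "__main__":
    for name, dst in LIGATURES.items():
        correct = decode(dst, "utf-16-be")  # intended reading
        swapped = decode(dst, "utf-16-le")  # byte order reversed
        print(f"{name:4} {len(dst) // 2} bytes  BE={correct!r}  LE={swapped!r}")
```

Only the first three entries fit in 4 bytes. Their byte-swapped readings line up with the reported copy/paste output, but the 6-byte ones do not.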
Created attachment 23945: output file with new interpreter
Created attachment 23946: output file with old interpreter
This commit:

34055411d34255d811dd091e7f771b92d4494600

fixes the problem with double characters. The problem with Unicode code point mappings exceeding 4 bytes already has a bug report:

https://bugs.ghostscript.com/show_bug.cgi?id=704674

The result there is somewhat different because that file uses a Font rather than a CIDFont. In that case the ToUnicode CMap gets dropped entirely, whereas here it produces incorrect values. Fundamentally, though, the problem is the same: the current code can't cope with ToUnicode CMaps that contain more than 4 bytes' worth of Unicode code points. We'll deal with that as one project, so I'm just going to add the remaining part of this bug to that report.

*** This bug has been marked as a duplicate of bug 704674 ***
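The sketch below makes the 4-byte limit concrete. It scans the `beginbfchar`/`endbfchar` sections of a ToUnicode CMap and reports destinations longer than 4 bytes. The CMap fragment and its source codes `<0001>` to `<0005>` are made up for illustration. A real file would first need its ToUnicode stream extracted and decompressed, and `bfrange` entries are ignored here. The helper `long_destinations` is hypothetical and does not mirror how Ghostscript parses CMaps.

```python
import re

# Made-up ToUnicode CMap fragment; the source codes are hypothetical.
SAMPLE_CMAP = """
5 beginbfchar
<0001> <00660066>
<0002> <00660069>
<0003> <0066006C>
<0004> <006600660069>
<0005> <00660066006C>
endbfchar
"""

# Text between beginbfchar and endbfchar (non-greedy, across newlines).
BFCHAR_BLOCK = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
# One "<src> <dst>" hex-string pair.
PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def long_destinations(cmap_text: str, limit: int = 4):
    """Return (source, text, byte_length) for bfchar destinations over `limit` bytes."""
    result = []
    for block in BFCHAR_BLOCK.findall(cmap_text):
        for src, dst in PAIR.findall(block):
            raw = bytes.fromhex(dst)
            if len(raw) > limit:
                # Destinations are UTF-16BE, so decode for a readable report.
                result.append((src, raw.decode("utf-16-be"), len(raw)))
    return result


if __name__ == "__main__":
    for src, text, n in long_destinations(SAMPLE_CMAP):
        print(f"<{src}> -> {text!r} ({n} bytes)")
```

With the sample above, only the "ffi" and "ffl" entries exceed 4 bytes. Those are the two ligatures still garbled after the double-character fix.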
(In reply to Ken Sharp from comment #3)
> This commit:
>
> 34055411d34255d811dd091e7f771b92d4494600
>
> fixes the problem with double characters.

Awesome, I've tested this commit and confirmed the double chars are fixed.

> The problem with Unicode code
> point mappings exceeding 4 bytes already has a bug report:
>
> https://bugs.ghostscript.com/show_bug.cgi?id=704674

Got it, and thanks for adding me to the CC list there. Since it's a larger refactor, I understand it may take time. Fortunately, I don't have any actual PDFs that have this problem; I was just monkeying around with LaTeX a bit.