The ligatures ff, fi, fl, ffi, ffl and st (Unicode U+FB00 to U+FB06) have to be converted into their individual characters anyway, both for searching (no user will type an actual ligature) and for copying text (many fonts don't fully support them). Getting them already split up would make things significantly easier, since a ligature can't be replaced by a single character. Potential patch: http://code.google.com/p/sumatrapdf/source/diff?spec=svn1349&r=1349&format=side&path=/trunk/mupdf/mupdf/pdf_unicode.c
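
To illustrate the kind of splitting meant here, a rough sketch in C follows. It is not the code from the linked patch: the function name `expand_ligature` and its signature are made up for this example, and mapping U+FB05 (long s + t) to a plain "st" is an assumption chosen to keep searches simple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT taken from the linked patch.
 * Writes the expansion of code point `ucs` into `out` (room for 3 entries)
 * and returns how many code points were written. Anything outside
 * U+FB00..U+FB06 is passed through unchanged. */
int expand_ligature(int ucs, int out[3])
{
    assert(out != NULL);

    switch (ucs) {
    case 0xFB00: out[0] = 'f'; out[1] = 'f';               return 2; /* ff  */
    case 0xFB01: out[0] = 'f'; out[1] = 'i';               return 2; /* fi  */
    case 0xFB02: out[0] = 'f'; out[1] = 'l';               return 2; /* fl  */
    case 0xFB03: out[0] = 'f'; out[1] = 'f'; out[2] = 'i'; return 3; /* ffi */
    case 0xFB04: out[0] = 'f'; out[1] = 'f'; out[2] = 'l'; return 3; /* ffl */
    case 0xFB05: /* long s + t; folded to plain "st" here (assumption) */
    case 0xFB06: out[0] = 's'; out[1] = 't';               return 2; /* st  */
    default:     out[0] = ucs;                             return 1;
    }
}
```

A caller would append the returned code points to the extracted text in place of the single ligature code point, so that a search for "office" can match text that was typeset with the ffi ligature.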