| Summary: | Text synthesis of missing appearances in the PDF interpreter does not handle UTF16BE | | |
|---|---|---|---|
| Product: | Ghostscript | Reporter: | andrusha |
| Component: | PDF Interpreter | Assignee: | Ken Sharp <ken.sharp> |
| Status: | RESOLVED FIXED | | |
| Severity: | enhancement | | |
| Priority: | P4 | | |
| Version: | 9.07 | | |
| Hardware: | PC | | |
| OS: | All | | |
| Customer: | | Word Size: | --- |
| Attachments: | Example bogus file. | | |
Description
andrusha
2013-10-24 02:15:19 UTC
The actual problem is that the text synthesis code (Tform) does not handle strings in UTF16BE.

This commit: 1cb2458772321dc86117cb45b5b28a1423ccf9b7 fixes the problem for me, but I'm a little concerned that it is simply masking a deeper problem. If you still get the same result with other files, please reopen the bug and attach a new failing file.

Oops, sorry, wrong bug :-( Commit 33fb85045c2590ac58a723ea2abcfbde505e53d1 resolves this bug. We now strip the BOM before printing the text.

The result is not the same as the Acrobat display, but this is because Acrobat ignores the appearances from the form and creates its own version. The text is not the same as Acrobat because we do use the DA (Default Appearance), and the font in use is not appropriately encoded for use with UTF16 encoded text. If we instead use our own fallback font and an Identity-UTF16-H CMap, the text matches the Acrobat display, which to me is pretty conclusive evidence that the font is incorrect.
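
For readers unfamiliar with the BOM handling mentioned above, here is a minimal Python sketch of the idea. It is not the Ghostscript code from the commit; the function name `decode_pdf_text_string` is hypothetical, and the non-BOM branch uses Latin-1 as a rough stand-in for PDFDocEncoding, which is an assumption made purely for illustration. A PDF text string encoded as UTF16BE starts with the two bytes FE FF, which have to be removed before the remaining bytes are treated as character data.

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string, stripping a leading UTF-16BE BOM if present.

    Hypothetical helper for illustration only; not taken from Ghostscript.
    """
    bom = codecs.BOM_UTF16_BE  # the two bytes 0xFE 0xFF
    if raw.startswith(bom):
        # Drop the BOM so it is not emitted as text, then decode
        # the remainder as big-endian UTF-16.
        return raw[len(bom):].decode("utf-16-be")
    # No BOM: assume a single-byte encoding. Latin-1 is used here as an
    # approximation of PDFDocEncoding (assumption; they differ for some codes).
    return raw.decode("latin-1")


if __name__ == "__main__":
    field_value = codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
    print(decode_pdf_text_string(field_value))
```

Even with the BOM stripped, the decoded text only displays correctly if the font named in the DA is encoded suitably for UTF16 text, which is the separate issue described above.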