Summary: | 10.06.0rc2: PDFwrite breaks encoding of ligature with old Adobe Glyph List name | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Pavel Hanak <hanakp> |
Component: | PDF Writer | Assignee: | Default assignee <ghostpdl-bugs> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P2 | ||
Version: | master | ||
Hardware: | PC | ||
OS: | Windows 10 | ||
Customer: | Word Size: | --- | |
Attachments: | Broken ligature samples |
Description
Pavel Hanak
2025-09-03 11:44:24 UTC
(In reply to Pavel Hanak from comment #0)
> I'm attaching a sample PDF shortened to 1 line. It has valid text encoding,
> i.e. it's possible to copy and paste text that corresponds to the glyphs, even in
> Adobe Reader and Adobe Acrobat XI. The PDF was originally created in ancient
> software and its font encoding is rather complicated, with lots of
> Differences and partial use of a ToUnicode table. When I process it with
> PDFwrite, the encoding of all characters is preserved except one: the fi ligature.
> It copies as 0x03 in the output file, likely because its glyph is in the 3rd
> place in /CharStrings. Note this happens even when I merely "re-generate"
> the source file, without OCR or any other processing. Exact command is:

The original PDF file has a ToUnicode CMap which contains:

`1 beginbfrange <03> <03> [<00660069>] endbfrange`

So that maps a single glyph to 2 Unicode code points. The pdfwrite device can't handle that.

*** This bug has been marked as a duplicate of bug 704674 ***

OMG, it's at the very bottom of the ToUnicode table and I hadn't noticed it. Sorry about that.
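To see why this entry is a one-to-many mapping, decode it by hand. Destination strings in a ToUnicode CMap are UTF-16BE, so `<00660069>` is U+0066 U+0069, i.e. "f" followed by "i". That is two code points for the single character code 0x03.

The sketch below is a minimal illustration, assuming only the array form of a `bfrange` entry (`<lo> <hi> [<dst> ...]`). The helper `parse_bfrange_array` is hypothetical: it is not a full CMap parser and is not taken from Ghostscript.

```python
import re


def parse_bfrange_array(entry: str) -> dict[int, str]:
    """Parse one array-form bfrange entry: <lo> <hi> [<dst1> <dst2> ...].

    Hypothetical helper for illustration; handles only this one entry form.
    """
    m = re.fullmatch(r"\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*\[(.*)\]\s*", entry)
    if m is None:
        raise ValueError(f"not an array-form bfrange entry: {entry!r}")
    lo, hi = int(m.group(1), 16), int(m.group(2), 16)
    dsts = re.findall(r"<([0-9A-Fa-f]+)>", m.group(3))
    # The array must supply exactly one destination per code in [lo, hi].
    if len(dsts) != hi - lo + 1:
        raise ValueError("array length must match the code range")
    # Each destination is UTF-16BE text and may contain several code points.
    return {
        code: bytes.fromhex(dst).decode("utf-16-be")
        for code, dst in zip(range(lo, hi + 1), dsts)
    }


mapping = parse_bfrange_array("<03> <03> [<00660069>]")
print(mapping)                          # {3: 'fi'}
print([hex(ord(c)) for c in mapping[3]])  # ['0x66', '0x69']
```

Code 0x03 therefore carries the two-character string "fi" rather than a single code point. That multi-code-point mapping is the case the reply above says pdfwrite cannot handle.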