Summary: | Support direct use of CID-Keyed fonts from PostScript with txtwrite | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Holger <hsberlin> |
Component: | Other Driver | Assignee: | Ken Sharp <ken.sharp> |
Status: | UNCONFIRMED --- | ||
Severity: | enhancement | ||
Priority: | P4 | ||
Version: | 9.56.1 | ||
Hardware: | PC | ||
OS: | Windows 10 | ||
Customer: | Word Size: | --- | |
Attachments: | A simple file (containing only "TEST") created with LibreOffice Writer and printed to file via HP LaserJet 4350 |
Description
Holger
2022-05-09 08:42:03 UTC
The PostScript program uses a CID-Keyed font directly, which the txtwrite device does not support; it only supports Type 0 fonts with CID-Keyed descendants.

I've made a commit which resolves the recursion and emits a warning that the font type is not supported before exiting: 5527bce8f1c0c6cd62c4a0a19fc511507ae53da9

I'm altering this to an enhancement to support CID-Keyed fonts directly (note to self: steal process_cid_text from gdevpdtc.c).

However, I should probably mention that even with support for the font type, the text extracted from this document will never be 'Text'. PostScript does not support ToUnicode CMaps, so there is no way to add Unicode information to the font. The CMap which is used has a custom Ordering and Registry, which means we cannot extract any meaning from it. The CIDs do not correspond to ASCII character codes (it's a subset font) and are 2-byte codes anyway.

The final result of all that is that there is nothing in the PostScript program which allows us to determine a Unicode code point for the text, so we must fall back on using the character codes, which are not ASCII. I believe the output from this example would be:

0x00 0x37 0x00 0x28 0x00 0x36 0x00 0x37

Treated as UTF-16, that would be "7(67".
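
The UTF-16 reading of those bytes can be checked with a short Python sketch. It assumes big-endian byte order, since the high byte comes first in the codes quoted above. The names `raw_codes` and `decode_as_utf16be` are illustrative only and are not part of Ghostscript or txtwrite.

```python
# Character codes quoted in the comment above: four 2-byte codes, high byte first.
raw_codes = bytes([0x00, 0x37, 0x00, 0x28, 0x00, 0x36, 0x00, 0x37])


def decode_as_utf16be(data: bytes) -> str:
    """Interpret each 2-byte code as one big-endian UTF-16 code unit."""
    return data.decode("utf-16-be")


extracted = decode_as_utf16be(raw_codes)

# The attachment is described as containing only "TEST".
intended = "TEST"

print(repr(extracted))        # the string obtained from the raw codes
print(extracted == intended)  # whether it matches the visible text
```

Running this prints `'7(67'` and `False`: the raw codes decode to valid UTF-16, but not to the text that appears on the page.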