Created attachment 22505
A simple file (containing only "TEST") created with LibreOffice Writer and printed to file using the HP LaserJet 4350 driver

I create some PS files on Windows Server 2019 with the HP LaserJet 4350 printer driver (or some other PS printer driver) and try to extract the text information from these files with the following command:

    gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=txtwrite -sOutputFile="c:\temp\embtxt1.txt.%d" "C:\temp\test.ps"

but it reports:

    *** C stack overflow. Quiting

I have tried to debug Ghostscript, and it loops between these two functions:

    > gsdll64.dll!textw_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 1957 C
      gsdll64.dll!gs_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 690 C
      gsdll64.dll!textw_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 1958 C
      gsdll64.dll!gs_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 690 C
      gsdll64.dll!textw_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 1958 C
      gsdll64.dll!gs_text_resync(gs_text_enum_s * pte, const gs_text_enum_s * pfrom) Line 690 C
The PostScript program uses a CID-Keyed font, which is not supported by the txtwrite device; it only supports type 0 fonts with CID-Keyed descendants.

I've made a commit which resolves the recursion and emits a warning that the font type is not supported before exiting:

5527bce8f1c0c6cd62c4a0a19fc511507ae53da9

I'm altering this to an enhancement to support CID-Keyed fonts directly (note to self: steal process_cid_text from gdevpdtc.c).

However, I should probably mention that even with support for the font type, the text extracted from this document will never be 'Text'. PostScript does not support ToUnicode CMaps, so there is no way to add Unicode information to the font. The CMap which is used has a custom Ordering and Registry, which means we cannot extract any meaning from it. The CIDs do not correspond to ASCII character codes (it's a subset font) and are 2-byte codes anyway.

The final result of all that is that there is nothing in the PostScript program which allows us to determine a Unicode code point for the text, so we must fall back on using the character codes, which are not ASCII. I believe the output from this example would be:

0x00 0x37 0x00 0x28 0x00 0x36 0x00 0x37

Treated as UTF-16, that would be "7(67".
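
The byte sequence quoted above can be checked with a short sketch. This is only an illustration in Python and is not part of Ghostscript or the txtwrite device. It assumes each 2-byte character code is read high byte first (big-endian UTF-16), which is what the quoted "7(67" result implies.

```python
# Illustration only: decode the character codes quoted in the comment above.
# Assumption: each 2-byte code is stored high byte first (UTF-16 big-endian).
raw = bytes([0x00, 0x37, 0x00, 0x28, 0x00, 0x36, 0x00, 0x37])

decoded = raw.decode("utf-16-be")
print(decoded)  # 7(67

# The document visibly contains "TEST", but the subset font's character
# codes are not ASCII or Unicode, so the decoded string does not match.
print(decoded == "TEST")  # False
```

The four codes 0x0037, 0x0028, 0x0036, 0x0037 come out as '7', '(', '6', '7'. The first and last codes are identical, as expected for the two 'T' glyphs, but nothing in the file maps them back to the letters themselves.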