Bug 691605

Summary:	Invisible text not preserved by pdfwrite
Product:	Ghostscript
Reporter:	Marcos H. Woehrmann <marcos.woehrmann>
Component:	PDF Interpreter
Assignee:	Ken Sharp <ken.sharp>
Status:	NOTIFIED FIXED
Severity:	enhancement
Priority:	P4
Version:	master
Hardware:	PC
OS:	All
Customer:	potential
Word Size:	---

Description Marcos H. Woehrmann 2010-09-09 18:05:10 UTC

From Ken's email:

This is a scanned and OCR'ed PDF file which has invisible (makes no marks) text laid on top of it. You can search for the invisible text, which lies 'more or less' over the matching area of the scan.

The PDF interpreter drops this text (it makes no marks!) so it never reaches pdfwrite, so it isn't present in the output. While we obviously could do something about this, it certainly isn't a priority.
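
For readers unfamiliar with how invisible text is encoded, here is a minimal sketch of the content-stream operators involved. In PDF, text rendering mode 3 (set with the `Tr` operator) means the glyphs are neither filled nor stroked, so they make no marks but remain searchable. The helper name `invisible_text_ops`, the font resource name `/F1`, and the coordinates are illustrative assumptions, not taken from the file attached to this bug.

```python
def invisible_text_ops(text, x, y, font="/F1", size=12):
    """Build a PDF text object that draws `text` in rendering mode 3.

    Hypothetical helper for illustration only. Mode 3 is the PDF
    "neither fill nor stroke" text rendering mode, i.e. invisible text.
    """
    # Escape the characters that are special inside a PDF literal string.
    escaped = (
        text.replace("\\", "\\\\")
            .replace("(", "\\(")
            .replace(")", "\\)")
    )
    # BT/ET bracket the text object; Tf selects the font, Tr sets the
    # rendering mode, Td positions the text, Tj shows the string.
    return f"BT {font} {size} Tf 3 Tr {x} {y} Td ({escaped}) Tj ET"


print(invisible_text_ops("484,20", 72, 700))
# BT /F1 12 Tf 3 Tr 72 700 Td (484,20) Tj ET
```

An OCR layer typically places many such text objects over the scanned image, one per recognised word or line.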

Comment 2 Alex Cherepanov 2010-09-10 01:10:54 UTC

The PostScript part of the PDF interpreter passes the text and the rendering
mode to the C level. This can be easily checked by changing
gs_settextrenderingmode() to a no-op.

Comment 3 Ray Johnston 2010-09-10 14:57:57 UTC

Reassigning to Ken, since it seems that the PS part of the PDF interpreter does
pass the text in. If this is FAPI dropping the ball, please pass it along to Chris.

Comment 4 Ken Sharp 2010-10-08 12:55:26 UTC

Bizarrely, some of the text is indeed written to the output file (e.g. '484,20'), but most of it is not. I have no idea why at the moment.

Comment 5 Ken Sharp 2010-10-08 13:49:45 UTC

This is caused by the FontDescriptor for the text in rendering mode 3 having Widths that do not match the widths in the font. When this happens, the code takes a path that does not add the glyphs to the 'used' list, so they are not emitted.

I have a fix in regression testing which I believe will address this.
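
The failure mode described in this comment can be modelled with a toy example. This is only a sketch of the logic as described, not Ghostscript's actual code: the function `glyphs_marked_used`, the `mark_on_mismatch` flag, and the width values are all hypothetical. It shows how a branch taken on a width mismatch can skip the bookkeeping that decides which glyphs are emitted.

```python
def glyphs_marked_used(codes, declared_widths, font_widths, mark_on_mismatch):
    """Return the set of glyph codes that end up on the 'used' list.

    Hypothetical model: `declared_widths` stands in for the Widths given
    in the PDF, `font_widths` for the widths stored in the font itself.
    """
    used = set()
    for code in codes:
        if declared_widths.get(code) == font_widths.get(code):
            # Normal path: widths agree, glyph is recorded as used.
            used.add(code)
        elif mark_on_mismatch:
            # Corrected behaviour: the mismatch path also records the glyph.
            used.add(code)
        # Otherwise (the behaviour described above): the mismatch path
        # handles the glyph but never records it, so it is not emitted.
    return used


# Illustrative widths only; '4' and '8' agree, 'A' and 'B' do not.
font_widths = {"4": 500, "8": 500, "A": 722, "B": 667}
declared_widths = {"4": 500, "8": 500, "A": 700, "B": 650}
text = "48AB"

buggy = glyphs_marked_used(text, declared_widths, font_widths, False)
fixed = glyphs_marked_used(text, declared_widths, font_widths, True)
print(sorted(buggy))  # ['4', '8']
print(sorted(fixed))  # ['4', '8', 'A', 'B']
```

In this model only the glyphs whose widths happen to agree survive. That would be consistent with the partial output seen in Comment 4, though the thread does not confirm that this is why '484,20' was written.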

Comment 6 Ken Sharp 2010-10-08 14:43:12 UTC

Fixed in revision 11779, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-October/011807.html