Bug 691605

Summary: Invisible text not preserved by pdfwrite
Product: Ghostscript
Reporter: Marcos H. Woehrmann <marcos.woehrmann>
Component: PDF Interpreter
Assignee: Ken Sharp <ken.sharp>
Status: NOTIFIED FIXED    
Severity: enhancement    
Priority: P4    
Version: master   
Hardware: PC   
OS: All   
Customer: potential
Word Size: ---

Description Marcos H. Woehrmann 2010-09-09 18:05:10 UTC
From Ken's email:

This is a scanned and OCR'ed PDF file which has invisible (makes no marks) text laid on top of it. You can search for the invisible text, which lies 'more or less' over the matching area of the scan.

The PDF interpreter drops this text (it makes no marks!) so it never reaches pdfwrite, so it isn't present in the output. While we obviously could do something about this, it certainly isn't a priority.
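
For readers unfamiliar with how "invisible" text is expressed: in PDF, the Tr operator sets the text rendering mode, and mode 3 means the text is neither filled nor stroked. The following is a minimal sketch, not part of Ghostscript, of spotting that mode in an already-decoded content stream. The function names and the sample stream are hypothetical, and the regular expression is deliberately naive (it does not skip string literals that happen to contain "3 Tr").

```python
import re

# Matches the PDF text rendering mode operator, e.g. "3 Tr".
# Naive by design: it does not tokenize the stream, so a string
# literal containing "3 Tr" would be a false positive.
TR_OPERATOR = re.compile(rb"(?<![\w.])(\d)\s+Tr\b")

INVISIBLE_MODE = 3  # PDF text rendering mode 3: neither fill nor stroke


def rendering_modes(content_stream):
    """Return the text rendering modes set in a decoded content stream, in order."""
    return [int(m.group(1)) for m in TR_OPERATOR.finditer(content_stream)]


def has_invisible_text(content_stream):
    """True if the stream switches to rendering mode 3 at any point."""
    return INVISIBLE_MODE in rendering_modes(content_stream)


# Hypothetical content stream, made up for illustration only.
sample = b"BT /F1 12 Tf 3 Tr 72 700 Td (484,20) Tj ET"
print(rendering_modes(sample))
print(has_invisible_text(sample))
```

Text shown under mode 3 leaves no marks on the page but still carries character codes, which is why it remains searchable in the original file.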
Comment 2 Alex Cherepanov 2010-09-10 01:10:54 UTC
The PostScript part of the PDF interpreter passes the text and the rendering
mode to the C level. This can be easily checked by changing
gs_settextrenderingmode() to a no-op.
Comment 3 Ray Johnston 2010-09-10 14:57:57 UTC
Reassigning to Ken since it seems that the PS part of the PDF interp does pass
things in. If this is FAPI dropping the ball, please pass along to Chris.
Comment 4 Ken Sharp 2010-10-08 12:55:26 UTC
Bizarrely, some of the text is indeed written to the output file (e.g. '484,20'), but most of it is not. No idea why at the moment.
Comment 5 Ken Sharp 2010-10-08 13:49:45 UTC
This is caused by the FontDescriptor for the text in rendering mode 3 having Widths which do not match the widths in the font. When this happens the code takes a path which does not add the glyphs to the 'used' list, so they don't get emitted.

I have a fix in regression testing which I believe will address this.
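
The kind of width mismatch described above can be illustrated with a short sketch. This is not the pdfwrite code and not the fix; it only shows, under stated assumptions, how one might list the character codes whose declared width disagrees with the font's own width. It assumes the declared widths come from a /Widths array indexed from /FirstChar, as in a simple PDF font. The function name, the tolerance parameter, and all of the numbers are hypothetical.

```python
def mismatched_widths(declared, font_widths, first_char=0, tolerance=0.0):
    """Return character codes whose declared width differs from the font's width.

    declared    -- list of widths, one per code starting at first_char
    font_widths -- dict mapping character code to the width found in the font
    """
    mismatches = []
    for offset, declared_width in enumerate(declared):
        code = first_char + offset
        actual = font_widths.get(code)
        # A code missing from the font is also treated as a mismatch.
        if actual is None or abs(declared_width - actual) > tolerance:
            mismatches.append(code)
    return mismatches


# Hypothetical data: codes 52..54 ('4', '5', '6'), invented for illustration.
declared = [556, 500, 500]
font_widths = {52: 556, 53: 556, 54: 500}
print(mismatched_widths(declared, font_widths, first_char=52))
```

With these made-up numbers only code 53 is reported, since its declared width (500) differs from the font's width (556).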
Comment 6 Ken Sharp 2010-10-08 14:43:12 UTC
Fixed in revision 11779, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-October/011807.html