Bug 691605 - Invisible text not preserved by pdfwrite
Summary: Invisible text not preserved by pdfwrite
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: PC All
Importance: P4 enhancement
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-09-09 18:05 UTC by Marcos H. Woehrmann
Modified: 2011-10-02 02:35 UTC

See Also:
Customer: potential
Word Size: ---


Description Marcos H. Woehrmann 2010-09-09 18:05:10 UTC
From Ken's email:

This is a scanned and OCR'ed PDF file which has invisible (makes no marks) text laid on top of it. You can search for the invisible text, which lies 'more or less' over the matching area of the scan.

The PDF interpreter drops this text (it makes no marks!) so it never reaches pdfwrite, so it isn't present in the output. While we obviously could do something about this, it certainly isn't a priority.
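
For readers unfamiliar with how a PDF marks text as invisible: it sets text rendering mode 3 with the `Tr` operator before showing the text. The following is only a minimal sketch of what such a content stream fragment looks like, not taken from the file in this report; the font resource name `/F1`, the coordinates, and both helper names are made up for illustration.

```python
def pdf_string(text: str) -> str:
    """Escape backslashes and parentheses for a PDF literal string."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def invisible_text_ops(text: str, x: float, y: float,
                       font: str = "/F1", size: float = 12) -> str:
    """Build content stream operators that place text without marking the page.

    The font name /F1 is a hypothetical resource name; a real page would
    have to define it in its /Resources dictionary.
    """
    ops = [
        "BT",                          # begin text object
        f"{font} {size:g} Tf",         # select font and size
        "3 Tr",                        # rendering mode 3: neither fill nor stroke
        f"1 0 0 1 {x:g} {y:g} Tm",     # position the text
        f"{pdf_string(text)} Tj",      # show the (invisible) string
        "ET",                          # end text object
    ]
    return "\n".join(ops)


if __name__ == "__main__":
    print(invisible_text_ops("484,20", 72, 700))
```

An OCR layer typically emits many such runs, one per recognised word or line, positioned over the scanned image.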
Comment 2 Alex Cherepanov 2010-09-10 01:10:54 UTC
The PostScript part of the PDF interpreter passes the text and the rendering
mode to the C level. This can easily be checked by changing
gs_settextrenderingmode() to a no-op.
Comment 3 Ray Johnston 2010-09-10 14:57:57 UTC
Reassigning to Ken since it seems that the PS part of the PDF interp does pass
things in. If this is FAPI dropping the ball, please pass along to Chris.
Comment 4 Ken Sharp 2010-10-08 12:55:26 UTC
Bizarrely, some of the text is indeed written to the output file (e.g. '484,20'), but most of it is not. No idea why at the moment.
Comment 5 Ken Sharp 2010-10-08 13:49:45 UTC
This is caused by the FontDescriptor for the text in rendering mode 3 having Widths which do not match the widths in the font. When this happens the code takes a path which does not add the glyphs to the 'used' list, so they don't get emitted.

I have a fix in regression testing which I believe will address this.
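
To make the condition concrete, here is a rough sketch of how one might spot character codes whose declared width disagrees with the width in the font program. It is not Ghostscript's code and does not reproduce the fix; the function name, the dictionary of font widths, and the sample numbers are all assumptions for illustration. It assumes a simple font whose declared widths are a list starting at a first character code.

```python
def mismatched_widths(first_char, declared, actual, tolerance=0.0):
    """Return (code, declared_width, font_width) for each disagreeing code.

    first_char -- character code of the first entry in `declared`
    declared   -- widths as declared in the PDF, in glyph space units
    actual     -- hypothetical mapping {code: width} read from the font itself
    tolerance  -- allowed absolute difference before a code is reported
    """
    mismatches = []
    for offset, declared_width in enumerate(declared):
        code = first_char + offset
        font_width = actual.get(code)
        if font_width is None:
            # The font has no width for this code, so there is nothing to compare.
            continue
        if abs(declared_width - font_width) > tolerance:
            mismatches.append((code, declared_width, font_width))
    return mismatches


if __name__ == "__main__":
    # Made-up numbers: code 67 is declared as 700 but the font says 722.
    print(mismatched_widths(65, [722, 667, 700], {65: 722, 66: 667, 67: 722}))
```

Running a check like this over the fonts used by the invisible text would show which glyphs could take the mismatch path described above.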
Comment 6 Ken Sharp 2010-10-08 14:43:12 UTC
Fixed in revision 11779, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-October/011807.html