From Ken's email: This is a scanned and OCR'ed PDF file which has invisible (makes no marks) text laid on top of it. You can search for the invisible text, which lies 'more or less' over the matching area of the scan. The PDF interpreter drops this text (it makes no marks!), so it never reaches pdfwrite and therefore isn't present in the output. While we obviously could do something about this, it certainly isn't a priority.
The PostScript part of the PDF interpreter does pass the text and the rendering mode down to the C level. This is easy to check by changing gs_settextrenderingmode() into a no-op.
Reassigning to Ken, since it seems that the PS part of the PDF interpreter does pass the text in. If this is FAPI dropping the ball, please pass it along to Chris.
Bizarrely, some of the text is indeed written to the output file (e.g. '484,20'), but most of it is not; no idea why at the moment.
This is caused by the font dictionary for the text in rendering mode 3 having a Widths array which does not match the widths in the embedded font. When this happens, the code takes a path which does not add the glyphs to the 'used' list, so they don't get emitted. I have a fix in regression testing which I believe will address this.
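The failure mode described in the previous comment can be modelled in a few lines. This is a minimal sketch, not Ghostscript's pdfwrite code: it assumes that the output font is a subset containing only glyphs recorded in a per-font 'used' set, and that a width-mismatch branch returning early is what skips that recording. `SubsetFont`, `show_glyph`, `emit`, `apply_fix` and the width tables are all hypothetical, and the widths are made-up values chosen so that only the digits and comma agree.

```python
from dataclasses import dataclass, field


@dataclass
class SubsetFont:
    """Hypothetical model of a font written out as a subset:
    only glyph codes recorded in `used` are emitted to the output file."""
    used: set = field(default_factory=set)

    def show_glyph(self, code: str, pdf_width: float, font_width: float,
                   apply_fix: bool) -> None:
        # `pdf_width` stands for the Widths array entry, `font_width`
        # for the advance stored in the embedded font program.
        if pdf_width != font_width:
            # Width-mismatch path: real code would adjust the advance here.
            if not apply_fix:
                return  # the bug: glyph is never recorded as used
        self.used.add(code)  # glyph will be emitted


def emit(text, pdf_widths, font_widths, apply_fix):
    """Show every character of `text` and return the set of emitted glyphs."""
    font = SubsetFont()
    for ch in text:
        font.show_glyph(ch, pdf_widths[ch], font_widths[ch], apply_fix)
    return font.used


# Made-up widths: digits and comma agree, the letters do not.
PDF_W = {"4": 500, "8": 500, "2": 500, "0": 500, ",": 250, "A": 722, "B": 667}
FONT_W = {"4": 500, "8": 500, "2": 500, "0": 500, ",": 250, "A": 700, "B": 650}

print(sorted(emit("484,20AB", PDF_W, FONT_W, apply_fix=False)))
# [',', '0', '2', '4', '8']
print(sorted(emit("484,20AB", PDF_W, FONT_W, apply_fix=True)))
# [',', '0', '2', '4', '8', 'A', 'B']
```

In this model, only text whose widths happen to agree survives, which is one way a partial result like the '484,20' fragment could arise. Marking the glyph as used regardless of which width path is taken is the design choice that removes the dependency.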
Fixed in revision 11779, patch here: http://ghostscript.com/pipermail/gs-cvs/2010-October/011807.html