Bug 692772 - After generation of PDF/A text no more searchable
Summary: After generation of PDF/A text no more searchable
Status: NOTIFIED WONTFIX
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 9.04
Hardware: All All
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-05 10:02 UTC by artifex
Modified: 2012-04-16 19:17 UTC
0 users

See Also:
Customer: 870
Word Size: ---


Attachments
DIN.pdf (119.54 KB, application/pdf)
2012-01-05 10:02 UTC, artifex

Description artifex 2012-01-05 10:02:04 UTC
Created attachment 8249
DIN.pdf

When the attached PDF file DIN.pdf is converted to PDF/A, the text is no longer searchable.

GS-call:

gswin32c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -o pdfa.pdf -dUseCIEColor -sProcessColorModel=DeviceCMYK -dPDFA PDFA_def.ps DIN.pdf
Comment 1 Ken Sharp 2012-01-05 10:32:51 UTC
The original file includes symbolic TrueType fonts with /Encoding and /Differences arrays, which is contrary to the recommendations of the PDF specification.

There is no other information on the glyphs in the file, no ToUnicode CMap, and the fonts are encoded in a non-standard fashion.

When we create a PDF/A output file we may *NOT* include an Encoding with a symbolic TrueType font, as the specification is quite specific that this is disallowed (see section 6.3.7 of the specification), and various PDF/A validators *will* reject such a file as invalid. In fact, the file you have sent claims to be a PDF/A file but fails validation with Acrobat's preflight tool for this reason (amongst others).
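
As a rough illustration of the rule described above, here is a minimal sketch that flags a symbolic TrueType font carrying an Encoding entry. It assumes a font dictionary modelled as a plain Python dict; the function names and the example values are hypothetical and are not taken from Ghostscript or from DIN.pdf. The only fact relied on is that the Symbolic flag is bit 3 (value 4) of /Flags in the font descriptor.

```python
# Hypothetical model: a PDF font dictionary represented as a plain dict.
# This is not Ghostscript code and not the API of any PDF library.
SYMBOLIC_FLAG = 1 << 2  # bit 3 of /Flags in the font descriptor (value 4)


def is_symbolic_truetype(font):
    """True if the font is a TrueType font with the Symbolic flag set."""
    flags = font.get("FontDescriptor", {}).get("Flags", 0)
    return font.get("Subtype") == "TrueType" and bool(flags & SYMBOLIC_FLAG)


def violates_symbolic_encoding_rule(font):
    """True if a symbolic TrueType font carries an /Encoding entry."""
    return is_symbolic_truetype(font) and "Encoding" in font


# Made-up example values, for illustration only.
example_font = {
    "Subtype": "TrueType",
    "FontDescriptor": {"Flags": 4},
    "Encoding": {"Differences": [32, "space", 65, "A"]},
}

print(violates_symbolic_encoding_rule(example_font))
```

A font of this shape would be rejected by a validator applying the rule, which is why the Encoding has to be dropped when writing PDF/A.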

It's true that in the past we did permit this, but precisely because it causes problems we no longer do so.

In the absence of proper glyph information there is no way we can embed a ToUnicode CMap, and since the fonts are encoded in a non-standard way, there is no information for Acrobat to use in order to perform searches.
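
To make the consequence concrete, here is a minimal sketch of how a viewer might turn character codes into searchable text. It assumes a simplified model in which a ToUnicode CMap is just a dict from character code to string; the function name, codes, and mapping are hypothetical and do not describe how Acrobat actually works.

```python
def extract_text(codes, to_unicode=None):
    """Map character codes to Unicode using a ToUnicode-style table, if any."""
    if to_unicode is None:
        # No ToUnicode table and no standard encoding to fall back on:
        # the codes cannot be interpreted, so there is nothing to search.
        return None
    # Unknown codes become U+FFFD (the Unicode replacement character).
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


codes = [1, 2, 3]                    # made-up non-standard character codes
mapping = {1: "D", 2: "I", 3: "N"}   # made-up ToUnicode-style table

print(extract_text(codes, mapping))  # prints DIN
print(extract_text(codes))           # prints None
```

With the mapping the codes resolve to text; without it the same codes are opaque, which matches the behaviour reported for the PDF/A output.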