Bug 687954

Summary: copy and pasting asian text problems
Product: Ghostscript Reporter: Jack Moffitt <jack>
Component: PDF Writer Assignee: Igor Melichev <igor.melichev>
Status: NOTIFIED FIXED    
Severity: normal    
Priority: P2    
Version: master   
Hardware: All   
OS: All   
Customer: 242 Word Size: ---

Description Jack Moffitt 2005-02-24 09:44:32 UTC
When copying and pasting text from distillation of the attached file, some of
the glyphs don't seem to be copied correctly.  Tested with 8.50 and CVS HEAD.
Comment 1 Jack Moffitt 2005-02-24 09:48:38 UTC
Created attachment 1221 [details]
text.ps
Comment 2 Igor Melichev 2005-03-03 01:38:30 UTC
The problem is caused by "PScript5.dll Version 5.2", which originally created 
the test file. In the GlyphNames2Unicode table (which is built from the document 
sections G2UBegin - G2UEnd) it specifies character codes instead of glyph names 
(which should be CIDs with a CID font). However, in the test document the CIDs 
are not equal to the character codes - see the embedded CMap named 
WinCharSetFFFF-H2.
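
To illustrate the mismatch, here is a minimal sketch in C. It is not 
Ghostscript code: the table contents, the type names, and the functions 
code_to_cid and g2u_lookup are all hypothetical, and the 0xFFFD "not found" 
result is an assumption made for the example. It only shows why a table keyed 
by character code gives no useful answer when it is queried with a CID.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy data, not taken from the test file or from Ghostscript. */
typedef struct { uint16_t code; uint16_t cid; } cmap_entry;    /* char code -> CID */
typedef struct { uint16_t key; uint16_t unicode; } g2u_entry;  /* key -> Unicode */

/* Toy CMap in which the CIDs differ from the character codes. */
static const cmap_entry toy_cmap[] = {
    { 0x0041, 0x0003 },
    { 0x0042, 0x0004 },
};

/* Toy GlyphNames2Unicode table keyed by character code (assumed driver behaviour). */
static const g2u_entry toy_g2u[] = {
    { 0x0041, 0x4E2D },
    { 0x0042, 0x6587 },
};

/* Map a character code to a CID; returns 0 when the code is not in the CMap. */
uint16_t code_to_cid(uint16_t code)
{
    size_t i;
    for (i = 0; i < sizeof(toy_cmap) / sizeof(toy_cmap[0]); i++)
        if (toy_cmap[i].code == code)
            return toy_cmap[i].cid;
    return 0;
}

/* Look a key up in the toy table; returns 0xFFFD when the key is absent. */
uint16_t g2u_lookup(uint16_t key)
{
    size_t i;
    for (i = 0; i < sizeof(toy_g2u) / sizeof(toy_g2u[0]); i++)
        if (toy_g2u[i].key == key)
            return toy_g2u[i].unicode;
    return 0xFFFD;
}
```

With this toy data, g2u_lookup(code_to_cid(0x0041)) queries the table with CID 
0x0003 and finds nothing, while g2u_lookup(0x0041) queries it with the 
character code and returns 0x4E2D.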

Since GlyphNames2Unicode is an undocumented feature, which PScript5.dll uses to 
communicate with Adobe Distiller, we guess that Microsoft named it 
inaccurately, so our old reconstruction of its semantics turns out not to be 
accurate enough. We could patch Ghostscript gdevpdtc.c line 388 with 

  unicode_char = subfont->procs.decode_glyph(subfont, chr);

But we do not know what consequences this will have in general, and we have no 
means of testing for possible regressions.

For now, we'll apply the patch and see what the consequences are.
Comment 3 Igor Melichev 2005-03-03 05:33:33 UTC
The fix explained in the last comment appears insufficient.
Here is a complete one:
Patch to HEAD :
http://ghostscript.com/pipermail/gs-cvs/2005-March/005263.html
Patch to GS_8_1X :
http://ghostscript.com/pipermail/gs-cvs/2005-March/005264.html

Note that the test document prints each character several times to emulate 
bold text. Ghostscript/pdfwrite has no workaround for this. Therefore 
copy&paste triples the text. That is not part of the current bug.