687954 – copy and pasting asian text problems

Status:	NOTIFIED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	master
Hardware:	All All

Importance:	P2 normal
Assignee:	Igor Melichev

Reported:	2005-02-24 09:44 UTC by Jack Moffitt
Modified:	2008-12-19 08:31 UTC
CC List:	0 users

Customer:	242

Description Jack Moffitt 2005-02-24 09:44:32 UTC

When copying and pasting text from the PDF produced by distilling the attached
file, some of the glyphs are not copied correctly.  Tested with 8.50 and CVS HEAD.

Comment 1 Jack Moffitt 2005-02-24 09:48:38 UTC

Created attachment 1221
text.ps

Comment 2 Igor Melichev 2005-03-03 01:38:30 UTC

The problem is caused by "PScript5.dll Version 5.2", which originally created
the test file. In the GlyphNames2Unicode table (which is built from the document
sections G2UBegin - G2UEnd) it specifies character codes instead of glyph names
(which should be CIDs for a CID font). However, in the test document the CIDs
are not equal to the character codes - see the embedded CMap named WinCharSetFFFF-H2.
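
As a rough illustration of the mismatch, here is a toy sketch. It is not Ghostscript code: the types, table contents, and function names are all invented for this example, and the real CMap and G2U data in the test file are different. It only shows why indexing the table by CID gives the wrong Unicode value when the keys are really character codes.

```c
#include <stddef.h>

/* Toy model only: these types, tables, and values are invented for
   illustration and are not Ghostscript data structures. */
typedef struct { unsigned key; unsigned unicode; } g2u_entry;
typedef struct { unsigned code; unsigned cid; } cmap_entry;

/* GlyphNames2Unicode-like table, keyed by character code
   (the way the driver appears to write it). */
static const g2u_entry g2u_table[] = {
    { 1, 0x4E2D },
    { 2, 0x6587 },
    { 3, 0x5B57 },
};

/* CMap-like table in which CIDs differ from character codes. */
static const cmap_entry cmap_table[] = {
    { 1, 2 },
    { 2, 3 },
    { 3, 1 },
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Return the Unicode value stored under key, or 0 if the key is absent. */
static unsigned g2u_lookup(unsigned key)
{
    size_t i;
    for (i = 0; i < COUNT(g2u_table); i++)
        if (g2u_table[i].key == key)
            return g2u_table[i].unicode;
    return 0;
}

/* Return the CID for a character code, or 0 if the code is unmapped. */
static unsigned code_to_cid(unsigned code)
{
    size_t i;
    for (i = 0; i < COUNT(cmap_table); i++)
        if (cmap_table[i].code == code)
            return cmap_table[i].cid;
    return 0;
}

/* Interpretation that treats the table keys as CIDs. */
static unsigned unicode_via_cid(unsigned code)
{
    return g2u_lookup(code_to_cid(code));
}

/* Interpretation that treats the table keys as character codes. */
static unsigned unicode_via_code(unsigned code)
{
    return g2u_lookup(code);
}
```

With these made-up tables, character code 1 yields U+4E2D when the table is indexed by character code, but U+6587 when it is first mapped to a CID and the CID is used as the key.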

Since GlyphNames2Unicode is an undocumented feature, which PScript5.dll uses to
communicate with Adobe Distiller, we guess that Microsoft named it
inaccurately, so our old reconstruction of its semantics turns out not to be
accurate enough. We could patch Ghostscript gdevpdtc.c line 388 with

  unicode_char = subfont->procs.decode_glyph(subfont, chr);

But we do not know what consequences this will have in general, and we have no
means of testing for possible regressions.

For now, we'll apply the patch and see what the consequences are.

Comment 3 Igor Melichev 2005-03-03 05:33:33 UTC

The fix explained in the last comment appears to be insufficient.
Here is a complete one:
Patch to HEAD:
http://ghostscript.com/pipermail/gs-cvs/2005-March/005263.html
Patch to GS_8_1X:
http://ghostscript.com/pipermail/gs-cvs/2005-March/005264.html

Note that the test document prints each character several times to emulate
bold text. Ghostscript/pdfwrite has no workaround for this, so copy & paste
triples the text. That is not part of the current bug.
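
For readers who hit the overprinting issue described above, one possible client-side workaround is sketched below. This is hypothetical and is not something Ghostscript/pdfwrite does: the `shown_glyph` type, the `collapse_overprints` function, and the tolerance parameter are all assumptions made for this example. The idea is to drop any glyph that has the same code as an already kept glyph and lies within a small distance of it.

```c
#include <stddef.h>

/* Hypothetical record of one shown glyph: its code and its position. */
typedef struct { unsigned code; double x, y; } shown_glyph;

/* Absolute difference of two doubles. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Copy glyphs from in[] to out[], skipping any glyph that has the same
   code as an already kept glyph and lies within tol of it on both axes.
   out[] must have room for n entries. Returns the number kept. */
static size_t collapse_overprints(const shown_glyph *in, size_t n,
                                  shown_glyph *out, double tol)
{
    size_t kept = 0, i, j;

    for (i = 0; i < n; i++) {
        int duplicate = 0;
        for (j = 0; j < kept; j++) {
            if (out[j].code == in[i].code &&
                abs_diff(out[j].x, in[i].x) <= tol &&
                abs_diff(out[j].y, in[i].y) <= tol) {
                duplicate = 1;
                break;
            }
        }
        if (!duplicate)
            out[kept++] = in[i];
    }
    return kept;
}
```

The tolerance has to be smaller than the narrowest glyph advance, otherwise genuinely repeated letters would be collapsed as well. The quadratic scan is acceptable for a single line of text but would need a spatial index for whole pages.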