Created attachment 9597 [details]
xps file from printing text_graphics_image.pdf to an MXDW printer.

Some users reported that pdf files produced by gxps were producing incorrect text strings in the copy buffer when copied from Acrobat. There are similar issues on the gs forums over the last few years marked resolved or invalid and it may well be a gs issue but I will report it here to put the ball in play.

It was verified using gxps 9.07 compiled with VS2008 on Win7, as follows:

Starting with the pdf example file distributed with gs in the tools folder (text_graphics_image.pdf), an xps file was created with the MXDW printer on Win7 - text_graphics_image.xps (attached).
Converting this with gxps created a pdf file which had the problem.
Dumping out the Unicode maps from both files showed a problem.

Original:

/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
19 beginbfchar
<20> <0020>
<41> <0041>
<42> <0042>
<43> <0043>
<45> <0045>
<47> <0047>
<49> <0049>
<4C> <004C>
<4E> <004E>
<52> <0052>
<61> <0061>
<63> <0063>
<65> <0065>
<6B> <006B>
<6C> <006C>
<72> <0072>
<74> <0074>
<75> <0075>
<79> <0079>
endbfchar
endcmap
CMapName currentdict /CMap defineresource pop
end
end

From gxps:

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CMapType 2 def
/CMapName/R16 def
1 begincodespacerange
<00><ff>
endcodespacerange
19 beginbfrange
<20><20><0020>
<41><41><0042>
<42><42><0020>
<43><43><0042>
<45><45><0042>
<47><47><0020>
<49><49><0042>
<4c><4c><0042>
<4e><4e><0020>
<52><52><0020>
<61><61><0020>
<63><63><0020>
<65><65><0020>
<6b><6b><0020>
<6c><6c><0020>
<72><72><0020>
<74><74><0020>
<75><75><0020>
<79><79><0020>
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end

A recompile was done without /DWINDOWS_NO_UNICODE but as expected that had no effect on the result.
The following patch was applied to line 168+ in gdevpsfm.c to verify that this was the root cause of the copy bug:

<snip>
case CODE_VALUE_CHARS:
    stream_putc(s, '<');
    value = *lenum.entry.key[0]<<8;
    pput_hex(s, &value, value_size);
    // PJM test for corrupt Unicode map, reconstruct the unicode from the char key
    //pput_hex(s, lenum.entry.value.data, value_size);
    stream_putc(s, '>');
</snip>

This did produce an output file which copied correctly from Acrobat. The map entry.value.data is wrong, perhaps from the input parsing.
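The test patch can only work because, in the original map quoted in comment #0, every character code maps to the Unicode code point with the same numeric value (<41> to <0041>, and so on). A minimal sketch of that idea in C, assuming single-byte codes; format_identity_bfrange is a hypothetical helper for illustration, not a function in gdevpsfm.c:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats one bfrange line that maps a single-byte
 * character code to the Unicode code point with the same numeric value,
 * for example 0x41 becomes "<41><41><0041>".
 * Returns the value returned by snprintf. */
static int format_identity_bfrange(char *buf, size_t size, unsigned char code)
{
    return snprintf(buf, size, "<%02x><%02x><%04x>",
                    (unsigned int)code, (unsigned int)code,
                    (unsigned int)code);
}
```

This holds only while codes and code points coincide, so it is useful as a diagnostic rather than as a fix.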
This isn't strictly a pdfwrite bug, and there isn't a way to deal with it in pdfwrite alone.

In order to generate a ToUnicode CMap (not the same thing as a regular CMap, which is what is quoted in comment #0 as being in the original PDF) we need the Unicode code point relevant to a given glyph or CID. This is normally provided by a callback to the interpreter; the callback is stored in the font structure built by the interpreter.

The callback in question is the 'decode_glyph' proc. In the case of this font it ends up in xps_true_callback_decode_glyph, which *should* return the Unicode value. However, it does not do so, and simply returns 'xps_last_char'. I'm not sure what that is, but it's not the Unicode code point for the glyph.

The routine has this comment:

    /* We should do a reverse cmap lookup here to match PS/PDF.
     * However, a complete rearchitecture of our text and font processing
     * would be necessary to match XPS unicode mapping with the
     * cluster maps. Alas, we cheat similarly to PCL. */

While it's understandable that PCL is unable to return this information, since it isn't present in a PCL file, it seems that it should be possible to return it from an XPS file if it has a UnicodeString attribute.

So tossing this one back to Tor as it's really an XPS interpreter problem.
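To make the dependency concrete, here is a simplified model in C of a device asking a font callback for the code point behind each glyph. All names and signatures (decode_glyph_fn, decode_toy, write_tounicode_entry) are assumptions for illustration; the real Ghostscript callback has a different signature, and the toy mapping is invented.

```c
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for the font's decode_glyph callback: given a glyph,
 * return its Unicode code point. The real signature differs. */
typedef unsigned int (*decode_glyph_fn)(unsigned int glyph);

/* Toy callback for illustration only: pretends that glyph index and
 * Unicode code point are the same number. */
static unsigned int decode_toy(unsigned int glyph)
{
    return glyph;
}

/* Hypothetical writer: emits one ToUnicode entry for a character code by
 * asking the callback for the code point of the corresponding glyph.
 * Returns the value returned by snprintf. */
static int write_tounicode_entry(char *buf, size_t size, unsigned int code,
                                 unsigned int glyph, decode_glyph_fn decode)
{
    return snprintf(buf, size, "<%02x> <%04x>", code, decode(glyph));
}
```

In this model the writer has no other source for the code point, so whatever the callback returns ends up in the map.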
Created attachment 9598 [details]
possible patch

The problem occurs because we call xps_true_callback_encode_char once for each glyph in a string, and only afterwards call xps_true_callback_decode_glyph for each glyph in turn. By then xps_last_char always holds the last character in the buffer, so every glyph decodes to the same value.

While not correct, the simple change attached resolves the problem.
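The ordering problem can be reproduced with a small model in C. This is a sketch under stated assumptions, not the Ghostscript source: last_char plays the role of xps_last_char, the glyph numbering is made up, and the per-glyph table at the end is just one conceivable alternative, not necessarily what the attached patch does.

```c
#include <stddef.h>

/* Simplified model: one shared variable plays the role of xps_last_char. */
static unsigned int last_char;

/* Toy encoder: remembers the character and returns a made-up glyph index. */
static unsigned int encode_char(unsigned int unicode)
{
    last_char = unicode;
    return unicode - 29;   /* arbitrary offset, for illustration only */
}

/* Mirrors the reported behaviour: the glyph argument is ignored. */
static unsigned int decode_glyph_shared(unsigned int glyph)
{
    (void)glyph;
    return last_char;
}

/* Encode the whole string first, then decode every glyph.
 * Writes one decoded value per character into out[] (at most 16). */
static void encode_then_decode(const unsigned int *text, size_t n,
                               unsigned int *out)
{
    unsigned int glyphs[16];
    size_t i;

    if (n > 16)
        n = 16;
    for (i = 0; i < n; i++)
        glyphs[i] = encode_char(text[i]);
    for (i = 0; i < n; i++)
        out[i] = decode_glyph_shared(glyphs[i]);
}

/* Hypothetical alternative: remember the character per glyph index. */
static unsigned int glyph_to_char[256];

static unsigned int encode_char_tracked(unsigned int unicode)
{
    unsigned int glyph = unicode - 29;

    if (glyph < 256)
        glyph_to_char[glyph] = unicode;
    return glyph;
}

static unsigned int decode_glyph_tracked(unsigned int glyph)
{
    return glyph < 256 ? glyph_to_char[glyph] : 0;
}
```

With the shared variable, a string ending in a space decodes every glyph to U+0020, which matches the pattern in the map from comment #0. The tracked variant returns a distinct value per glyph, but only for glyphs that were encoded at least once.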
Is this related to http://bugs.ghostscript.com/show_bug.cgi?id=692395 or http://bugs.ghostscript.com/show_bug.cgi?id=693031 ?

I've also noticed that running the latest gxps on Windows with txtwrite crashes gxps.
*** This bug has been marked as a duplicate of bug 692395 ***