Bug 692073 - Japanese text can not be copied from PDF
Summary: Japanese text can not be copied from PDF
Status: NOTIFIED DUPLICATE of bug 691862
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: General
Version: 9.00
Hardware: PC Windows XP
Importance: P4 normal
Assignee: Default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-03-16 03:31 UTC by wilkinsonAU
Modified: 2011-04-13 00:39 UTC
CC List: 0 users

See Also:
Customer:
Word Size: ---


Attachments
sample pdfs created with GS 9.0.0 and 8.7.1 (23.45 KB, application/x-zip-compressed)
2011-03-16 03:31 UTC, wilkinsonAU
postscript file (56.78 KB, application/x-zip-compressed)
2011-03-17 01:22 UTC, wilkinsonAU

Description wilkinsonAU 2011-03-16 03:31:01 UTC
Created attachment 7373
sample pdfs created with GS 9.0.0 and 8.7.1

Overview:
Text in a Japanese PDF created by PDFCreator (pdfforge) cannot be copied correctly when GS 9.00 is used. The text copies correctly when GS 8.71 is used.

Steps to reproduce:
1) On a Japanese Windows XP OS, install PDFCreator 1.20, which is bundled with GS 9.00.

2) Print a Japanese text document from Notepad to the PDFCreator printer.

3) Open the resulting PDF, copy some text, and paste it into another document.

Expected Result:
The pasted text should be the same as the text in the PDF.

Actual Result:
The pasted text is garbled. Copying was attempted in both Foxit Reader 4.0 and Adobe Acrobat Reader X.

Additional Info:
The problem does not occur if GS 8.71 is used.
The following two tests produce the expected result:

a) Install Ghostscript 8.71 and reconfigure the PDFCreator options to use it instead of the bundled 9.0.0. Repeat steps 1-3; text can be copied correctly out of the resulting PDF.

b) Downgrade to PDFCreator 1.0.2 (with internally bundled GS 8.71). Repeat steps 1-3; text can be copied correctly out of the resulting PDF.
Comment 1 Ken Sharp 2011-03-16 08:08:46 UTC
Please supply the PostScript files used to create the PDF, and ideally the Ghostscript command line used to create the PDF files.
Comment 2 wilkinsonAU 2011-03-17 01:21:14 UTC
Log file showing the Ghostscript parameters when 8.71 is used:

2011/03/17 12:12:39: Ghostscriptparameter:
-IC:\Program Files\gs\gs8.71\lib\
-q
-dNOPAUSE
-dBATCH
-sFONTPATH=C:\WINDOWS\Fonts
-sDEVICE=pdfwrite
-dPDFSETTINGS=/default
-dCompatibilityLevel=1.4
-r600x600
-dProcessColorModel=/DeviceCMYK
-dAutoRotatePages=/PageByPage
-dCompressPages=true
-dEmbedAllFonts=true
-dSubsetFonts=true
-dMaxSubsetPct=100
-dConvertCMYKImagesToRGB=false
-sOutputFile=C:\Temp\loremJPN.txt -.pdf
-dEncodeColorImages=true
-dAutoFilterColorImages=true
-dEncodeGrayImages=true
-dAutoFilterGrayImages=true
-dEncodeMonoImages=true
-dMonoImageFilter=/CCITTFaxEncode
-dDownsampleMonoImages=false
-dPreserveOverprintSettings=true
-dUCRandBGInfo=/Preserve
-dUseFlateCompression=true
-dParseDSCCommentsForDocInfo=true
-dParseDSCComments=true
-dOPM=0
-dOffOptimizations=0
-dLockDistillerParams=false
-dGrayImageDepth=-1
-dASCII85EncodePages=false
-dDefaultRenderingIntent=/Default
-dTransferFunctionInfo=/Preserve
-dPreserveHalftoneInfo=false
-dDetectBlends=true
-f
C:\Program Files\PDFCreator\Temp\PDFCreatorSpool\~PS1E.tmp
2011/03/17 12:12:39: Time for converting [PDF without encryption]: 00:00:00:319
Comment 3 wilkinsonAU 2011-03-17 01:21:55 UTC
Log file showing the Ghostscript parameters when 9.00 is used:

2011/03/17 12:13:37: Ghostscriptparameter:
-IC:\Program Files\PDFCreator\GS9.00\gs9.00\Lib\
-q
-dNOPAUSE
-dBATCH
-sFONTPATH=C:\WINDOWS\Fonts
-sDEVICE=pdfwrite
-dPDFSETTINGS=/default
-dCompatibilityLevel=1.4
-r600x600
-dProcessColorModel=/DeviceCMYK
-dAutoRotatePages=/PageByPage
-dCompressPages=true
-dEmbedAllFonts=true
-dSubsetFonts=true
-dMaxSubsetPct=100
-dConvertCMYKImagesToRGB=false
-sOutputFile=C:\Temp\loremJPN.txt -.pdf
-dEncodeColorImages=true
-dAutoFilterColorImages=true
-dEncodeGrayImages=true
-dAutoFilterGrayImages=true
-dEncodeMonoImages=true
-dMonoImageFilter=/CCITTFaxEncode
-dDownsampleMonoImages=false
-dPreserveOverprintSettings=true
-dUCRandBGInfo=/Preserve
-dUseFlateCompression=true
-dParseDSCCommentsForDocInfo=true
-dParseDSCComments=true
-dOPM=0
-dOffOptimizations=0
-dLockDistillerParams=false
-dGrayImageDepth=-1
-dASCII85EncodePages=false
-dDefaultRenderingIntent=/Default
-dTransferFunctionInfo=/Preserve
-dPreserveHalftoneInfo=false
-dDetectBlends=true
-f
C:\Program Files\PDFCreator\Temp\PDFCreatorSpool\~PS24.tmp
2011/03/17 12:13:38: Time for converting [PDF without encryption]: 00:00:00:327
Comment 4 wilkinsonAU 2011-03-17 01:22:42 UTC
Created attachment 7381
postscript file
Comment 5 Ken Sharp 2011-03-17 08:37:43 UTC
The 9.0 release wrote ToUnicode CMaps which Acrobat doesn't like. The ToUnicode CMap is used to get Unicode code points for the text in the document, and it is these code points that are used for cut and paste.

This was fixed a couple of months ago (4th January), and the current code produces acceptable ToUnicode CMaps, from which the text can be cut and pasted. I would recommend updating to the current release (9.01), or building the source from Subversion if you want to try the cutting-edge code.
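
As a rough illustration of the role a ToUnicode CMap plays in cut and paste, the Python sketch below parses a small hand-written CMap fragment and maps character codes to Unicode text. The sample CMap, the function names and the simplified handling of bfrange entries (a single destination code point, incremented across the range) are assumptions made for illustration. This is not the code used by Ghostscript or by any PDF viewer, and it does not reproduce the CMaps written by 9.00.

```python
import re

# Hypothetical, hand-written ToUnicode CMap fragment for illustration only.
# It is NOT taken from the PDFs attached to this bug.
SAMPLE_CMAP = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0001> <3042>
<0002> <3044>
endbfchar
1 beginbfrange
<0003> <0005> <3046>
endbfrange
endcmap
"""

HEX = r"<([0-9A-Fa-f]+)>"


def parse_tounicode(cmap_text):
    """Return {character code: Unicode string} for bfchar and simple bfrange entries."""
    mapping = {}
    # bfchar: each entry maps one character code to a UTF-16BE encoded string.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange (simplified): <lo> <hi> <dst> maps consecutive codes to
    # consecutive code points starting at dst. The array form and
    # multi-code-unit destinations are deliberately not handled here.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(HEX + r"\s*" + HEX + r"\s*" + HEX, block):
            start = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(start + offset)
    return mapping


def extract_text(codes, mapping):
    """Turn a sequence of character codes into text, as a viewer might when copying.

    Codes with no mapping become U+FFFD, which is one way pasted text can
    end up garbled when the CMap is missing or unusable.
    """
    return "".join(mapping.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    cmap = parse_tounicode(SAMPLE_CMAP)
    print(extract_text([1, 2, 3], cmap))
```

If a viewer cannot interpret the CMap, it has no reliable way to get from the character codes in the page content to Unicode, which is consistent with the garbled paste described in this report.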

*** This bug has been marked as a duplicate of bug 691862 ***
Comment 6 wilkinsonAU 2011-04-13 00:39:12 UTC
Tested OK in GS 9.02; thanks for your assistance.