Bug 690930 - Unicode Output: DEVICE=pdfwrite
Summary: Unicode Output: DEVICE=pdfwrite
Status: NOTIFIED WORKSFORME
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.70
Hardware: PC Windows XP
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-11-18 02:00 UTC by Katharina Wesselhoeft
Modified: 2011-10-02 02:35 UTC
CC List: 2 users

See Also:
Customer: 631
Word Size: ---


Attachments
Japanese PDF file (11.21 KB, application/pdf)
2010-05-21 00:28 UTC, Mike
test data (1.29 MB, application/x-zip-compressed)
2011-01-18 18:07 UTC, Katharina Wesselhoeft
output from 9.01 HEAD revision (63.63 KB, application/pdf)
2011-01-18 19:35 UTC, Ken Sharp
Windows' answer to our efforts... (53.89 KB, application/zip)
2011-02-04 09:29 UTC, Katharina Wesselhoeft

Description Katharina Wesselhoeft 2009-11-18 02:00:05 UTC
Using Ghostscript (GPL Ghostscript 2009-07-31) under Windows XP to create a
PDF file with Unicode fonts, I ran into a crash:
Unhandled exception at 0x00aba692 in gswin32.exe: 0xC0000005: Access violation
reading location 0xa3fa7c9a.

Ghostscript produces correct output in the window 'ghostscript image' when
working without DEVICE=pdfwrite.

Here the parameters I used:

CIDFMAP:

/FreeSerif
<<
 /FileType /TrueType
 /Path (C:/WINDOWS/fonts/FreeSerif.TTF)
 /SubfontID 0
 /CSI [(Artifex) (Unicode) 0]
>>;

/FreeSerifI
<<
 /FileType /TrueType
 /Path (C:/WINDOWS/fonts/FreeSerifItalic.TTF)
 /SubfontID 0
 /CSI [(Artifex) (Unicode) 0]
>>;

The FreeSerif font is from http://savannah.gnu.org/projects/freefont/

PostScript file:
/FreeSerif-Identity-UTF16-H findfont
10 scalefont setfont
80 800 moveto
<0400045D047D>  show

/FreeSerifI-Identity-UTF16-H findfont
15 scalefont setfont
80 750 moveto
<0400045D047D>  show
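The hex strings passed to show above are the UTF-16 code units of the text (here U+0400, U+045D, U+047D). A minimal Python sketch for generating such a fragment from ordinary Unicode text, assuming the -Identity-UTF16-H CMap expects big-endian UTF-16 (so characters above U+FFFF become four-byte surrogate pairs). The helper names to_ps_utf16_hex and show_line are hypothetical, not part of Ghostscript.

```python
def to_ps_utf16_hex(text: str) -> str:
    """Return a PostScript hex string (<...>) holding text as UTF-16BE."""
    # Assumption: Identity-UTF16-H takes big-endian UTF-16. Each BMP character
    # becomes two bytes, high byte first; characters above U+FFFF become a
    # four-byte surrogate pair.
    return "<" + text.encode("utf-16-be").hex().upper() + ">"


def show_line(font: str, size: int, x: int, y: int, text: str) -> str:
    """Build a findfont/scalefont/moveto/show fragment like the test file above."""
    return (
        f"/{font}-Identity-UTF16-H findfont\n"
        f"{size} scalefont setfont\n"
        f"{x} {y} moveto\n"
        f"{to_ps_utf16_hex(text)} show\n"
    )


if __name__ == "__main__":
    # Reproduces the first block of the test file.
    print(show_line("FreeSerif", 10, 80, 800, "\u0400\u045d\u047d"))
```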

Ghostscript Call:
-q
-dBATCH
-dNOPAUSE
-sDEVICE=pdfwrite
-dCompatibilityLevel=1.4
-dEmbedAllFonts=true
-sPDFSETTINGS=prepress
-sAutoRotatePages=PageByPage
-sPAPERSIZE=a4
-sOutputFile="d:\gstools\work\test.pdf"
-I"D:\gstools\"
"D:\gstools\work\test.ps"

Thanks for your efforts in advance. Hope it's not my fault?
Katharina Wesselhoeft
Comment 1 Hin-Tak Leung 2009-11-26 18:37:11 UTC
The documentation says CSI is an array of 2 elements, but elsewhere it says to set it
to [/Artifex /Unicode 0] (which has 3 elements). You used:

/CSI [(Artifex) (Unicode) 0]

Note that (Artifex) is not the same as /Artifex ... I think 
/CSI [(Unicode) 0] is probably the right answer, but in any case the
documentation needs to be updated to resolve the inconsistency.
Comment 2 Ken Sharp 2009-11-27 00:54:16 UTC
Hin-Tak, can you point me to the documentation that says it's an array of two
elements so I can fix it, please?

The CSI array is *either* a 2-element *or* a 3-element array. If it has two
elements, the Registry is assumed to be Adobe; if it's a 3-element array, it
must contain Registry, Ordering and Supplement, in that order.

CSI is a shorthand for CIDSystemInfo.
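A minimal Python sketch of the rule just described, expanding a /CSI array into CIDSystemInfo-style fields. It models PostScript strings as Python str (so the string/name distinction discussed later in this thread is not represented) and follows the requirement from Adobe Tech Note 5014 that Registry and Ordering be strings. The function name expand_csi is hypothetical; this is not Ghostscript's own code.

```python
from typing import Sequence, Union


def expand_csi(csi: Sequence[Union[str, int]]) -> dict:
    """Expand a cidfmap /CSI array into Registry/Ordering/Supplement.

    Two elements:   [Ordering Supplement], Registry assumed to be Adobe.
    Three elements: [Registry Ordering Supplement], in that order.
    """
    if len(csi) == 2:
        registry = "Adobe"
        ordering, supplement = csi
    elif len(csi) == 3:
        registry, ordering, supplement = csi
    else:
        raise ValueError(f"CSI must have 2 or 3 elements, got {len(csi)}")
    # Registry and Ordering must be strings; Supplement is an integer.
    if not isinstance(registry, str) or not isinstance(ordering, str):
        raise TypeError("Registry and Ordering must be strings")
    if isinstance(supplement, bool) or not isinstance(supplement, int):
        raise TypeError("Supplement must be an integer")
    return {"Registry": registry, "Ordering": ordering, "Supplement": supplement}
```

For example, the Japanese entry later in this report, [(Japan1) 3], expands to Registry Adobe, Ordering Japan1, Supplement 3, while [(Artifex) (Unicode) 0] keeps Artifex as the Registry.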
Comment 3 Hin-Tak Leung 2009-11-27 01:38:03 UTC
Oh, in the '3rd party font render section' in Use.htm, and also in:
./Resource/Init/cidfmap
./lib/FAPIcidfmap

In fact, this bug report is the first time I noticed the 3-element usage
(relatively new?).

(Artifex) is not the same as /Artifex, I think? (My PostScript is a bit rusty...
the former is a string, the latter is a name?)
Comment 4 Ken Sharp 2009-11-27 01:51:34 UTC
I don't think the 3-element usage is new to be honest, but I'm not certain. 

While the string (Artifex) is not the same as the name /Artifex, it's common in
various areas of fonts to treat strings and names the same (i.e. either is allowed).

In this case, Adobe Tech Note 5014 says that the values of the Registry and
Ordering keys in the CIDSystemInfo dictionary must be strings.

The note in FAPIcidfmap is just grammatically odd, I think; I suspect it meant to
say 'at least' rather than 'strongly'. The note in cidfmap just documents
the two-element case, and the documentation in Use.htm is simply wrong, I think. I'll
fix it all up when I get round to looking at this.
Comment 5 Hin-Tak Leung 2009-11-27 02:44:24 UTC
Yes, sorry, the 3-element usage was added in r7095, 3 years ago.
Comment 6 Katharina Wesselhoeft 2009-12-02 04:12:19 UTC
Thanks for your remarks.
/CSI[(Artifex)(unicode) 0] works fine as long as I don't change the font for
PDF output.

/CSI [/artifex /unicode 0] results in
  gs_cidfm.ps 1 --nostringval-- FreeSerif --nostringval-- FilteType TrueType
          Path=C:/Windows/fonts/FreeSerif.TTF SubfontID 0 CSI --nostringval--
Artifex Unicode     

With /CSI[(unicod)0], Ghostscript does not accept my font as a CIDFont and uses a substitute instead.
Comment 7 Hin-Tak Leung 2009-12-02 05:07:46 UTC
That sounds like a slightly broken font to me. Does your procedure work with
Microsoft's Times instead?

We probably need the exact version of the font you use. If a "free" font is
important to you, maybe you can try the Liberation fonts as an alternative:
https://fedorahosted.org/releases/l/i/liberation-fonts/
Comment 8 Katharina Wesselhoeft 2009-12-04 02:19:11 UTC
Thanks for your fonts. 

I tried switching between LiberationSans-Regular.ttf and LiberationSans-Bold.ttf
and had the same crash with PDF output.
This also happened with MsMincho.ttf and ArialUni.ttf from Microsoft.

If I switch between fonts without PDF output, everything works fine and the
Unicode characters print correctly in the image window.
The crash only occurs when switching fonts with PDF output.
Comment 9 Mike 2010-05-21 00:28:11 UTC
Created attachment 6310 [details]
Japanese PDF file
Comment 10 Mike 2010-05-21 00:31:36 UTC
I have a similar problem with GS HEAD:
when trying to convert the attached Japanese PDF into PDF with GS and pdfwrite, I get a gswin32c.exe application error (Windows 7 x64). When converting it to TIFF (tiffg3 device), everything works fine.

Here is my cidfmap
/MS-Gothic << /SubfontID 0 /CSI [(Japan1) 3] /Path (c:/Windows/Fonts//msgothic.ttc) /FileType /TrueType >> ;
/MS-PGothic << /SubfontID 1 /CSI [(Japan1) 3] /Path (c:/Windows/Fonts//msgothic.ttc) /FileType /TrueType >> ;
Comment 11 Marcos H. Woehrmann 2010-05-21 21:37:49 UTC
Changing priority to P2 since this is now a customer issue as well.
Comment 12 Ken Sharp 2010-05-25 10:55:20 UTC
When copying a CIDFont with a CIDFontType of 2 (CIDFont with TrueType outlines) for pdfwrite, we copy a pointer to an internally allocated (reference-counted) structure. This structure is used for finding a substitute for vertical writing mode.

However, the reason pdfwrite copies the font is so that it can emit it later, in case the font has been restored out of existence before the end of the job. This is exactly what is happening in this case.

The problem is that when the font is restored out of existence, so is the structure it points to. This leaves the copy with a dangling pointer. When the garbage collector runs, if it relocates the copied font, it also attempts to relocate the pointer to the structure; if that memory has been reused (which is likely), the memory subsystem becomes disastrously corrupted.
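A toy Python model of the failure mode described above, assuming the shared structure is released as soon as its reference count reaches zero. It contrasts a copy that merely duplicates the pointer with a copy that takes its own reference. All class and function names are hypothetical; taking a reference (or giving the copy its own duplicate of the structure) is only one plausible remedy, and this comment does not say how the eventual patch fixed it.

```python
class RefCounted:
    """Hypothetical stand-in for the reference-counted vertical-substitution data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.refs = 1          # the original font holds one reference
        self.freed = False

    def add_ref(self) -> None:
        self.refs += 1

    def release(self) -> None:
        self.refs -= 1
        if self.refs == 0:
            # In C the memory would now be returned to the allocator and could
            # be reused; any remaining pointer to it would be dangling.
            self.freed = True


class Font:
    def __init__(self, shared: RefCounted) -> None:
        self.shared = shared


def copy_font(font: Font, take_reference: bool) -> Font:
    """Copy a font so it can be emitted later, optionally taking a reference."""
    if take_reference:
        font.shared.add_ref()
    return Font(font.shared)


def restore(font: Font) -> None:
    """Model 'restore': the original font goes away and drops its reference."""
    font.shared.release()
```

Without its own reference, the copy is left pointing at freed data after restore; with one, the data survives until the copy itself releases it at the end of the job.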

I'm unable to say if this is the same problem experienced by Katharina Wesselhoeft as I have no sample file for this case. However, the problem would only occur on multiple page files and the symptoms described are consistent with this problem.
Comment 13 Ken Sharp 2010-05-26 12:35:55 UTC
Revision 11321, patch here:

http://ghostscript.com/pipermail/gs-cvs/2010-May/011116.html

resolves the problem with MSMincho and the Japanese PDF file. In the absence of a test file for the FreeSerif font, I'm going to assume it is the same problem and close this issue as fixed. Feel free to reopen if it should turn out not to be fixed, but please supply a sample file as well in this case.
Comment 14 Katharina Wesselhoeft 2011-01-18 18:07:18 UTC
Created attachment 7135 [details]
test data
Comment 15 Katharina Wesselhoeft 2011-01-18 18:12:09 UTC
Sorry to reopen the problem; it still persists on our side.
We attach the PS file, the parameter file, the cidfmap and the font.

Being very interested in a quick solution to this problem: how much would you charge us for that?

Thank you for your efforts,
Dietrich Wesselhöft
Comment 16 Ken Sharp 2011-01-18 19:34:28 UTC
(In reply to comment #15)

> Sorry to reopen the problem, it still continues on our side.
> We attach the ps-file, the parameter file as well as the cidfmap and the font.

Can you please be more explicit about the problem? The patch fixes a crash; I had not previously been able to examine your problem.

I get what may or may not be correct output. It's not what I would call 'corrupt', but that doesn't mean it's correct. (Of course, using a TrueType font as a CIDFont is *not* part of the PostScript specification and is not guaranteed to work in GS.)

I've attached the output I get, if this is not correct please tell me what is wrong with it.

Dropping the priority to P4 as this is no longer a customer problem.
Comment 17 Ken Sharp 2011-01-18 19:35:24 UTC
Created attachment 7137 [details]
output from 9.01 HEAD revision
Comment 18 Katharina Wesselhoeft 2011-02-04 09:26:40 UTC
The output you produced is exactly what we expected.

Our problem still persists:
We are working under Windows XP service pack 3, having installed gs900w32.exe without source or resource.
We copied our CIDFMAP into the directory LIB.

We produced the parameter file CALL.PAR containing
...
-I "...\gs9.00\lib"
"...psfile.ps"

and called
gswin32.exe @call.par

Windows responds with a crash and very little information. We will nevertheless send this as a separate attachment.

You said that using a TrueType font as a CIDFont may cause problems.
Is there another way to create PDF files using a TrueType font with the full Unicode character set in UTF-16 encoding?

You find the files used as attachments under Comment 14.

Thanks
Katharina Wesselhöft
Comment 19 Katharina Wesselhoeft 2011-02-04 09:29:47 UTC
Created attachment 7196 [details]
Windows' answer to our efforts...
Comment 20 Ken Sharp 2011-02-04 14:12:41 UTC
(In reply to comment #18)

> Our problem still persists:
> We are working under Windows XP service pack 3, having installed gs900w32.exe
> without source or resource.
> We copied our CIDFMAP into the directory LIB.

This does GPF on the 9.00 release, it does not on my current HEAD revision of Ghostscript (and obviously didn't back in January). I don't particularly want to go tracking back through 6 months worth of revisions and, since you aren't building from source, it wouldn't help you if I did identify the change which fixed this.

Ghostscript 9.01 is due for release very shortly (hopefully early next week); I would suggest you upgrade when it becomes available for download. Alternatively, you can check out the source from the public Subversion repository and build the current version yourselves.

 
> You said that using a TrueType font as a CIDFont may cause problems.
> Is there another way to create pdf-files using a TrueType font with the full
> Unicode character set in UTF16 encoding.

Embed the font as a Type 42 font in the PostScript program; this is the only way specified (in the PostScript Language Reference Manual) to use TrueType fonts in a PostScript program.
Comment 21 Katharina Wesselhoeft 2011-02-16 10:50:12 UTC
Version 9.01 really solves our problem.
Thanks for your efforts.
Katharina Wesselhöft
Comment 22 Ken Sharp 2011-02-16 13:17:07 UTC
(In reply to comment #21)
> Version 9.01 really solves our problem.
> Thanks for your efforts.

Thanks for letting us know, I appreciate it !