Bug 690764 - VISCII fonts
Summary: VISCII fonts
Status: RESOLVED INVALID
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.65
Hardware: PC Windows 2000
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-09-13 07:05 UTC by Yimin Liu
Modified: 2010-09-21 09:49 UTC
0 users

See Also:
Customer:
Word Size: ---


Attachments
Acrobat_distiller.pdf (34.26 KB, application/pdf)
2009-09-16 07:26 UTC, Yimin Liu
ghostscript.pdf (27.08 KB, application/pdf)
2009-09-16 07:27 UTC, Yimin Liu
partial font decode (34.28 KB, text/plain)
2010-09-21 09:49 UTC, Ken Sharp

Description Yimin Liu 2009-09-13 07:05:02 UTC
I installed the VISCII-encoded Vietnamese font UHoai-1.1 (from
http://sourceforge.net/projects/winvnkey/files/) and tried to generate a PDF file.
When I run ps2pdf on:

18 720 moveto
%/VU-Phu-Tho findfont 16 scalefont setfont
/UHoai-1.1 findfont 16 scalefont setfont
(\230) show

the generated PDF file shows nothing. It should show the character 'Ị'. It works
with other Vietnamese characters (such as \377). I tried a couple of other
VISCII fonts and got the same result.
Comment 1 Alex Cherepanov 2009-09-13 18:41:52 UTC
Please provide more information about how you installed the fonts.
Ghostscript can use TTF fonts by searching the font directory,
but the font names in visfont0.zip are different from
the names in your sample file.

I've downloaded and decompressed visfont0.zip and run the following program:

%!
/s  ( ) def
0 1 255 {
/Courier findfont 30 scalefont setfont
dup =string cvs 280 770 moveto show
s 0 3 -1 roll put
0
{
/HeoMay11
/HeoMayHoa11
/HoangYen11
/HoangYenH11
/MinhQun11
/MinhQunH11
/PhuongThao11
/PhuongThaoH11
/ThaHuong11
/ThaHuongH11
/UHoi11
/UHoiH11
/VIArial
/VIArialH
/VIAvan
/VIAvanH
/VIBook
/VIBookH
/VITimes
/VITimesH
/VIUniverse
/VIUniverseH
/nhMinh11
/nhMinhH11
} {
  findfont 30 scalefont setfont
  dup 4 mod   110 mul  80 add
  1 index 4 idiv -100 mul 660 add  moveto
  1 add
  s show
} forall
pop
showpage
} for

using these command lines:
gswin32c -sFONTPATH=. foo.ps
gswin32c -o foo.pdf -sDEVICE=pdfwrite -sFONTPATH=. foo.ps

First, the image rendered on the screen and the content of the PDF file
look identical.

Second, there are differences between the fonts. I don't know which font
is right and which font is wrong. Most likely, the errors (if any) are in
the fonts rather than in Ghostscript.

Comment 2 Yimin Liu 2009-09-16 07:26:30 UTC
Created attachment 5376
Acrobat_distiller.pdf
Comment 3 Yimin Liu 2009-09-16 07:27:01 UTC
Created attachment 5377
ghostscript.pdf
Comment 4 Yimin Liu 2009-09-16 07:29:50 UTC
Hi Alex,

Thank you very much for your reply.

For the font VIArial, I used Acrobat Distiller to produce this PDF file:
    Acrobat_distiller.pdf (see attachment).
Please compare it with the Ghostscript-generated PDF file:
    ghostscript.pdf

You can see that Acrobat Distiller is able to render the character '\230'.

To view VISCII code page layout:
http://en.wikipedia.org/wiki/Vietnamese_Standard_Code_for_Information_Interchange

My PS script:
----------------------------------------
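% The original post did not include definitions for these two scratch strings;
% the sizes below are assumed.
/showstring 1 string def      % holds the single character being shown
/counterstring 3 string def   % holds the octal code, up to 377
/yline 690 def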
/xstart 18 def
0 1 255
  { /counter exch def
    /charstring showstring dup 0 counter put def
    /Times-Roman findfont 8 scalefont setfont  xstart yline moveto
    counter 8 counterstring cvrs show
    xstart 42 add yline moveto
    charstring show
    /VIArial findfont 10 scalefont setfont xstart 70 add yline moveto
    charstring show
    /yline yline 10 sub def
    counter 1 add 64 mod 0 eq
      { /xstart xstart 133 add def
        /yline 690 def
      } if
  } for
showpage
---------------------------------

Yimin Liu
Comment 5 Ken Sharp 2010-09-21 09:49:03 UTC
Created attachment 6740
partial font decode

I do not believe there is a real bug here. Before using a font in PostScript it is necessary to first encode it, so that character codes map to the expected glyphs. This step can be skipped only if you know in advance that the font is already encoded in a satisfactory fashion (e.g. the font has /StandardEncoding).

However, there is no specification for using TrueType fonts as replacements for PostScript fonts (or for using TrueType fonts natively in PostScript), since this is not a feature of standard PostScript. So the first thing that should be done is to encode the font suitably. This is not being done, and that is the basic reason for the problem.

Ghostscript makes a 'best effort' attempt with these fonts. Take VIArial as an example: we decode the font's CMAP tables and choose to use the Unicode CMAP (it's the best one of the three for us). Because PostScript only allows 256 glyphs in an Encoding, we can't use any positions in the CMAP that are beyond 0xFF, but it's a good start.
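
The effect of the 0xFF limit can be modelled in a few lines of PostScript. This is only a sketch of the idea described above, not Ghostscript's actual code: the cmapSample dictionary and the glyph names in it are made up for illustration, and U+1ECA is assumed to be where a Unicode CMAP would place 'Ị'.

```postscript
%!
% Hypothetical fragment of a Unicode cmap: code point -> glyph name.
/cmapSample <<
  16#41   /A
  16#CC   /Igrave
  16#1ECA /Idotbelow    % assumed name; code point is beyond 0xFF
>> def

% Start with a 256-entry Encoding in which every slot is /.notdef.
/enc 256 array def
0 1 255 { enc exch /.notdef put } for

% Copy only the entries that fit in a one-byte Encoding.
cmapSample {
  % stack: code name
  1 index 255 le
    { enc 3 1 roll put }   % code fits: enc[code] = name
    { pop pop }            % code above 0xFF: cannot be placed
  ifelse
} forall

enc 16#CC get ==   % prints /Igrave
enc 16#98 get ==   % prints /.notdef, nothing in the sample maps to 0x98
```

Any entry whose code point is above 255 is dropped, so a glyph can be present in the font and still be unreachable through the one-byte Encoding.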

The attached file (tables.txt) contains a breakdown of part of the font. It shows that, according to the Unicode CMAP subtable, position 0x98 is not encoded, so we place /.notdef in this position. Hence when character code \230 (0x98) is used, we render the /.notdef glyph.
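
A quick way to see what an interpreter has placed at a given code is to read the font's Encoding array directly. The sketch below is illustrative only: /Helvetica is a stand-in so the file runs anywhere, and fontName and code are names introduced here, not part of any Ghostscript interface. Substitute the font under test (for example /VIArial, with -sFONTPATH set as in comment 1).

```postscript
%!
/fontName /Helvetica def   % assumption: replace with the font being checked
/code 8#230 def            % octal 230 = hex 98 = decimal 152

fontName findfont
dup /Encoding get code get          % stack: font glyphname
dup ==                              % print the glyph name at that code
exch /CharStrings get exch known    % is that glyph present in the font?
==                                  % print true or false
```

For a font that uses StandardEncoding, code \230 is unassigned, so the first value printed would be expected to be /.notdef. Running the file with -dNODISPLAY avoids opening a display window when only the printed values are of interest.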

This all appears to be correct and as expected.

Moving on to the Acrobat output. Acrobat seems to be trying to encode more of the glyphs. I have no idea what criteria it is using for this, as it seems to have altered the glyph names when creating the embedded fonts. In addition, when compared to the encoding given in the Wikipedia article in comment #4, I can see that there are a number of glyphs not encoded, or not encoded in the correct position, in the Acrobat output.

Checking the Acrobat output against the Wikipedia article, we see that positions 2, 5 and 6 should have glyphs. The Ghostscript output does (though position 2 makes no marks); the Distiller output does not. Position 0x80 is not encoded in the Distiller output but is correctly encoded in the GS output. And so on.

In short, *neither* output matches that given in the Wikipedia article, and therefore the Ghostscript output is not only not incorrect, it's not even sub-optimal compared to Acrobat.

This is not a bug. If you want to use these fonts in PostScript, you must correctly encode them for your intended encoding scheme before you attempt to use them.
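
For reference, the usual PostScript idiom for re-encoding a font looks roughly like the sketch below. It is a generic illustration, not a fix for the VISCII fonts themselves: /Helvetica and /Idieresis are stand-ins chosen so that the file runs on any interpreter, and /Helvetica-Reencoded is a made-up name. For a VISCII font you would need to put the font's own glyph name for 'Ị' at position \230, and that name depends on the font and is not given in this report.

```postscript
%!
% Copy the font dictionary, leaving out FID so definefont accepts the copy.
/Helvetica findfont
dup length dict begin
  { 1 index /FID ne { def } { pop pop } ifelse } forall
  % Take a private copy of the Encoding so the original font is untouched.
  /Encoding Encoding 256 array copy def
  % Map code \230 (0x98) to a glyph; /Idieresis is only a placeholder.
  Encoding 8#230 /Idieresis put
  currentdict
end
/Helvetica-Reencoded exch definefont pop

% Same test as in the description, now using the re-encoded font.
18 720 moveto
/Helvetica-Reencoded findfont 16 scalefont setfont
(\230) show
showpage
```

A complete VISCII setup would fill in every position of the Encoding array this way, using the code page layout linked in comment 4 together with the glyph names actually present in the font's CharStrings.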