Bug 689304 - improper handling of vertical japanese text
Summary: improper handling of vertical japanese text
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Graphics Library
Version: master
Hardware: PC Linux
Importance: P1 normal
Assignee: leonardo
URL:
Keywords:
Depends on: 689404 689405 689559 689893
Blocks:
 
Reported: 2007-06-25 17:08 UTC by Ralph Giles
Modified: 2009-01-08 04:51 UTC (History)

See Also:
Customer:
Word Size: ---


Attachments
PS file including vertical text (2.62 KB, application/postscript)
2007-06-25 17:09 UTC, Ralph Giles
Scanned image of Japanese PS printer output (463.46 KB, application/pdf)
2007-06-25 17:11 UTC, Ralph Giles
Ghostscript 7.07 output (45.41 KB, image/png)
2007-06-25 17:12 UTC, Ralph Giles
ESP Ghostscript 8.15.1 without CJK patch (44.57 KB, image/png)
2007-06-25 17:15 UTC, Ralph Giles
ESP Ghostscript 8.15.1 with CJK patch (44.89 KB, image/png)
2007-06-25 17:17 UTC, Ralph Giles
Current trunk output (43.97 KB, image/png)
2007-06-25 17:19 UTC, Ralph Giles
cidfmap for the fonts referenced by article9.ps (1.79 KB, text/plain)
2007-06-25 19:55 UTC, Ralph Giles
Add support for the %% Replace keywords used with COMPILE_INITS=1 (1.48 KB, patch)
2007-08-14 13:59 UTC, Ralph Giles
CJK patches from Koji Otani, part 1 (112.70 KB, patch)
2007-08-14 14:02 UTC, Till Kamppeter
CJK patches from Koji Otani, part 2 (10.60 KB, patch)
2007-08-14 14:03 UTC, Till Kamppeter
CJK patches from Koji Otani, part 3 (98.53 KB, patch)
2007-08-14 14:04 UTC, Till Kamppeter
CJK patches from Koji Otani, part 4 (436 bytes, patch)
2007-08-14 14:04 UTC, Till Kamppeter
screenshot.png (323.66 KB, image/png)
2008-09-15 14:48 UTC, Marcos H. Woehrmann
cidfmap (888 bytes, text/plain)
2008-09-15 15:35 UTC, Marcos H. Woehrmann
patch.txt (15.12 KB, patch)
2008-10-08 00:38 UTC, leonardo
PDF including punctuations to be rotated (5.38 KB, application/pdf)
2008-10-08 01:53 UTC, mpsuzuki
test.png (3.58 KB, image/png)
2008-10-13 21:08 UTC, Marcos H. Woehrmann
patch4.txt (3.73 KB, patch)
2009-01-08 03:33 UTC, leonardo

Description Ralph Giles 2007-06-25 17:08:24 UTC
Ghostscript shows unrotated punctuation forms with Japanese vertical text. It
also does not use a correct centerline for vertical text, a regression from 7.07
and ESP Ghostscript 8.15.1.

Originally reported by mpsuzuki in
http://ghostscript.com/pipermail/gs-devel/2007-June/003538.html
Comment 1 Ralph Giles 2007-06-25 17:09:41 UTC
Created attachment 3072 [details]
PS file including vertical text

Test file. Requires an appropriate cidfmap for Japanese text.
Comment 2 Ralph Giles 2007-06-25 17:11:17 UTC
Created attachment 3073 [details]
Scanned image of Japanese PS printer output

Correct output for reference.
Comment 3 Ralph Giles 2007-06-25 17:12:30 UTC
Created attachment 3074 [details]
Ghostscript 7.07 output
Comment 4 Ralph Giles 2007-06-25 17:15:58 UTC
Created attachment 3075 [details]
ESP Ghostscript 8.15.1 without CJK patch

Vanilla ESP Ghostscript 8.15.1 output showing unrotated punctuation, a
regression from 7.07.
Comment 5 Ralph Giles 2007-06-25 17:17:29 UTC
Created attachment 3076 [details]
ESP Ghostscript 8.15.1 with CJK patch

Output from ESP Ghostscript 8.15.1 with a proposed patch. This shows correctly
rotated punctuation, but also a slight vertical offset.
Comment 6 Ralph Giles 2007-06-25 17:19:49 UTC
Created attachment 3077 [details]
Current trunk output

Output with current development HEAD version, showing unrotated punctuation and
centerline shift problems relative to 7.07 and the reference output.
Comment 7 Ralph Giles 2007-06-25 19:55:55 UTC
Created attachment 3079 [details]
cidfmap for the fonts referenced by article9.ps

Attaching the cidfmap file the original submitter used. This provides a
substitution map from the fonts referenced by article9.ps to the IPA set, which
is available bundled with the GRASS software package, and included in this
archive:

http://www.grass-japan.org/FOSS4G/ipafonts/grass5.0.3_i686-pc-linux-i18n-ipafull-gnu_bin.tar.gz


The absolute paths will of course have to be altered to match where they are in
any particular install.
Comment 8 Ray Johnston 2007-06-26 09:59:29 UTC
Assigning to the developer of the GS 8 CJK support.
Comment 9 Till Kamppeter 2007-08-14 13:59:31 UTC
Patches were proposed by Koji Otani, but they caused bug 689404 and bug 689405.
The patches were applied in rev 8187 and, because of the regressions, removed
again in rev 8190.

For further studies I attach the patches to this bug report.
Comment 10 Ralph Giles 2007-08-14 13:59:40 UTC
Created attachment 3280 [details]
Add support for the %% Replace keywords used with COMPILE_INITS=1

Here is an additional patch on top of r8187 that replaces the EXTRA_ hack with
the proper %% Replace directives. This allows geninit.c to automatically
include the files when compiling in the ps library, stripping comments and
making other space-saving reductions.

Please include this patch when reinstating the cjkv patch.
Comment 11 Till Kamppeter 2007-08-14 14:02:22 UTC
Created attachment 3281 [details]
CJK patches from Koji Otani, part 1
Comment 12 Till Kamppeter 2007-08-14 14:03:10 UTC
Created attachment 3282 [details]
CJK patches from Koji Otani, part 2
Comment 13 Till Kamppeter 2007-08-14 14:04:14 UTC
Created attachment 3283 [details]
CJK patches from Koji Otani, part 3
Comment 14 Till Kamppeter 2007-08-14 14:04:55 UTC
Created attachment 3284 [details]
CJK patches from Koji Otani, part 4
Comment 15 Till Kamppeter 2007-08-14 14:33:12 UTC
Additional note: All of the CJK patches were reverted in rev 8191, not in 8190.
Comment 16 mpsuzuki 2007-08-30 19:19:21 UTC
The regressions caused by Koji Otani's patch,
bug 689404 and bug 689405, can be fixed
(see each entry; I proposed patches there and
the regression test has passed).
So I am returning this bug to support for evaluation
of the proposed patch.
Comment 17 Ray Johnston 2007-08-31 11:35:53 UTC
What is required is a patch for the gs 8.60 code that fixes this vertical
writing issue, NOT the entire CJKV set of patches.

We have many customers relying on the functionality of the existing Ghostscript
method and the massive CJKV patch changes many things that do not need to be
changed, possibly introducing new bugs. The regression test currently only has
limited testing of Asian fonts, so it is not an authoritative test of actual
customer usage of Ghostscript.

Note that if there are other problems in gs 8.60 that are addressed by the CJKV
patch, these should be entered as new bugs.
Comment 18 leonardo 2008-05-21 07:42:07 UTC
Changing assignment due to no progress.
Comment 19 leonardo 2008-09-05 06:32:57 UTC
Bumping priority for re-testing after recent changes. Likely it is fixed.
Comment 20 Ray Johnston 2008-09-05 09:29:21 UTC
Reassign to Marcos for re-test as requested by Igor
Comment 21 Marcos H. Woehrmann 2008-09-15 14:48:11 UTC
Created attachment 4400 [details]
screenshot.png
Comment 22 Marcos H. Woehrmann 2008-09-15 14:49:25 UTC
As of r9806 the punctuation is still rotated compared to the text (see the attached screenshot.png).
Comment 23 Marcos H. Woehrmann 2008-09-15 15:33:57 UTC
The fonts can be found as part of the Common Open Printing System: http://lx1.avasys.jp/OpenPrintingProject/openprinting-jp-0.1.3.tar.gz
Comment 24 Marcos H. Woehrmann 2008-09-15 15:35:25 UTC
Created attachment 4401 [details]
cidfmap

This is a minimal cidfmap that can be used to read the file.  It assumes the
ipam.ttf and ipaq.ttf files are in the current directory.
Comment 25 Ray Johnston 2008-09-15 19:09:04 UTC
Thanks to mpsuzuki for reviewing the images in the attachment to comment #21.

The 'brackets' are not rotated in the left hand (Ghostscript) image -- this is
at the top and bottom of the seventh column from the left.

Also I notice that column 2 has a '2' that has a circle around it as well as the
'1' in the sixth column. The circles are missing in the printer output.
Comment 26 mpsuzuki 2008-09-15 19:15:35 UTC
>Also I notice that column 2 has a '2' that has a circle around it
>as well as the '1' in the sixth column. The circles are missing in
>the printer output.

The outline of each circle in the printer output is very thin,
but the circles are there in the printer output.
Comment 27 leonardo 2008-10-08 00:38:05 UTC
Created attachment 4477 [details]
patch.txt

A patch to HEAD for glyph orientation. It's a partial fix: glyph
positions are not yet addressed.
Comment 28 leonardo 2008-10-08 00:39:40 UTC
Comment #16 doesn't look useful for mainstream GS development, because it
only fixes problems in Koji Otani's patches, which are not included in the
mainstream. The patch from Comment #27 may need to be applied to make this
statement true. Toshiya, please explain if I'm missing something.
Comment 29 leonardo 2008-10-08 00:49:36 UTC
Patch to HEAD:
http://ghostscript.com/pipermail/gs-cvs/2008-October/008705.html
commits the Comment #27 patch. It's a partial fix: glyph
positions are not yet addressed.
Comment 30 mpsuzuki 2008-10-08 01:53:43 UTC
Created attachment 4478 [details]
PDF including punctuations to be rotated

I don't currently have enough time to build SVN HEAD
and review your code. Please show the rasterization
result for the attached PDF.
Comment 31 Marcos H. Woehrmann 2008-10-13 21:08:38 UTC
Created attachment 4502 [details]
test.png

The PDF file from Comment #30 converted to PNG file using gshead (r9149).
Comment 32 leonardo 2008-10-14 09:27:32 UTC
Regarding Comments #30 and #31: the document doesn't embed a CID font, so its
interpretation depends on the installed CID fonts. Marcos, please attach the
cidfmap you used to run it. The raster is not useful without that information.
Comment 33 Marcos H. Woehrmann 2008-10-14 09:31:01 UTC
I used the cidfmap from Comment #24.
Comment 34 leonardo 2008-10-14 12:55:36 UTC
Regarding Comments #31-33:

After decompressing the streams in
PDF_including_punctuations_to_be_rotated.pdf I see that both texts are printed
with the same font, which uses the *horizontal* writing mode. The texts start
with CID=634 and CID=7887 respectively.

With the cidfmap attached above I ran this test:

/Ryumin-Light-Identity-H findfont
/FDepVector get 0 get
% dup { exch = =
% } forall
/CIDMap get
/CID 634 def
dup 0 get dup CID 2 mul get 256 mul exch CID 2 mul 1 add get add =
/CID 7887 def
dup 0 get dup CID 2 mul get 256 mul exch CID 2 mul 1 add get add =

It prints :
500
500

It means that in the supplied font both CIDs map to the same glyph (unless we
have a bug in the TT cmap decoder (written by mpsuzuki), but I don't think so).
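
For readers less familiar with PostScript, the lookup performed by the test
above can be sketched in Python. This is only an illustration: it assumes the
CIDMap is a single byte string holding one big-endian 2-byte glyph index per
CID (the test reads only element 0 of the CIDMap array), and the toy table
built here is hypothetical, filled so that it reproduces the two values
printed above.

```python
def gid_from_cidmap(cidmap: bytes, cid: int) -> int:
    """Read the 2-byte big-endian glyph index stored for `cid`."""
    offset = 2 * cid
    if cid < 0 or offset + 2 > len(cidmap):
        raise IndexError(f"CID {cid} is outside this CIDMap string")
    # Same arithmetic as the PostScript: byte[2*CID] * 256 + byte[2*CID + 1]
    return cidmap[offset] * 256 + cidmap[offset + 1]


def build_toy_cidmap(entries: dict, size: int) -> bytes:
    """Hypothetical helper: build a CIDMap string with `size` entries."""
    buf = bytearray(2 * size)  # unlisted CIDs map to glyph index 0
    for cid, gid in entries.items():
        buf[2 * cid:2 * cid + 2] = gid.to_bytes(2, "big")
    return bytes(buf)


# Toy data only: both CIDs from the test case share glyph index 500.
toy_cidmap = build_toy_cidmap({634: 500, 7887: 500}, size=8000)
print(gid_from_cidmap(toy_cidmap, 634))
print(gid_from_cidmap(toy_cidmap, 7887))
```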

So I conclude that the test case is incorrect. The reason is that the supplied
Open Type font is not sufficient to emulate the CID font Ryumin-Light. BTW when
I change Identity-H to Identity-V (2 occurrences) in
PDF_including_punctuations_to_be_rotated.pdf, Ghostscript renders rotated
glyphs, which are correct (and the text is printed vertically, as Adobe does).

So I believe that Comments #31-33 should be closed as RESOLVED INVALID. Please
use a better font for the CID font emulation.
Comment 35 mpsuzuki 2008-10-14 16:53:19 UTC
I don't understand why you conclude that the test case in Comment #30
is incorrect.

It was generated by Adobe Acrobat 7. The use of a CID for a
vertical glyph in horizontal writing mode is described in Adobe
Technical Note #5078 (the official specification of
Adobe-Japan1-6). So such usage (using a CID for a vertical glyph in
horizontal writing mode) must be accepted to provide compatibility with
Adobe products. In fact, Adobe Reader displays both horizontal
and vertical glyphs from the test case in Comment #30.

However, if you want to restrict your scope to vertical
glyphs in vertical writing mode only, and you feel the test case
in Comment #30 is beyond that scope, I will file Comment #30
as another bug. You can close it as "WON'T FIX", but it is not
"INVALID".
Comment 36 leonardo 2008-10-14 23:46:08 UTC
I thought more about Comment #34 and I think it needs a correction (even
without Comment #35).

This quote is wrong: "It means that in the supplied font both CIDs map to
same glyph (unless we have a bug in the TT cmap decoder (written by
mpsuzuki),...)"

Actually the CID font emulation first maps CIDs to Unicode and then Unicode to
GIDs. We fail at the first step because Unicode uses the same codes for
vertical and horizontal glyphs. Thus the failure is not related to the TT cmap
decoder, which runs in the second step.
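
A small Python sketch may make the collapse easier to see. The two tables are
hypothetical stand-ins for the CIDDecoding resource and the font's cmap; the
Unicode value is a placeholder chosen for illustration, and only the CIDs
(634, 7887) and the glyph index (500) come from this thread.

```python
# Step 1 (hypothetical CIDDecoding-like table): the horizontal-form and the
# vertical-form CID both decode to one Unicode code point, because Unicode
# does not give the vertical form a code of its own here.
CID_TO_UNICODE = {
    634: 0x3001,   # horizontal form (code point is an illustrative placeholder)
    7887: 0x3001,  # vertical form of the same character
}

# Step 2 (hypothetical cmap): Unicode -> glyph index in the TrueType data.
UNICODE_TO_GID = {0x3001: 500}


def emulate_cid_to_gid(cid: int) -> int:
    """Two-step emulation: CID -> Unicode -> GID."""
    unicode_value = CID_TO_UNICODE[cid]   # the H/V distinction is lost here
    return UNICODE_TO_GID[unicode_value]  # the cmap lookup itself is correct


# Both CIDs end up at the same glyph even though step 2 works as intended.
print(emulate_cid_to_gid(634), emulate_cid_to_gid(7887))
```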

Then I thought about how we can work around the failure. Likely we need a list
of CID pairs that correspond to single glyphs in Unicode (or in another
encoding used for the CIDDecoding resource). I think the right way is to create
a new resource category HVGlyphs, whose instances will be dictionaries that
define a V mapping and an H mapping. Both the V and H mappings are dictionaries
that map CIDs to CIDs. Actually they're reverses of each other. The resource
names are the various orderings (Japan1, CNS1, and so forth).
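
As a rough picture of the proposed data shape, here is a Python sketch. It is
not the resource format itself (that would be PostScript); the key names "V"
and "H", the direction of each mapping, and the helper names are assumptions,
and the single CID pair is taken from the test case discussed above.

```python
def make_hv_instance(pairs):
    """Build one hypothetical HVGlyphs instance from (h_cid, v_cid) pairs.

    Assumed convention: "V" maps a horizontal-form CID to its vertical
    counterpart, and "H" maps the vertical-form CID back.
    """
    v_map, h_map = {}, {}
    for h_cid, v_cid in pairs:
        if h_cid in v_map or v_cid in h_map:
            raise ValueError(f"duplicate CID in pair ({h_cid}, {v_cid})")
        v_map[h_cid] = v_cid
        h_map[v_cid] = h_cid
    return {"V": v_map, "H": h_map}


def is_mutual_inverse(instance) -> bool:
    """Check that the H mapping is exactly the reverse of the V mapping."""
    v_map, h_map = instance["V"], instance["H"]
    return len(v_map) == len(h_map) and all(
        h_map.get(v_cid) == h_cid for h_cid, v_cid in v_map.items()
    )


# Instances are keyed by ordering name, as suggested above.
HV_GLYPHS = {"Japan1": make_hv_instance([(634, 7887)])}
print(HV_GLYPHS["Japan1"]["V"][634], is_mutual_inverse(HV_GLYPHS["Japan1"]))
```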

Such a resource will be loaded when a CID font emulation loads a True Type font
or "an Open Type font with True Type data", and will be associated with the
font.

When the text decomposition happens, we get the CID in the text enumerator
structure. Here we can look up the HVGlyphs resource and see if the CID belongs
to the other writing mode. If so, trigger the call to
gs_type42_substitute_glyph_index_vertical with the "exclusive or" logic.
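
The comment does not spell out the "exclusive or" logic, so the following
Python sketch shows only one possible reading: vertical substitution is
normally requested in WMode 1 and not in WMode 0, and that default is flipped
when the CID is listed as belonging to the other writing mode. The function
name, the table layout, and the table contents are hypothetical; the real
decision would be made in C around gs_type42_substitute_glyph_index_vertical.

```python
# Hypothetical lookup: for each current WMode, the CIDs that belong to the
# *other* writing mode. How the WMode 1 entry would be populated is not
# stated in this thread; the values here are placeholders.
OTHER_MODE_CIDS = {
    0: {7887},  # vertical-form CIDs that may appear in horizontal text
    1: {634},   # horizontal-form CIDs that may appear in vertical text
}


def wants_vertical_substitution(cid: int, wmode: int) -> bool:
    """One reading of the XOR rule: flip the WMode default for listed CIDs."""
    default = (wmode == 1)
    belongs_to_other_mode = cid in OTHER_MODE_CIDS.get(wmode, set())
    return default != belongs_to_other_mode  # boolean exclusive or


for cid, wmode in [(7887, 0), (634, 0), (7887, 1), (634, 1)]:
    print(cid, wmode, wants_vertical_substitution(cid, wmode))
```

Under this reading a CID that is not listed at all simply follows the writing
mode, which matches the behaviour before the proposed change.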

Although all of the above looks workable, I don't like 2 things: (1) the
dependence on Postscript code and (2) it works for Open Type only (it doesn't
work for True Type). Point (1) looks more or less acceptable because this
problem should not happen with other languages. Or we'll need to (and we can)
fix it later. As to (2), the True Type case needs more investigation and more
effort. Likely it will need to choose the right subfont from a True Type
Collection or merge several fonts of the collection.

Thanks to mpsuzuki for the reference to the tech note in Comment #35. It proves
that we need to do this job (I was in doubt before getting it).
Comment 37 leonardo 2008-10-28 08:43:16 UTC
I'm still in doubt about the name for the new resource category. It must be
good because it is global. HVGlyphs doesn't look good. The most meaningful name
is WModeDependentCIDs or SubstituteCIDsDependingOnWMode, but I don't like its
length. Suggestions are welcome.

Maybe CIDSubstitution, assuming that the resource instance itself will explain
what the substitution depends on?
Comment 38 leonardo 2008-11-09 11:11:04 UTC
2nd patch to HEAD:
http://ghostscript.com/pipermail/gs-cvs/2008-November/008788.html
closes Comment 30-36.
Comment 39 leonardo 2008-11-09 12:53:16 UTC
3rd patch to HEAD:

http://ghostscript.com/pipermail/gs-cvs/2008-November/008789.html
Comment 40 leonardo 2008-11-16 04:22:31 UTC
One more patch for glyph positions :

http://ghostscript.com/pipermail/gs-cvs/2008-November/008800.html

Now we've got 3 patches that together close this bug. But we still want to
make some additional improvements, so don't close the bug now.
Comment 41 leonardo 2008-11-16 06:49:39 UTC
One more patch to HEAD :

http://ghostscript.com/pipermail/gs-cvs/2008-November/008801.html
Comment 43 leonardo 2008-11-23 12:40:33 UTC
One more patch to head :

http://ghostscript.com/pipermail/gs-cvs/2008-November/008816.html
Comment 44 leonardo 2009-01-08 01:54:28 UTC
Reopening the bug because it is reproducible with a newer version of Japanese 
fonts http://ossipedia.ipa.go.jp/ipafont/IPAfont00203.php
Comment 45 leonardo 2009-01-08 02:01:10 UTC
Created attachment 4693 [details]
IPAfont00203.zip

A local copy of the fonts.
Comment 46 leonardo 2009-01-08 02:14:02 UTC
A quick check shows it falls into the unimplemented case coverage_format == 2.
If that is the only problem, a fix shouldn't be difficult.
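
For context, and assuming coverage_format here refers to the format field of
an OpenType Coverage table, the two formats differ as follows: format 1 is a
plain list of glyph IDs, while format 2 is a list of range records
(startGlyphID, endGlyphID, startCoverageIndex). The Python sketch below looks
up a glyph in either format. It is an illustration written against the
OpenType layout, not the Ghostscript code, and the sample tables are made up.

```python
import struct


def coverage_index(table: bytes, glyph_id: int):
    """Return the coverage index of `glyph_id`, or None if it is not covered."""
    coverage_format, count = struct.unpack_from(">HH", table, 0)
    if coverage_format == 1:
        # Format 1: `count` glyph IDs; the coverage index is the array position.
        glyphs = struct.unpack_from(f">{count}H", table, 4)
        for index, gid in enumerate(glyphs):
            if gid == glyph_id:
                return index
        return None
    if coverage_format == 2:
        # Format 2: `count` range records of three big-endian uint16 values.
        for i in range(count):
            start, end, start_index = struct.unpack_from(">HHH", table, 4 + 6 * i)
            if start <= glyph_id <= end:
                return start_index + (glyph_id - start)
        return None
    raise ValueError(f"unknown coverage format {coverage_format}")


# Made-up sample tables for illustration.
format1_table = struct.pack(">HH3H", 1, 3, 10, 11, 20)
format2_table = (
    struct.pack(">HH", 2, 2)
    + struct.pack(">HHH", 10, 12, 0)   # glyphs 10..12 -> indices 0..2
    + struct.pack(">HHH", 20, 21, 3)   # glyphs 20..21 -> indices 3..4
)
print(coverage_index(format1_table, 20), coverage_index(format2_table, 21))
```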
Comment 47 leonardo 2009-01-08 03:33:24 UTC
Created attachment 4694 [details]
patch4.txt

A patch is being tested.
Comment 48 leonardo 2009-01-08 04:51:26 UTC
One more patch to HEAD :
http://ghostscript.com/pipermail/gs-cvs/2009-January/008913.html