Bug 701464 - Mapping simplified Chinese, punctuation missing issues
Summary: Mapping simplified Chinese, punctuation missing issues
Status: CONFIRMED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Font API
Version: 9.27
Hardware: PC Windows 10
Importance: P4 enhancement
Assignee: Chris Liddell (chrisl)
URL:
Keywords:
Duplicates: 701463
Depends on:
Blocks:
 
Reported: 2019-08-27 03:39 UTC by He Fan
Modified: 2021-01-02 23:17 UTC
CC List: 1 user

See Also:
Customer:
Word Size: ---


Attachments
The test.pdf uploaded earlier was wrong; this is the correct one. I'm so sorry. (861.48 KB, application/pdf)
2019-08-27 03:43 UTC, He Fan
This is the converted PNG image; you can see that the "·" between "5" and "18" has disappeared (544.92 KB, image/png)
2019-08-27 03:44 UTC, He Fan
The simplified file, font file and cidfmap (6.11 MB, application/x-zip-compressed)
2019-08-27 10:19 UTC, He Fan
The Windows system fonts referenced in the cidfmap (44.84 MB, application/x-zip-compressed)
2019-08-27 10:22 UTC, He Fan

Description He Fan 2019-08-27 03:39:56 UTC
Hi friends,

 I'm Chinese, and my English isn't very good.

 I'm using Ghostscript to convert PDF files to images (PNG).

 When I used cidfmap to map some simplified Chinese fonts (Founder fonts), I ran into a problem: a small number of punctuation marks were lost (for example, in "5·18" the "·" is missing).

 When I remove these mappings, Ghostscript uses its default font and the punctuation renders fine.

 This was strange. Note that the PDF documents were not generated locally by me, but by another application.

 I tried other approaches. For example, I used Microsoft Office Word to type text and punctuation (such as "·") in the same font, converted it to PDF, and then converted that PDF with Ghostscript. It worked perfectly: no characters were lost and the font was correct.

 I also used Adobe Acrobat DC to try converting the problematic PDF to PNG, and it correctly recognized and converted these fonts and punctuation marks.

 


  Below are the command I used, the FAPI debug output and my cidfmap. I have also uploaded a problematic PDF file, and I can send you the output PNG if needed.


 command:
    gswin64c -dBATCH -dFAPIDEBUG -dNOPAUSE  -sDEVICE=pngalpha -sOutputFile=f:/test%03d.png f:/test.pdf
 

 The FAPI LOG:


C:\Users\vic>gswin64c -dBATCH -dFAPIDEBUG -dNOPAUSE  -sDEVICE=pngalpha -sOutputFile=f:/test%03d.png f:/test.pdf
GPL Ghostscript 9.27 (2019-04-04)
Copyright (C) 2018 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZHUNHK_HBBY.TTF to emulate a CID font FZLTZHUNHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTZHUNHK_HBBY--GBK1-0
Trying to render the font Font FZLTZHUNHK_HBBY--GBK1-0 with FAPI...
Font FZLTZHUNHK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZLTZHUNHK_HBBY--GBK1-0
Font FZLTZHUNHK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZHK_HBBY.TTF to emulate a CID font FZLTZHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTZHK_HBBY--GBK1-0
Trying to render the font Font FZLTZHK_HBBY--GBK1-0 with FAPI...
Font FZLTZHK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZLTZHK_HBBY--GBK1-0
Font FZLTZHK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZCHK_HBBY_new.TTF to emulate a CID font FZLTZCHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTZCHK_HBBY--GBK1-0
Trying to render the font Font FZLTZCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) with FAPI...
Font FZLTZCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) is being rendered with FAPI=FreeType

FAPIhook FZLTZCHK_HBBY--GBK1-0
Font FZLTZCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTXIHK_HBBY.TTF to emulate a CID font FZLTXIHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTXIHK_HBBY--GBK1-0
Trying to render the font Font FZLTXIHK_HBBY--GBK1-0 with FAPI...
Font FZLTXIHK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZLTXIHK_HBBY--GBK1-0
Font FZLTXIHK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZHTK.TTF to emulate a CID font FZHTK--GBK1-0 ... Done.

FAPIhook FZHTK--GBK1-0
Trying to render the font Font FZHTK--GBK1-0 with FAPI...
Font FZHTK--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZHTK--GBK1-0
Font FZHTK--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZBYSK_HBBY.TTF to emulate a CID font FZBYSK_HBBY--GBK1-0 ... Done.

FAPIhook FZBYSK_HBBY--GBK1-0
Trying to render the font Font FZBYSK_HBBY--GBK1-0 with FAPI...
Font FZBYSK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZBYSK_HBBY--GBK1-0
Font FZBYSK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZKTK.TTF to emulate a CID font FZKTK--GBK1-0 ... Done.

FAPIhook FZKTK--GBK1-0
Trying to render the font Font FZKTK--GBK1-0 with FAPI...
Font FZKTK--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZKTK--GBK1-0
Font FZKTK--GBK1-0 is mapped to FAPI=FreeType

FAPIhook KOWESO+NEU-BZ-Regular
Trying to render the font Font KOWESO+NEU-BZ-Regular with FAPI...
Font KOWESO+NEU-BZ-Regular is being rendered with FAPI=FreeType

FAPIhook KOWESO+NEU-BZ-Regular
Font KOWESO+NEU-BZ-Regular is mapped to FAPI=FreeType

FAPIhook --nostringval--
Font --nostringval-- ( aliased from KOWESO+NEU-BZ-Regular ) is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZCYSK_HBBY.TTF to emulate a CID font FZCYSK_HBBY--GBK1-0 ... Done.

FAPIhook FZCYSK_HBBY--GBK1-0
Trying to render the font Font FZCYSK_HBBY--GBK1-0 with FAPI...
Font FZCYSK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZCYSK_HBBY--GBK1-0
Font FZCYSK_HBBY--GBK1-0 is mapped to FAPI=FreeType

FAPIhook FZLTZHUNHK_HBBY--GBK1-0
Font FZLTZHUNHK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZDYSK_HBBY.TTF to emulate a CID font FZDYSK_HBBY--GBK1-0 ... Done.

FAPIhook FZDYSK_HBBY--GBK1-0
Trying to render the font Font FZDYSK_HBBY--GBK1-0 with FAPI...
Font FZDYSK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZDYSK_HBBY--GBK1-0
Font FZDYSK_HBBY--GBK1-0 is mapped to FAPI=FreeType

FAPIhook FZCYSK_HBBY--GBK1-0
Font FZCYSK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTDHK_HBBY.TTF to emulate a CID font FZLTDHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTDHK_HBBY--GBK1-0
Trying to render the font Font FZLTDHK_HBBY--GBK1-0 with FAPI...
Font FZLTDHK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook FZLTDHK_HBBY--GBK1-0
Font FZLTDHK_HBBY--GBK1-0 is mapped to FAPI=FreeType
Loading a TT font from C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTCHK_HBBY_new.TTF to emulate a CID font FZLTCHK_HBBY--GBK1-0 ... Done.

FAPIhook FZLTCHK_HBBY--GBK1-0
Trying to render the font Font FZLTCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) with FAPI...
Font FZLTCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) is being rendered with FAPI=FreeType

FAPIhook FZLTCHK_HBBY--GBK1-0
Font FZLTCHK_HBBY--GBK1-0 ( aliased from FZLanTingHei_HBBY-B-GBK ) is mapped to FAPI=FreeType

FAPIhook YBQOUS+FZCYSK_HBBY--GBK1-0
Trying to render the font Font YBQOUS+FZCYSK_HBBY--GBK1-0 with FAPI...
Font YBQOUS+FZCYSK_HBBY--GBK1-0 is being rendered with FAPI=FreeType

FAPIhook YBQOUS+FZCYSK_HBBY--GBK1-0
Font YBQOUS+FZCYSK_HBBY--GBK1-0 is mapped to FAPI=FreeType

FAPIhook --nostringval--
Font --nostringval-- ( aliased from YBQOUS+FZCYSK_HBBY--GBK1-0 ) is mapped to FAPI=FreeType



  The cidfmap content:
%!
% cidfmap generated automatically by mkcidfm.ps from fonts found in
%   c:/windows/fonts

% Substitutions
/SimSun << /CSI [(GB1) 2] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/simsun.ttc) >> ;
/FangSong << /CSI [(GB1) 2] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/simfang.ttf) >> ;
/MalgunGothicRegular << /CSI [(Korea1) 3] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/malgun.ttf) >> ;
/MS-UI-Gothic << /CSI [(Japan1) 3] /SubfontID 2 /FileType /TrueType /Path (c:/windows/fonts/msgothic.ttc) >> ;
/MS-Gothic << /CSI [(Japan1) 3] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/msgothic.ttc) >> ;
/MalgunGothicBold << /CSI [(Korea1) 3] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/malgunbd.ttf) >> ;
/MS-PGothic << /CSI [(Japan1) 3] /SubfontID 1 /FileType /TrueType /Path (c:/windows/fonts/msgothic.ttc) >> ;
/KaiTi << /CSI [(GB1) 2] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/simkai.ttf) >> ;
/NSimSun << /CSI [(GB1) 2] /SubfontID 1 /FileType /TrueType /Path (c:/windows/fonts/simsun.ttc) >> ;
/SimHei << /CSI [(GB1) 2] /SubfontID 0 /FileType /TrueType /Path (c:/windows/fonts/simhei.ttf) >> ;


/FZLTZHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZHK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZLTZCHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZCHK_HBBY_new.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZLTXIHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTXIHK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZHTK--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZHTK.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZBYSK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZBYSK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZKTK--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZKTK.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZCYSK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZCYSK_HBBY.TTF)  /SubfontID 0  /CSI [ (GB1) 2]  >> 
/FZDYSK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZDYSK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZLTCHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTCHK_HBBY_new.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZLTZHUNHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTZHUNHK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;
/FZLTDHK_HBBY--GBK1-0  << /FileType /TrueType /Path (C:/Program Files/gs/gs9.27/Resource/CIDFont/FZLTDHK_HBBY.TTF)  /SubfontID 0 /CSI [(GB1) 2] >> ;

% Aliases
/AdobeHeitiStd-Regular /SimHei ;
/STHeiti-Regular /SimHei ;
/STSong-Light /SimSun ;
/STKaiti-Regular /KaiTi ;
/STFangsong-Light /FangSong ;
/AdobeMyungjoStd-Medium /MalgunGothicRegular ;
/HYSMyeongJo-Medium /MalgunGothicRegular ;
/GothicBBB-Medium /MS-Gothic ;
/HYRGoThic-Medium /MalgunGothicRegular ;
/AdobeSongStd-Light /SimSun ;
/HYGoThic-Medium /MalgunGothicRegular ;
/HeiseiKakuGo-W5 /MS-Gothic ;
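Each substitution entry above follows the same pattern, and Ghostscript's own cidfmap examples terminate every entry with ";" (the FZCYSK_HBBY--GBK1-0 line as pasted here has none). A minimal Python sketch of a helper that writes entries in this layout is below. The function names and the defaults (GB1, supplement 2, SubfontID 0, copied from the entries above) are hypothetical conveniences, not part of Ghostscript. It only produces text; it cannot make a substitute font's glyphs line up with those of the original CIDFont.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping \\, ( and )."""
    # Escape backslashes first so the escapes added for parentheses survive.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def cidfmap_entry(cidfont_name: str, ttf_path: str,
                  ordering: str = "GB1", supplement: int = 2,
                  subfont_id: int = 0) -> str:
    """Return one cidfmap line mapping a CIDFont name to a TrueType file.

    Hypothetical helper: the layout mirrors the entries in this cidfmap,
    and the entry always ends with ' ;'.
    """
    return (f"/{cidfont_name} << /FileType /TrueType /Path {ps_string(ttf_path)} "
            f"/SubfontID {subfont_id} /CSI [({ordering}) {supplement}] >> ;")


if __name__ == "__main__":
    print(cidfmap_entry(
        "FZCYSK_HBBY--GBK1-0",
        "C:/Program Files/gs/gs9.27/Resource/CIDFont/FZCYSK_HBBY.TTF"))
```

Generating the lines rather than hand-editing them mainly guards against small syntax slips such as a dropped terminator.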







Looking forward to your reply. Thank you very much!
Comment 1 He Fan 2019-08-27 03:43:46 UTC
Created attachment 18040 [details]
The test.pdf uploaded earlier was wrong; this is the correct one. I'm so sorry.
Comment 2 He Fan 2019-08-27 03:44:37 UTC
Created attachment 18041 [details]
This is the converted PNG image; you can see that the "·" between "5" and "18" has disappeared
Comment 3 Ken Sharp 2019-08-27 06:56:33 UTC
*** Bug 701463 has been marked as a duplicate of this bug. ***
Comment 4 Ken Sharp 2019-08-27 07:19:16 UTC
(In reply to He Fan from comment #0)

>  When I used cidfmap to mapping some simplified Chinese fonts (Founder
> fonts), I encountered some problems and a small number of punctuation marks
> were lost(like 5·18 the ·is lost)
> 
>  When I remove these mappings, will use the default font and the punctuation
> will work fine.

The missing fonts are in fact CIDFonts, not Fonts. Your cidfmap does not supply CIDFonts to replace those fonts, it uses TrueType fonts (and TrueType Collections). I'm afraid that CIDFonts and TrueType fonts are not the same thing.

Using TrueType fonts as substitutes for missing CIDFonts is a Ghostscript feature, but it is not guaranteed to be 100% reliable. Some information has to be created, and doing so is, in part, guesswork.

Punctuation marks are the most likely to suffer from missed mappings, especially when a vertical font is substituted with a horizontal font, or vice versa.
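To make the "guesswork" more concrete, here is a minimal Python sketch of one plausible way a CID can be turned into a glyph ID when a TrueType font stands in for a CIDFont: look up the CID's Unicode value for the character collection, then look that value up in the font's own cmap. This is an illustration under assumptions, not Ghostscript's code. The CID numbers, the Unicode table and the glyph IDs are all hypothetical, chosen only so that "5·18" reproduces the symptom in this report. Any CID that falls through either table ends up at glyph 0 (.notdef), which may render as a blank or an empty box.

```python
# Hypothetical data: NOT real Adobe-GB1 CIDs or glyph IDs from the fonts
# in this report. They only illustrate the two-step lookup.
HYPOTHETICAL_CID_TO_UNICODE = {
    17: 0x0035,  # DIGIT FIVE
    18: 0x0031,  # DIGIT ONE
    19: 0x0038,  # DIGIT EIGHT
    20: 0x00B7,  # MIDDLE DOT
}

# Hypothetical TrueType 'cmap' of the substitute font (Unicode -> glyph ID).
# It deliberately lacks U+00B7 to show what happens to an unmapped character.
HYPOTHETICAL_FONT_CMAP = {0x0035: 24, 0x0031: 20, 0x0038: 27}

NOTDEF_GID = 0  # glyph index 0 is the .notdef glyph in a TrueType font


def cid_to_gid(cid, cid_to_unicode, font_cmap):
    """Map a CID to a glyph ID in a TrueType substitute via Unicode.

    Returns NOTDEF_GID when either step of the lookup fails.
    """
    codepoint = cid_to_unicode.get(cid)
    if codepoint is None:
        return NOTDEF_GID
    return font_cmap.get(codepoint, NOTDEF_GID)


def render_cids(cids, cid_to_unicode, font_cmap):
    """Resolve a run of CIDs to glyph IDs in the substitute font."""
    return [cid_to_gid(c, cid_to_unicode, font_cmap) for c in cids]


if __name__ == "__main__":
    five_dot_eighteen = [17, 20, 18, 19]  # "5·18" under the hypothetical table
    print(render_cids(five_dot_eighteen,
                      HYPOTHETICAL_CID_TO_UNICODE, HYPOTHETICAL_FONT_CMAP))
    # prints [24, 0, 20, 27]: the middle dot falls back to .notdef
```

The digits survive because both tables cover them; the punctuation mark is the only character with a gap, which is the pattern described above.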


>  I tried other methods, such as using Microsoft Office Word to edit text and
> punctuation in the same font, such as "·" and then converting to PDF, and
> then use ghoscript command convert, it is very successful, without losing
> any characters, the font is correct.

Opening the file in an editing application and then saving it as PDF will, almost certainly, use completely different fonts to the missing ones in your original PDF file, and the PDF file produced will contain the actual fonts used.

When the fonts are embedded in the PDF file (as fonts in general, and CIDFonts in particular, should be) then Ghostscript will use the fonts in the PDF file. Naturally this works correctly, because all the information required to render the correct glyphs is present in the PDF file.

 
>  Of course, I also used Adobe Acrobat DC to try to convert problematic PDF
> to PNG, it correctly recognized and converted these fonts and punctuation
> marks.

When I open the PDF file here using Acrobat it substitutes every missing font (of which there are 11, oddly one CIDFont *is* embedded...) with Adobe-HeitiStd-Regular.

So Acrobat is not 'correctly recognising and converting' anything. It is using a different substitute font. As you noted in comment #0, if you let Ghostscript use its own default font, instead of specifying a substitute in cidfmap, then Ghostscript also renders the punctuation marks.


> Looking forward to your reply. Thank you very much!

If you want correct, accurate, rendering of CIDFonts you must create the PDF file with the CIDFonts embedded (note that the PDF specification says this is a requirement).
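As a quick way to see which fonts in a PDF are affected, here is a minimal Python sketch that decides whether a font dictionary carries an embedded font program. It assumes the dictionaries have already been parsed into plain Python dicts with key names written without the leading "/" (a hypothetical representation, not any particular PDF library's API). In PDF, an embedded program is stored under FontFile, FontFile2 or FontFile3 in the font descriptor, and for a composite (Type0) font that descriptor belongs to its single descendant CIDFont.

```python
# Font descriptor keys under which PDF stores an embedded font program.
EMBEDDED_FONT_KEYS = ("FontFile", "FontFile2", "FontFile3")


def is_embedded(font: dict) -> bool:
    """Return True if this font dictionary has an embedded font program.

    `font` is assumed to be a plain dict mirroring a PDF font dictionary,
    with indirect references already resolved. Type 3 fonts, which define
    their glyphs inline and have no font file, are not handled here.
    """
    if font.get("Subtype") == "Type0":
        # A composite font delegates to exactly one descendant CIDFont.
        descendants = font.get("DescendantFonts") or []
        if not descendants:
            return False
        font = descendants[0]
    descriptor = font.get("FontDescriptor") or {}
    return any(key in descriptor for key in EMBEDDED_FONT_KEYS)


if __name__ == "__main__":
    not_embedded = {
        "Subtype": "Type0",
        "DescendantFonts": [{"Subtype": "CIDFontType2",
                             "FontDescriptor": {"FontName": "Example"}}],
    }
    embedded = {
        "Subtype": "Type0",
        "DescendantFonts": [{"Subtype": "CIDFontType2",
                             "FontDescriptor": {"FontFile2": b"<font bytes>"}}],
    }
    print(is_embedded(not_embedded), is_embedded(embedded))  # prints False True
```

Only fonts for which this returns False need a cidfmap substitute; embedded ones are rendered from the PDF itself.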

If you don't do this, then a substitute font will be used, either the Ghostscript fallback or a substitute of your own creation. If the substitute is not exactly the same font as was used to create the PDF document, then the rendered output *is* wrong, because the CIDFont is not the one intended by the author and the appearance of the font will differ from that which was intended.

If you use a TrueType font as a substitute for a CIDFont, then I'm afraid that, yes, it is possible that Ghostscript may be unable to 100% correctly map all the CIDs to matching glyph descriptions in the TrueType font, and errors may occur.

I'll leave this open until the developer with particular font expertise has a look, but my expectation is that there is nothing further that we can do about this. You need to supply the correct fonts as substitutes in order to get correct rendering.
Comment 5 Chris Liddell (chrisl) 2019-08-27 07:32:00 UTC
I'm happy to investigate further, when I have the time, but to do so, I will need a much simpler example file, the cidfmap and possibly the font file(s) you reference in the cidfmap.

There is, however, a strong likelihood that the problem is simply that the glyph ordering of the font you've substituted doesn't match the glyph ordering of the original TTFs used to generate the (15!!) non-embedded CIDFonts in the PDF. Should that turn out to be the case, there is pretty much nothing we can do about it.
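A toy Python illustration of that failure mode: roughly speaking, the PDF content ends up selecting glyphs by position, so if the substitute font orders its glyphs differently from the font the PDF was made with, the same positions pick different glyphs or run off the end. The glyph names and orders below are invented for illustration; real CJK fonts have thousands of glyphs, and this is not how Ghostscript represents fonts internally.

```python
# Invented glyph orders: the index in each list is the glyph ID.
ORIGINAL_ORDER = [".notdef", "five", "periodcentered", "one", "eight"]
SUBSTITUTE_ORDER = [".notdef", "five", "one", "eight"]


def glyph_names(gids, glyph_order):
    """Resolve glyph IDs against a font's glyph order.

    IDs past the end of the font fall back to glyph 0 (.notdef).
    """
    return [glyph_order[g] if 0 <= g < len(glyph_order) else glyph_order[0]
            for g in gids]


if __name__ == "__main__":
    # Glyph IDs that spelled "5·18" in the (invented) original font.
    requested = [1, 2, 3, 4]
    print(glyph_names(requested, ORIGINAL_ORDER))
    # prints ['five', 'periodcentered', 'one', 'eight']
    print(glyph_names(requested, SUBSTITUTE_ORDER))
    # prints ['five', 'one', 'eight', '.notdef']
```

In this toy case the substitute shows something like "518" followed by a .notdef glyph instead of "5·18", which is one way a single punctuation mark can appear to vanish.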
Comment 6 He Fan 2019-08-27 10:19:38 UTC
Created attachment 18047 [details]
The simplified file, font file and cidfmap
Comment 7 He Fan 2019-08-27 10:22:35 UTC
(In reply to Chris Liddell (chrisl) from comment #5)
> I'm happy to investigate further, when I have the time, but to do so, I will
> need a much simpler example file, the cidfmap and possibly the font file(s)
> you reference in the cidfmap.
> 
> There is, however, a strong likelihood that the problem is simply that the
> glyph ordering of the font you've substituted doesn't match the glyph
> ordering of the original TTFs used to generate the (15!!) non-enbedded
> CIDFont in the PDF. Should that turn out to be the case, there is pretty
> much nothing we can do about it.

Thank you very much for your reply!
I don't have permission to use the application that generates the PDF files. I asked my team leader for access, but unfortunately the request was rejected.

 Although I am very disappointed, I still want to solve this problem.

 So I deleted the unneeded parts of the previous PDF file. If this doesn't work for you, please let me know and I will ask my team leader again for a clean file.

Thanks again!
Comment 8 He Fan 2019-08-27 10:22:54 UTC
Created attachment 18049 [details]
The Windows system fonts referenced in the cidfmap
Comment 9 Chris Liddell (chrisl) 2019-08-30 08:58:01 UTC
As we said above, the problem is that the glyph ordering in the font you are using doesn't match the glyph ordering expected by the CMap embedded in the PDF. Other TTF fonts, whose ordering does (more closely, at least) match that expected, *seem* to work better (disclaimer: I don't read any Chinese).

I tried wqy-microhei.ttc (available packaged on Debian-derived Linux distros), and the output seemed okay.

There may be something more we can do to improve things, so I'll keep this open as an enhancement. But as the spec explicitly says to embed CIDFonts (specifically in order to avoid *exactly* this kind of inconsistency), I don't think this is a "bug", as such.
Comment 10 He Fan 2019-08-30 09:38:41 UTC
(In reply to Chris Liddell (chrisl) from comment #9)
> As we said above, the problem is that the glyph ordering in the font you are
> using doesn't match the glyph ordering expected by the CMap embedded in the
> PDF. Other TTF fonts, whose ordering does (more closely, at least) match
> that expected, *seem* to work better (disclaimer: I don't read any Chinese).
> 
> I tried wqy-microhei.ttc (available packaged on Debian derived Linux
> distros), and the output seemed okay.
> 
> There may be something more we can do to improve things, so I'll keep this
> open as an enhancement. But as the spec explicitly says to embed CIDFonts
> (specifically in order to avoid *exactly* this kind of inconsistency), I
> don't think this is a "bug", as such.

OK, I see what you mean.

In fact, I spent the past few days trying to track down the conversion problem.

My best guess was that some of the syntax in the PDF is non-standard, or that there is a problem with these font files.

But I am not expert enough, so I had to ask for your help to pin down the problem.

Finally, I want to say to you in Chinese: 谢谢(thanks)!
Comment 11 Peter Cherepanov 2021-01-02 23:17:36 UTC
I confirm that the current master branch behaves the same way as 9.27, as described in this bug report.