Summary: | Mapping simplified Chinese, punctuation missing issues | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | He Fan <heyangfan88> |
Component: | Font API | Assignee: | Chris Liddell (chrisl) <chris.liddell> |
Status: | CONFIRMED --- | ||
Severity: | enhancement | CC: | sphinx.pinastri |
Priority: | P4 | ||
Version: | 9.27 | ||
Hardware: | PC | ||
OS: | Windows 10 | ||
Customer: | Word Size: | --- | |
Attachments: |
The test.pdf uploaded before is wrong; this one is correct. I'm so sorry.
This is the converted PNG image; you can see that the · in the middle of 5·18 has disappeared.
The simple file, font file and cidfmap
The Windows system font in cidfmap
Description
He Fan
2019-08-27 03:39:56 UTC
Created attachment 18040 [details]
The test.pdf uploaded before is wrong; this one is correct. I'm so sorry.
Created attachment 18041 [details]
This is the converted PNG image; you can see that the · in the middle of 5·18 has disappeared.
*** Bug 701463 has been marked as a duplicate of this bug. ***

(In reply to He Fan from comment #0)
> When I used cidfmap to map some Simplified Chinese fonts (Founder
> fonts), I encountered some problems and a small number of punctuation marks
> were lost (for example, in 5·18 the · is lost).
>
> When I remove these mappings, the default font is used and the punctuation
> works fine.

The missing fonts are in fact CIDFonts, not Fonts. Your cidfmap does not supply CIDFonts to replace those fonts; it uses TrueType fonts (and TrueType Collections). I'm afraid that CIDFonts and TrueType fonts are not the same thing.

Using TrueType fonts as substitutes for missing CIDFonts is a Ghostscript feature, but it is not guaranteed to be 100% reliable. Some information has to be created, and doing so is, in part, guesswork. Punctuation marks are the most likely to suffer from missed mappings, especially when a vertical font is substituted with a horizontal font, or vice versa.

> I tried other methods, such as using Microsoft Office Word to edit text and
> punctuation in the same font, such as "·", then converting to PDF, and
> then converting that with a Ghostscript command. It was very successful,
> without losing any characters, and the font is correct.

Opening the file in an editing application and then saving it as PDF will, almost certainly, use completely different fonts from the missing ones in your original PDF file, and the PDF file produced will contain the actual fonts used. When the fonts are embedded in the PDF file (as fonts in general, and CIDFonts in particular, should be), Ghostscript will use the fonts in the PDF file. Naturally this works correctly: all the information required to render the correct glyphs is present in the PDF file.

> Of course, I also used Adobe Acrobat DC to try to convert the problematic PDF
> to PNG, and it correctly recognized and converted these fonts and punctuation
> marks.
When I open the PDF file here using Acrobat, it substitutes every missing font (of which there are 11; oddly, one CIDFont *is* embedded...) with Adobe-HeitiStd-Regular. So Acrobat is not "correctly recognising and converting" anything; it is using a different substitute font. As you noted in comment #0, if you let Ghostscript use its own default font, instead of specifying a substitute in cidfmap, then Ghostscript also renders the punctuation marks.

> Looking forward to your reply. Thank you very much!

If you want correct, accurate rendering of CIDFonts, you must create the PDF file with the CIDFonts embedded (note that the PDF specification says this is a requirement). If you don't, a substitute font will be used: either the Ghostscript fallback or one you specify yourself. If the substitute is not the exact same font that was used to create the PDF document, then the rendered output *is* wrong, because the CIDFont is not the one intended by the author and its appearance will differ from what was intended.

If you use a TrueType font as a substitute for a CIDFont, then I'm afraid that, yes, Ghostscript may be unable to map all the CIDs correctly to matching glyph descriptions in the TrueType font, and errors may occur.

I'll leave this open until the developer with particular font expertise has a look, but my expectation is that there is nothing further we can do about this. You need to supply the correct fonts as substitutes in order to get correct rendering.

I'm happy to investigate further, when I have the time, but to do so, I will need a much simpler example file, the cidfmap and possibly the font file(s) you reference in the cidfmap.

There is, however, a strong likelihood that the problem is simply that the glyph ordering of the font you've substituted doesn't match the glyph ordering of the original TTFs used to generate the (15!!) non-embedded CIDFonts in the PDF. Should that turn out to be the case, there is pretty much nothing we can do about it.

Created attachment 18047 [details]
the simple file, font file and cidfmap
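The advice above, that CIDFonts should be embedded, can be checked mechanically. Per the PDF specification's section on embedded font programs, a font is embedded when its FontDescriptor carries a FontFile, FontFile2 or FontFile3 stream; for a composite (Type0) font that descriptor sits on its single descendant CIDFont. The sketch below assumes the font dictionaries have already been parsed into plain Python dicts by some PDF library (not shown); the function names and font names are hypothetical. It only lists fonts a renderer would have to substitute and says nothing about whether a given substitute is a good match.

```python
from typing import Any, Dict, List

# FontDescriptor keys that carry an embedded font program:
# FontFile (Type 1), FontFile2 (TrueType), FontFile3 (CFF, OpenType, ...).
EMBED_KEYS = ("FontFile", "FontFile2", "FontFile3")


def font_descriptor(font: Dict[str, Any]) -> Dict[str, Any]:
    """Return the FontDescriptor that would hold the font program.

    A Type0 (composite) font has one descendant CIDFont, and the
    descriptor lives there; simple fonts carry the descriptor themselves.
    """
    if font.get("Subtype") == "Type0":
        font = font["DescendantFonts"][0]
    return font.get("FontDescriptor", {})


def is_embedded(font: Dict[str, Any]) -> bool:
    # Type3 fonts define their glyphs inline, so they never need substitution.
    if font.get("Subtype") == "Type3":
        return True
    return any(key in font_descriptor(font) for key in EMBED_KEYS)


def missing_fonts(fonts: List[Dict[str, Any]]) -> List[str]:
    """BaseFont names a renderer would have to substitute.

    Note: the standard 14 fonts are also reported, although viewers
    usually supply those themselves.
    """
    return [f.get("BaseFont", "?") for f in fonts if not is_embedded(f)]


# Hypothetical, already-parsed font dictionaries.
fonts = [
    {"Subtype": "Type0", "BaseFont": "HypotheticalCID-A",
     "DescendantFonts": [{"Subtype": "CIDFontType0",
                          "FontDescriptor": {"FontName": "HypotheticalCID-A"}}]},
    {"Subtype": "Type0", "BaseFont": "HypotheticalCID-B",
     "DescendantFonts": [{"Subtype": "CIDFontType2",
                          "FontDescriptor": {"FontFile2": b"..."}}]},
]
print(missing_fonts(fonts))  # ['HypotheticalCID-A']
```

Every name such a check reports is a font Ghostscript (or Acrobat) has to guess at, which is exactly the situation this thread is about.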
(In reply to Chris Liddell (chrisl) from comment #5)
> I'm happy to investigate further, when I have the time, but to do so, I will
> need a much simpler example file, the cidfmap and possibly the font file(s)
> you reference in the cidfmap.
>
> There is, however, a strong likelihood that the problem is simply that the
> glyph ordering of the font you've substituted doesn't match the glyph
> ordering of the original TTFs used to generate the (15!!) non-embedded
> CIDFonts in the PDF. Should that turn out to be the case, there is pretty
> much nothing we can do about it.

Thank you very much for your reply! I don't have permission to use the application that generates the PDF, so I asked my team leader for a clean file, but unfortunately the request was refused. Although I am very disappointed, I still want to solve this problem, so I deleted the unneeded parts of the previous PDF file. If this is inconvenient for you, please let me know and I will ask my team leader again for a clean file. Thanks again!

Created attachment 18049 [details]
the Windows system font in cidfmap
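The reporter's cidfmap maps the missing CIDFonts onto Windows TrueType fonts and collections. In the cidfmap syntax described in the Ghostscript documentation, such an entry is a CIDFont name followed by a dictionary giving /FileType /TrueType, the /Path to the font file, a /SubfontID selecting a face inside a TrueType Collection (.ttc), and /CSI, the character collection ordering and supplement the CIDs should be interpreted in. The Python helper below only assembles such a line; the CIDFont name, the path and the supplement number are hypothetical, and the exact keys should be checked against the CID font substitution section of the documentation for your Ghostscript version.

```python
def ps_string(text: str) -> str:
    """Quote text as a PostScript literal string, escaping backslash and parens."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def cidfmap_entry(cidfont: str, path: str, ordering: str,
                  supplement: int, subfont_id: int = 0) -> str:
    """Build one cidfmap line substituting a TrueType/TTC font for a CIDFont."""
    # Forward slashes avoid having to escape Windows backslashes.
    path = path.replace("\\", "/")
    return (f"/{cidfont} << /FileType /TrueType /Path {ps_string(path)} "
            f"/SubfontID {subfont_id} /CSI [({ordering}) {supplement}] >> ;")


# Hypothetical: a missing Adobe-GB1 CIDFont mapped onto face 0 of SimSun.
line = cidfmap_entry("HypotheticalCIDFont-GB", r"C:\Windows\Fonts\simsun.ttc", "GB1", 2)
print(line)
# /HypotheticalCIDFont-GB << /FileType /TrueType /Path (C:/Windows/Fonts/simsun.ttc) /SubfontID 0 /CSI [(GB1) 2] >> ;
```

Nothing in such an entry tells Ghostscript how the original CIDFont ordered its glyphs; it only names a different font, which is why the glyph-ordering mismatch discussed in this thread can still occur.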
As we said above, the problem is that the glyph ordering in the font you are using doesn't match the glyph ordering expected by the CMap embedded in the PDF. Other TTF fonts, whose ordering does (more closely, at least) match that expected, *seem* to work better (disclaimer: I don't read any Chinese). I tried wqy-microhei.ttc (available packaged on Debian-derived Linux distros), and the output seemed okay.

There may be something more we can do to improve things, so I'll keep this open as an enhancement. But as the spec explicitly says to embed CIDFonts (specifically in order to avoid *exactly* this kind of inconsistency), I don't think this is a "bug", as such.

(In reply to Chris Liddell (chrisl) from comment #9)
> As we said above, the problem is that the glyph ordering in the font you are
> using doesn't match the glyph ordering expected by the CMap embedded in the
> PDF. Other TTF fonts, whose ordering does (more closely, at least) match
> that expected, *seem* to work better (disclaimer: I don't read any Chinese).
>
> I tried wqy-microhei.ttc (available packaged on Debian-derived Linux
> distros), and the output seemed okay.
>
> There may be something more we can do to improve things, so I'll keep this
> open as an enhancement. But as the spec explicitly says to embed CIDFonts
> (specifically in order to avoid *exactly* this kind of inconsistency), I
> don't think this is a "bug", as such.

OK, I see what you mean. A few days ago I tried to track down the conversion problem myself. My final guess was that some of the syntax in the PDF is non-standard, or that there is a problem with these font files. But I am not expert enough, so I have to ask for your help to pin down the problem. Finally, I want to say to you in Chinese: 谢谢 (thanks)!

I confirm the current master branch behaves the same way as v9.27, as described in this bug report.
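The explanation in comment #9 can be pictured with a toy model. When a TrueType font stands in for a CIDFont, something has to build a map from CIDs to the substitute's glyph indices (GIDs). One plausible route, assumed here and not taken from Ghostscript's source, is CID → Unicode → the TrueType font's cmap → GID. Any CID whose code point the substitute's cmap does not cover lands on GID 0 (.notdef), which many fonts draw as nothing or as an empty box. All CIDs, code points and GIDs below are made up; the middle dot is used only because look-alike code points exist for it (U+00B7 MIDDLE DOT and U+30FB KATAKANA MIDDLE DOT), not as a diagnosis of the reported file.

```python
from typing import Dict, List

NOTDEF = 0  # GID 0 is the .notdef glyph in a TrueType font


def build_cid_to_gid(cid_to_unicode: Dict[int, int],
                     font_cmap: Dict[int, int]) -> Dict[int, int]:
    """Route each CID through Unicode into the substitute font's cmap.

    CIDs whose code point the cmap does not cover fall back to .notdef.
    """
    return {cid: font_cmap.get(cp, NOTDEF) for cid, cp in cid_to_unicode.items()}


def lost_cids(cid_to_gid: Dict[int, int]) -> List[int]:
    """CIDs that would render as .notdef with this substitute font."""
    return sorted(cid for cid, gid in cid_to_gid.items() if gid == NOTDEF)


# Hypothetical data: none of these numbers come from the reported PDF or font.
CID_TO_UNICODE = {
    101: 0x0035,  # '5'
    102: 0x0031,  # '1'
    103: 0x0038,  # '8'
    104: 0x30FB,  # middle dot, resolved to KATAKANA MIDDLE DOT
}
# The substitute covers the digits and U+00B7 MIDDLE DOT, but not U+30FB.
FONT_CMAP = {0x0035: 21, 0x0031: 17, 0x0038: 24, 0x00B7: 90}

cid_to_gid = build_cid_to_gid(CID_TO_UNICODE, FONT_CMAP)
text = [101, 104, 102, 103]  # "5·18"
print([cid_to_gid[cid] for cid in text])  # [21, 0, 17, 24]
print(lost_cids(cid_to_gid))              # [104]
```

In this model only the dot is lost while the digits survive, matching the symptom in the attachments; a substitute whose coverage lines up better with what the PDF expects (as wqy-microhei.ttc apparently did) loses fewer CIDs, and embedding the original CIDFont removes the guesswork entirely.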