Summary: | Characters missing reading PDF file | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Marcos H. Woehrmann <marcos.woehrmann> |
Component: | PDF Interpreter | Assignee: | Alex Cherepanov <alex> |
Status: | NOTIFIED FIXED | ||
Severity: | normal | CC: | markw |
Priority: | P2 | ||
Version: | 8.56 | ||
Hardware: | All | ||
OS: | All | ||
Customer: | 580 | Word Size: | --- |
Attachments: |
PDF that contains ö (Latin small letter O with diaeresis)
Output missing ö character (second line in top box)
input output comparison
patch
experimental patch |
Description
Marcos H. Woehrmann
2006-11-28 09:42:02 UTC
Created attachment 2638 [details]
missing_chars.pdf
The problem in the PDF interpreter is that this file has an embedded TTF subset of the /Helvetica font, which is entered into the FontDirectory as Helvetica (the subset is missing the '@', 'J', 'V' and 'X' among other glyphs). When a subsequent "Tf" comes along that references the /Helv font, which is defined as a Type 1 font (not a subset), we pick up the installed subset instead of loading the complete Type 1 font. Thus the problem stems from putting subsets in the FontDirectory under the regular BaseFont name.
What is being done about this bug? When is the anticipated fix going to take place? Thanks.
Created attachment 2875 [details]
PDF that contains ö (Latin small letter O with diaeresis)
This PDF contains ö characters (Latin small letter "O" with diaeresis or "two
dots" above the letter) that are missing (not substituted or anything just
completely missing) when viewing or converting through Ghostscript (all recent
releases tested). See hungarian.gif for example of Ghostscript output (second
line in top box) and compare.gif for a side by side comparison of the PDF input
versus the Ghostscript output.
Created attachment 2876 [details]
Output missing ö character (second line in top box)
This is the output from Ghostscript (converted to GIF with extra pages removed)
demonstrating that the ö characters are missing from the second line of the top
box. See compare.gif for a side-by-side comparison of the PDF input and
resulting Ghostscript output.
Created attachment 2877 [details]
input output comparison
This is a side-by-side comparison of the PDF input and Ghostscript output with
the missing characters indicated.
I'm working on this problem. It is caused by the PDF interpreter's font cache, which looks up fonts by name. The fix will be ready soon.
Hello Alex. Will the fix come in the form of a patch, or will it be incorporated into the next full release? Thank you. Mark Warbington
Created attachment 3078 [details]
patch
Undefine the font that may be defined in memory before attempting to
resolve a font name into a font. This guarantees that the font will be resolved
into an external resource.
The patch causes no differences on the Comparefiles test.
However, it prevents in-memory re-definitions of the font
resources, which may be undesirable.
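The failure mode and the patch's approach can be illustrated with a small model. The following Python sketch is not Ghostscript code: the dictionary standing in for FontDirectory and the names `load_external_font`, `resolve_font`, and `make_directory_with_subset` are hypothetical, and the glyph sets are made up for the example. It shows how a subset registered under the base font name shadows the complete font, and how undefining the in-memory entry before resolving the name forces a load from the external resource.

```python
# Toy model of a name-keyed font directory; not Ghostscript's actual code.
# All names here (load_external_font, resolve_font, ...) are hypothetical.

FULL_HELVETICA = frozenset("ABCDEFGHIJKLMNOPQRSTUVWXYZ@")
# The embedded subset lacks '@', 'J', 'V' and 'X', as in the report.
SUBSET_HELVETICA = FULL_HELVETICA - frozenset("@JVX")


def load_external_font(name):
    """Stand-in for loading the complete font from an external resource."""
    return {"name": name, "glyphs": FULL_HELVETICA, "source": "external"}


def resolve_font(font_directory, name, undefine_first):
    """Resolve a font name to a font.

    With undefine_first=False, an in-memory definition wins (the buggy lookup).
    With undefine_first=True, any in-memory definition is dropped first, so the
    name always resolves to the external resource (the approach the patch takes).
    """
    if undefine_first:
        font_directory.pop(name, None)
    if name not in font_directory:
        font_directory[name] = load_external_font(name)
    return font_directory[name]


def make_directory_with_subset():
    # The embedded subset was registered under the plain base font name.
    return {
        "Helvetica": {
            "name": "Helvetica",
            "glyphs": SUBSET_HELVETICA,
            "source": "embedded subset",
        }
    }


buggy = resolve_font(make_directory_with_subset(), "Helvetica", undefine_first=False)
fixed = resolve_font(make_directory_with_subset(), "Helvetica", undefine_first=True)

print("J" in buggy["glyphs"], buggy["source"])
print("J" in fixed["glyphs"], fixed["source"])
```

The trade-off noted in the patch description is visible in the model: with `undefine_first=True`, a deliberate in-memory redefinition of the font would be discarded as well.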
The font file from comment #4 is incorrect. The page resource dictionary points directly to a font stream. The font resource is not referenced from anywhere, and its font descriptor doesn't point to the font stream. Acrobat Reader 5 or lower displays the file similarly to Ghostscript. Acrobat Reader 8 recovers the intended appearance of the file. This problem is unrelated to the problem demonstrated by the file from attachment #1 [details] and fixed by the patch from attachment #9 [details]. It would be great to cover yet another case of PDF abuse, but the results cannot be guaranteed. The font resource contains important information about the encoding and widths of the characters, but there's no link to the font resource from any object that belongs to the page.
Please disregard comment #10. I misunderstood the file structure.
Created attachment 3176 [details]
experimental patch
The font file from comment #4 is correct, but it uses new glyph names that we don't yet have in our fonts. The same glyphs are available under different names. Ghostscript is not alone: Acrobat Reader 5 or lower displays the file similarly to Ghostscript, but Acrobat Reader 8 shows the file correctly. This patch tries to load the glyph using the backward-compatible name when the primary search fails. The patch is not ready for production use. Probably, glyph aliases should be created when the font is loaded, to avoid any problems in PDF generation. I'm posting the patch to code review.
Alex, .type1build is executed with pdfwrite, so I guess the patch will associate aliased glyphs with the original glyph names. Not sure though. Please test with pdfwrite. BTW, a better way would be to fix Encoding when writing a PDF, to make the result more portable and independent of such tricks.
Comment #13 is partially incorrect. pdfwrite will copy fonts and encodings, so the result will have the same problem as the input. Alex, please test to be sure. I think it's acceptable for now, since Adobe can handle such documents.
The bug that caused missing characters in sample #1 was fixed some time ago. This patch makes the /?dblacute and /?hungarumlaut glyph names equivalent in Type 1 fonts. It adds a missing glyph when the font is loaded if the other glyph is defined. See: http://ghostscript.com/pipermail/gs-cvs/2008-June/008374.html This fixes the file from attachment #4 [details]. Regression testing shows no differences.
The patch included in gs_type1.ps, which "doubles" some chars, is hard to disable. It breaks the output of the pf2afm GS script (the AFM has more glyphs than the PFB). To solve this problem, /t1_glyph_equivalence should be a global, writable array, or some parameter (e.g. .add_equivalent_glyphs) could be added.
Export the t1_glyph_equivalence table, which provides alternative glyph names. Modify pf2afm.ps to disable glyph aliasing and generate AFM files that match the font. The following patch has been committed as rev. 9792. http://ghostscript.com/pipermail/gs-cvs/2009-June/009423.html Regression testing shows no differences.
Changing customer bugs that were resolved more than a year ago to closed.
The content of attachment 2638 [details] has been deleted by Marcos H. Woehrmann <marcos.woehrmann@artifex.com> who provided the following reason: Customer requested the file be deleted when no longer needed. The token used to delete this attachment was generated at 2011-09-22 09:17:29 PDT. |
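The load-time glyph aliasing discussed in the later comments can be modeled roughly as follows. This is a Python sketch under stated assumptions, not the PostScript in gs_type1.ps: the two name pairs are written out only to illustrate the /?dblacute and /?hungarumlaut pattern, and `GLYPH_EQUIVALENCE`, `add_equivalent_glyphs`, and the `enabled` switch are hypothetical names chosen for the example.

```python
# Sketch of load-time glyph aliasing; a model of the idea, not gs_type1.ps itself.
# The pairs below are illustrative assumptions for the /?dblacute and
# /?hungarumlaut pattern mentioned in the comments.
GLYPH_EQUIVALENCE = [
    ("odblacute", "ohungarumlaut"),
    ("udblacute", "uhungarumlaut"),
]


def add_equivalent_glyphs(char_strings, equivalence=GLYPH_EQUIVALENCE, enabled=True):
    """Return a copy of char_strings with missing equivalent names filled in.

    If exactly one name of a pair is defined, the other name is added as an
    alias for the same glyph data. With enabled=False the glyph table is
    returned unchanged, which is what a tool that must report only the font's
    real glyphs (such as an AFM generator) would want.
    """
    result = dict(char_strings)
    if not enabled:
        return result
    for first, second in equivalence:
        if first in result and second not in result:
            result[second] = result[first]
        elif second in result and first not in result:
            result[first] = result[second]
    return result


# A font that only knows the older glyph name.
font = {"o": "<o outline>", "odblacute": "<o with double acute outline>"}
aliased = add_equivalent_glyphs(font)
untouched = add_equivalent_glyphs(font, enabled=False)

print(sorted(aliased))
print(sorted(untouched))
```

Making the table and the switch visible to callers mirrors the request in the thread: a script like pf2afm can turn aliasing off so that its output lists only the glyphs actually present in the font.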