When converting the attached PDF file to PPM with Ghostscript, certain characters are missing from the output. For example, the name 'JOSEPH' comes out as 'OSEPH'. I've reproduced this problem with gs-8.00, gs-8.54, and gs-head, and they all fail the same way. Acrobat 5.0 and Apple Preview open the file correctly. Note that this file contains confidential information; please delete the file when it is no longer needed and do not add it to the regression test suite. The command line I used is: gs -sDEVICE=ppmraw -sOutputFile=test.ppm missing_chars.pdf
Created attachment 2638: missing_chars.pdf
The problem in the PDF interpreter is that this file has an embedded TrueType (TTF) subset of the /Helvetica font, which is entered into the FontDirectory as Helvetica (the subset lacks '@', 'J', 'V' and 'X', among other glyphs). When a subsequent "Tf" operator references the /Helv font, which is defined as a Type 1 font (not a subset), we pick up the installed subset instead of loading the complete Type 1 font. Thus the problem stems from putting subsets in the FontDirectory under the regular BaseFont name.
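To make the name collision concrete, here is a minimal Python sketch. It is not Ghostscript's PostScript code. `Font`, `font_directory`, `find_font_by_name` and `load_external` are hypothetical stand-ins, and glyph sets are modeled as plain sets of characters. The sketch only shows how a cache keyed by BaseFont name lets an embedded subset shadow the full font.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Font:
    """Hypothetical stand-in for a loaded font: a name plus its glyph set."""
    base_font: str
    glyphs: Set[str] = field(default_factory=set)
    is_subset: bool = False

    def render(self, text: str) -> str:
        # Characters without a glyph are silently dropped, as in the bug report.
        return "".join(ch for ch in text if ch in self.glyphs)


def load_external(base_font: str) -> Font:
    # Hypothetical "load the complete font from disk": all glyphs present.
    return Font(base_font, set("ABCDEFGHIJKLMNOPQRSTUVWXYZ@"))


# Cache keyed only by BaseFont name.
font_directory: Dict[str, Font] = {}


def find_font_by_name(base_font: str) -> Font:
    """Name-keyed lookup: whatever was registered first under this name wins."""
    if base_font not in font_directory:
        font_directory[base_font] = load_external(base_font)
    return font_directory[base_font]


# 1. The embedded subset (no 'J', 'V', 'X' or '@') is registered under the plain name.
subset = Font("Helvetica", set("ABCDEFGHIKLMNOPQRSTUWYZ"), is_subset=True)
font_directory[subset.base_font] = subset

# 2. A later Tf for /Helv (a full Type 1 Helvetica) looks the font up by name...
helv = find_font_by_name("Helvetica")
print(helv.render("JOSEPH"))  # OSEPH: the subset shadows the full font
```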
What is being done about this bug? When is the fix expected? Thanks.
Created attachment 2875: PDF that contains ö (Latin small letter O with diaeresis). This PDF contains ö characters (Latin small letter "O" with diaeresis, i.e. two dots above the letter) that are missing (not substituted or anything, just completely missing) when viewing or converting with Ghostscript (all recent releases tested). See hungarian.gif for an example of the Ghostscript output (second line in the top box) and compare.gif for a side-by-side comparison of the PDF input and the Ghostscript output.
Created attachment 2876: Output missing ö character (second line in top box). This is the output from Ghostscript (converted to GIF with extra pages removed) demonstrating that the ö characters are missing from the second line of the top box. See compare.gif for a side-by-side comparison of the PDF input and the resulting Ghostscript output.
Created attachment 2877: input/output comparison. This is a side-by-side comparison of the PDF input and Ghostscript output with the missing characters indicated.
I'm working on this problem. It is caused by the PDF interpreter's font cache, which looks up fonts by name. The fix will be ready soon.
Hello Alex. Will the fix come in the form of a patch, or will it be incorporated into the next full release? Thank you. Mark Warbington
Created attachment 3078: patch. Undefine any font that may be defined in memory before attempting to resolve a font name into a font. This guarantees that the font will be resolved into an external resource. The patch causes no differences on the Comparefiles test. However, it prevents in-memory redefinitions of font resources, which may be undesirable.
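The idea behind this patch can be sketched in Python as follows. This is a hypothetical model, not the actual PostScript change. `Font`, `font_directory`, `resolve_font` and `load_external` are invented for illustration. The key step is dropping any in-memory entry before resolving the name, so the lookup always falls through to the external resource.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Font:
    """Hypothetical stand-in for a loaded font."""
    base_font: str
    glyphs: Set[str] = field(default_factory=set)

    def render(self, text: str) -> str:
        # Characters without a glyph are dropped.
        return "".join(ch for ch in text if ch in self.glyphs)


def load_external(base_font: str) -> Font:
    # Hypothetical external font resource: always the complete glyph set.
    return Font(base_font, set("ABCDEFGHIJKLMNOPQRSTUVWXYZ@"))


font_directory: Dict[str, Font] = {}


def resolve_font(base_font: str) -> Font:
    """Undefine any in-memory font of this name first, so the name
    always resolves to the external resource."""
    font_directory.pop(base_font, None)  # "undefine" the in-memory definition
    font = load_external(base_font)
    font_directory[base_font] = font
    return font


# An embedded subset lacking 'J' was registered earlier under the same name.
font_directory["Helvetica"] = Font("Helvetica", set("ABCDEFGHIKLMNOPQRSTUWYZ"))
print(resolve_font("Helvetica").render("JOSEPH"))  # JOSEPH
```

The trade-off mentioned in the comment shows up directly here. Because `resolve_font` always discards the cached entry, a deliberate in-memory redefinition under the same name is discarded too.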
The font file from comment #4 is incorrect. The page resource dictionary points directly to a font stream. The font resource is not referenced from anywhere, and its font descriptor doesn't point to the font stream. Acrobat Reader 5 and earlier display the file similarly to Ghostscript; Acrobat Reader 8 recovers the intended appearance of the file. This problem is unrelated to the one demonstrated by the file in attachment #1 and fixed by the patch in attachment #9. It would be great to cover yet another case of PDF abuse, but the results cannot be guaranteed: the font resource contains important information about the encoding and widths of the characters, but no object belonging to the page links to it.
Please disregard comment #10. I misunderstood the file structure.
Created attachment 3176: experimental patch. The font file from comment #4 is correct, but it uses new glyph names that our fonts don't have yet; the same glyphs are available under different names. Ghostscript is not alone: Acrobat Reader 5 and earlier display the file similarly to Ghostscript, but Acrobat Reader 8 shows it correctly. This patch tries to load the glyph using the backward-compatible name when the primary search fails. The patch is not ready for production use. Glyph aliases should probably be created when the font is loaded, to avoid any problems in PDF generation. I'm posting the patch for code review.
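The fallback described in this patch can be illustrated with a small Python sketch. This is an assumption-laden model, not the patch itself. `find_glyph`, `EQUIVALENT_PAIRS` and `ALTERNATE_NAME` are hypothetical, and CharStrings are modeled as a dict from glyph name to bytes. The pairs follow the dblacute/hungarumlaut spellings named later in this thread. In the Adobe Glyph List, both spellings map to the same code point (for example U+0151 for odblacute and ohungarumlaut). The thread does not say which spelling is the newer one, so the sketch tries the alternate name in both directions.

```python
from typing import Dict, Optional

# Hypothetical equivalence pairs; both names in a pair denote the same glyph.
EQUIVALENT_PAIRS = [
    ("odblacute", "ohungarumlaut"), ("Odblacute", "Ohungarumlaut"),
    ("udblacute", "uhungarumlaut"), ("Udblacute", "Uhungarumlaut"),
]

# Build a lookup that works in both directions.
ALTERNATE_NAME: Dict[str, str] = {}
for first, second in EQUIVALENT_PAIRS:
    ALTERNATE_NAME[first] = second
    ALTERNATE_NAME[second] = first


def find_glyph(char_strings: Dict[str, bytes], name: str) -> Optional[bytes]:
    """Look up `name`; if the font lacks it, retry once with the alternate name."""
    if name in char_strings:
        return char_strings[name]
    alternate = ALTERNATE_NAME.get(name)
    if alternate is not None:
        return char_strings.get(alternate)
    return None


font = {"o": b"<o>", "ohungarumlaut": b"<o-double-acute>"}
print(find_glyph(font, "odblacute"))  # b'<o-double-acute>'
print(find_glyph(font, "zcaron"))     # None
```

Because the fallback happens only at lookup time, the font itself still lacks the alternate name. That is why the comment suggests creating aliases when the font is loaded instead.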
Alex, .type1build is executed with pdfwrite, so I guess the patch will associate aliased glyphs with the original glyph names. Not sure, though. Please test with pdfwrite. BTW, a better way would be to fix the Encoding when writing a PDF, to make the result more portable and independent of such tricks.
Comment #13 is partially incorrect. pdfwrite copies fonts and encodings, so the result will have the same problem as the input. Alex, please test to be sure. I think it's acceptable for now, since Adobe can handle such documents.
The bug that caused missing characters in sample #1 was fixed some time ago. This patch makes the /?dblacute and /?hungarumlaut glyph names equivalent in Type 1 fonts: when the font is loaded, a missing glyph is added if its equivalent is defined. See: http://ghostscript.com/pipermail/gs-cvs/2008-June/008374.html This fixes the file from attachment #4. Regression testing shows no differences.
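Load-time aliasing of this kind can be sketched in Python. This is a hypothetical model. The real table is PostScript in gs_type1.ps (t1_glyph_equivalence, per the next comment), and `add_equivalent_glyphs` and `BASE_LETTERS` are invented here. The sketch assumes the "?" in /?dblacute stands for a base letter and expands it over the Latin letters that carry a double acute.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical expansion of "?" over the Latin letters with a double acute.
BASE_LETTERS = "oOuU"


def equivalence_pairs() -> Iterator[Tuple[str, str]]:
    for letter in BASE_LETTERS:
        yield letter + "dblacute", letter + "hungarumlaut"


def add_equivalent_glyphs(char_strings: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return a copy of the CharStrings in which a glyph defined under only one
    name of an equivalent pair is also made available under the other name.
    Glyphs that already exist are never overwritten."""
    result = dict(char_strings)
    for a, b in equivalence_pairs():
        if a in result and b not in result:
            result[b] = result[a]
        elif b in result and a not in result:
            result[a] = result[b]
    return result


font = {"o": b"<o>", "ohungarumlaut": b"<o-double-acute>"}
loaded = add_equivalent_glyphs(font)
print(sorted(loaded))  # ['o', 'odblacute', 'ohungarumlaut']
```

Doing this once at load time means every later consumer of the font sees both names, not only the glyph lookup path.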
The patch included in gs_type1.ps, which "doubles" some characters, is hard to disable. It breaks the output of the pf2afm Ghostscript script (the AFM has more glyphs than the PFB). To solve this problem, /t1_glyph_equivalence should be a global, writable array, or a parameter (e.g. .add_equivalent_glyphs) could be added.
Export the t1_glyph_equivalence table, which provides alternative glyph names. Modify pf2afm.ps to disable glyph aliasing and generate AFM files that match the font. The following patch has been committed as rev. 9792. http://ghostscript.com/pipermail/gs-cvs/2009-June/009423.html Regression testing shows no differences.
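The effect of exporting a writable equivalence table can be sketched in Python. This is a hypothetical model of the approach, not the committed PostScript. `glyph_equivalence`, `load_type1` and `afm_glyph_names` are invented names, and the pf2afm-like caller simply empties the table while loading and restores it afterwards. The point is that the AFM then lists exactly the glyphs present in the PFB.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the exported, writable t1_glyph_equivalence table.
glyph_equivalence: List[Tuple[str, str]] = [
    ("odblacute", "ohungarumlaut"), ("Odblacute", "Ohungarumlaut"),
    ("udblacute", "uhungarumlaut"), ("Udblacute", "Uhungarumlaut"),
]


def load_type1(char_strings: Dict[str, bytes]) -> Dict[str, bytes]:
    """Load-time aliasing driven by whatever the table currently contains."""
    result = dict(char_strings)
    for a, b in glyph_equivalence:
        if a in result and b not in result:
            result[b] = result[a]
        elif b in result and a not in result:
            result[a] = result[b]
    return result


def afm_glyph_names(char_strings: Dict[str, bytes]) -> List[str]:
    """pf2afm-like caller: disable aliasing while loading so the glyph list
    matches the font file, then restore the table even if loading fails."""
    saved = list(glyph_equivalence)
    glyph_equivalence.clear()
    try:
        return sorted(load_type1(char_strings))
    finally:
        glyph_equivalence.extend(saved)


pfb = {"a": b"<a>", "ohungarumlaut": b"<o-double-acute>"}
print(sorted(load_type1(pfb)))  # ['a', 'odblacute', 'ohungarumlaut']
print(afm_glyph_names(pfb))     # ['a', 'ohungarumlaut']
```

Making the table data rather than hard-wired logic is what lets a single tool opt out without affecting normal rendering.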
Changing customer bugs that were resolved more than a year ago to closed.
The content of attachment 2638 has been deleted by Marcos H. Woehrmann <marcos.woehrmann@artifex.com> who provided the following reason: Customer requested the file be deleted when no longer needed. The token used to delete this attachment was generated at 2011-09-22 09:17:29 PDT.