Summary: | pdfwrite: a composite font with a Type 3 descendent and FMapType 2 | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Philip Belemezov <Philip.Belemezov> |
Component: | PDF Writer | Assignee: | leonardo <leonardo> |
Status: | NOTIFIED FIXED | ||
Severity: | normal | CC: | debajyoti.tripathy, htl10, jani-matti.hatinen, jss, marcos.woehrmann, sags5495, zeev-r |
Priority: | P3 | ||
Version: | 8.15 | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
This is the GhostScript files gs crashes at.
"good" pdf from 8.15.3 shipped by redhat fc6 Pack of 2 suggested patchs (ZIP file). |
Description
Philip Belemezov
2006-04-08 05:31:08 UTC
Created attachment 2144 [details]
This is the GhostScript files gs crashes at.
This is the GhostScript I am trying to convert to PDF.
Well, I mean it's the PostScript file. Sorry.

I am unable to reproduce the seg fault with either 8.15 or current svn head. (I am using WinXP, MSVC .net 2003.) However, the PDF file which is being created has most of the text missing.
cmd line: bin/gswin32c -sDEVICE=pdfwrite -sOutputFile=xx.pdf -c .setpdfwrite -f 688639.ps -c quit

Changing the assignment for pdfwrite bugs.

ps2pdf works just fine with 8.53 x86 linux, and running the exact same command as in the initial bug report with 8.53 works correctly too. Time for an upgrade, or to investigate whether there is any vendor-applied patch (unless the 8.15 was compiled from source), I think. Oh, could it be x86_64 specific?

Hin-Tak Leung, did you look at the pdf file produced? As I noted, I see most of the text missing. Do you see the same problem or is your PDF okay?

On my Ubuntu (breezy, 32-bit) install using svn GS, I get the same behavior as comment #3 - no crash, but much text missing. I also get a few of these reports from valgrind:
==10691== at 0x81D91B4: process_composite_text (gdevpdtc.c:157)
==10691== at 0x81D9092: process_composite_text (gdevpdtc.c:112)
Based on this, it's plausible that the problem is an uninitialized pte->text.space.s_char.

The document uses a composite font with a Type 3 descendent and FMapType 2. The related branch is not yet implemented in pdfwrite. Changing the bug title for a better reflection of the problem. The old title is "gs crashes when trying to convert a ps to pdf". The uninitialized pte->text.space.s_char also needs attention.
*** Bug 688760 has been marked as a duplicate of this bug. ***

Bumping priority - need to retest with recent fixes.

Still not working - some text is missing.

We're sorry. Downgrading the priority to P3 for free user bugs.

Created attachment 2703 [details]
"good" pdf from 8.15.3 shipped by redhat fc6
8.54 shows similarly missing characters, but strangely enough
with ESP 8.15.3 on x86_64 (fc6) all the characters are there.
Attached for reference, in case somebody can analyse the file
and work out what ESP 8.15.3 is doing right.
Created attachment 2832 [details]
Pack of 2 suggested patchs (ZIP file).

The attached patches are meant to fix the following bugs, which I found to be duplicates of each other:
Bug #688639 "pdfwrite: a composite font with a Type 3 descendent and FMapType 2"
Bug #688760 "ps2pdf loses text in postscript figure when converted to pdf"
Bug #688954 "Text disappears when converting some ps files with ps2pdf"
Bug #689001 "Characters lost converting PS to PDF"
Bug #689041 "Japanese Font Display Problem is ps2pdf"
Bug #689105 "Invalid fonts error during converting from PS to PDF"

List of sample files: attachment #2144 [details], attachment #2290 [details], attachment #2549 [details], attachment #2550 [details], attachment #2591 [details], attachment #2617 [details] (identical to the preceding one), attachment #2685 [details] (Type 0 font with multiple Type 0/1/3 descendents, see bug #689041 comment #12 on how to use it), attachment #2793 [details].

For testing I compared the output of unpatched Ghostscript PS->PPM with that of patched Ghostscript PS->PDF->PPM. (There are some slight differences from color and font conversions, but all the text is there.)

FMapType 2
-------
I haven't found a reason for this particular value to make a difference, at least not for the attached samples.

Regression from GS 8.15
-------
I have not checked with a "genuine" GS8.15. The PDFs in attachment #2703 [details] and attachment #2594 [details] do indeed display all the text. But that text is converted to bitmaps, and copying it produces much garbage (not because of the encoding, but because of metrics that make Reader think characters in different rows overlap). The current TRUNK tries to create Type 3 fonts containing outlines. It succeeds with simple Type 3 fonts, and after the attached fix it will also succeed with Type 3 fonts that are descendants of Type 0 fonts.

Bug and patch details
------
(A) Many glyphs skipped if Type 0 font with Type 3 descendant
---
(patch: Bug688639-r7777-to-r7777A.diff.txt)

Function gdevpdtc.c::process_composite_text() essentially transforms a single "show" operation that uses a Type 0 font into one or more "show" operations on substrings that use the Type 0 font's leaf fonts.

Functions at lower levels (pdf_process_string[_aux]()) do properly detect glyphs that are not accumulated and process the string up to such a glyph, leaving the text enum they receive pointing at the not-yet-accumulated glyph.

Functions at higher levels do accumulate glyphs when process_composite_text() returns with gs_error_undefined or with gs_text_enum_t.index < gs_text_enum_t.text.size (meaning not all of the text has been processed), and continue the "show" operation after doing so.

So, the mechanism that accumulates descendent Type 3 glyphs on an as-needed basis exists and almost works. The bug appears because process_composite_text() does not check whether pdf_process_string_aux() processed the whole string it received or only part of it. There are 3 cases:
(i) If the 1st glyph in the substring is not accumulated, then pdf_process_string_aux() and then process_composite_text() return gs_error_undefined (-21) with *pte pointing to this 1st glyph. This triggers the glyph accumulation, and this glyph will be displayed. It is the only case in which glyphs get accumulated, so only a few glyphs (those at the beginning of substrings) are ever shown.
(ii) If all chars are already accumulated, then the substring is shown normally. (But only a few chars are accumulated...)
(iii) If the substring starts with some already-accumulated glyphs, but also contains one that is not accumulated, then:
- pdf_process_string_aux() processes the prefix, and these chars are displayed;
- it returns 0 for success;
- process_composite_text() does not check that only part of the string was processed, and behaves as if the whole substring got displayed; "prev" has already been advanced past the substring, so the not-yet-accumulated glyph and those after it are skipped and never accumulated.

The fix: when pdf_process_string_aux() returns success, process_composite_text() checks whether the whole substring was processed or only a part of it. In the 2nd case it explicitly advances *pte past the chars that were effectively "consumed" (leaving it pointing at the to-be-accumulated glyph) and returns. The caller will take care of accumulating that glyph and continuing to display the rest of the string.

Notes:
- I preferred to advance *pte in a loop. "prev" is already past the whole substring, and I think it cannot be moved "backwards". out.index cannot simply be added to pte->index because (1) out.index refers to a Type 3 font, thus with 1 byte/glyph, and (2) pte->index must be incremented by the number of bytes used to encode those out.index glyphs in the Type 0 font, and this font uses a multiple/variable number of bytes per glyph.
- A comment just before gdevpdte.c::pdf_process_string() states that it "Doesn't use or set pte->{data,size,index}". This is not completely true: while pte->index is not used to index into the string, it is incremented. The patch initialises out.index to 0, so pdf_process_string_aux() returns the count of chars that it actually processed in out.index.

(B) xyshow/etc: Wrong spacing if Type 0 font with Type 3 descendent
---
(patch: Bug688639-r7777A-to-r7777AB.diff.txt)

When gdevpdtc.c::process_composite_text() returns, pte->xy_index is incorrect.
In gdevpdtc.c -r7777 line #142 "gs_text_enum_copy_dynamic(pte, (gs_text_enum_t *)&prev, true);":
- "out.xy_index" has been correctly advanced past the substring (or the part of it that was actually processed);
- "curr.xy_index" has been updated from out.xy_index;
- "prev.xy_index" still corresponds to the beginning of the substring, and so will "pte->xy_index"; "prev" will be updated from "curr" at the beginning of the next loop, if any, but this update does not touch "pte" until an additional substring is processed successfully;
- When the function returns, "pte->xy_index" remains "one substring behind".

If the "xyshow" operation is complete, the widths array won't be needed anymore, so this inconsistency won't matter. But if the return is caused by the need to accumulate a glyph, then when the "xyshow" operation is continued it will reuse the widths of the last successful substring for the one that's restarted, so some glyphs end up with incorrect widths.

Fix: Explicitly update pte->xy_index from out.xy_index.

*** Bug 688954 has been marked as a duplicate of this bug. ***
*** Bug 689001 has been marked as a duplicate of this bug. ***
*** Bug 689041 has been marked as a duplicate of this bug. ***
*** Bug 689105 has been marked as a duplicate of this bug. ***

Patch to HEAD : http://ghostscript.com/pipermail/gs-cvs/2007-April/007395.html

One more patch : http://ghostscript.com/pipermail/gs-cvs/2007-April/007397.html
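
For readers following the description of patch (A), here is a toy model of the three cases. It is only a sketch and not Ghostscript code: every name starting with toy_ is hypothetical, the enumerator is reduced to three fields, and each glyph is assumed to occupy exactly 2 bytes in the composite string (a font byte followed by a char byte), whereas the real code has to handle a variable number of bytes per glyph. With fixed == 0 the upper layer behaves as if the whole substring had been shown; with fixed != 0 it advances the index only past the glyphs that the lower layer reports as processed.

```c
#include <assert.h>
#include <string.h>

#define TOY_ERR_UNDEFINED (-21) /* value the report quotes for gs_error_undefined */
#define TOY_BYTES_PER_GLYPH 2   /* toy assumption: font byte + char byte */

/* Hypothetical, heavily reduced stand-in for the text enumerator. */
typedef struct {
    const unsigned char *data; /* composite (Type 0) string */
    int size;                  /* length in bytes */
    int index;                 /* bytes consumed so far */
} toy_enum;

static int toy_accumulated[256]; /* 1 once a leaf glyph has been accumulated */

/* Lower layer: show glyphs until the first one that is not accumulated and
 * report in *out_index how many were shown. */
static int toy_process_leaf(const unsigned char *chars, int n,
                            int *out_index, int *shown)
{
    int i = 0;

    while (i < n && toy_accumulated[chars[i]]) {
        ++*shown;
        ++i;
    }
    *out_index = i;
    return (n > 0 && i == 0) ? TOY_ERR_UNDEFINED : 0;
}

/* Upper layer: hand the rest of the string to the lower layer as one substring. */
static int toy_process_composite(toy_enum *pte, int fixed, int *shown)
{
    unsigned char leaf[64];
    int n = (pte->size - pte->index) / TOY_BYTES_PER_GLYPH;
    int out_index = 0, code, i;

    assert(n <= (int)sizeof leaf);
    for (i = 0; i < n; ++i)
        leaf[i] = pte->data[pte->index + i * TOY_BYTES_PER_GLYPH + 1];

    code = toy_process_leaf(leaf, n, &out_index, shown);
    if (code < 0)
        return code; /* case (i): pte still points at the glyph to accumulate */

    if (!fixed) {
        pte->index = pte->size; /* the bug: assumes the whole substring was shown */
        return 0;
    }
    /* The fix: step past the consumed glyphs only, one glyph at a time. */
    for (i = 0; i < out_index; ++i)
        pte->index += TOY_BYTES_PER_GLYPH;
    return 0;
}

/* Caller: accumulate the pending glyph whenever text is left over, then retry.
 * Returns the number of glyphs that were actually shown. */
int toy_show(const unsigned char *data, int size, int fixed)
{
    toy_enum e = { data, size, 0 };
    int shown = 0;

    assert(size % TOY_BYTES_PER_GLYPH == 0);
    memset(toy_accumulated, 0, sizeof toy_accumulated);
    while (e.index < e.size) {
        int code = toy_process_composite(&e, fixed, &shown);

        if (code == TOY_ERR_UNDEFINED || e.index < e.size)
            toy_accumulated[e.data[e.index + 1]] = 1;
    }
    return shown;
}
```

For the 4-glyph string {0,'a', 0,'b', 0,'a', 0,'c'}, toy_show(s, 8, 0) returns 1 (only the leading glyph is shown, as in cases (i) and (iii)), while toy_show(s, 8, 1) returns 4.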