Bug 688639

Summary: pdfwrite: a composite font with a Type 3 descendent and FMapType 2
Product: Ghostscript Reporter: Philip Belemezov <Philip.Belemezov>
Component: PDF Writer Assignee: leonardo <leonardo>
Status: NOTIFIED FIXED    
Severity: normal CC: debajyoti.tripathy, htl10, jani-matti.hatinen, jss, marcos.woehrmann, sags5495, zeev-r
Priority: P3    
Version: 8.15   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: This is the GhostScript files gs crashes at.
"good" pdf from 8.15.3 shipped by redhat fc6
Pack of 2 suggested patchs (ZIP file).

Description Philip Belemezov 2006-04-08 05:31:08 UTC
Hello!
A little background first:
I am using KDE 3.5.2 on a x86_64 GNU/Linux system.
I was trying to print a web page 
(http://www.cee.hw.ac.uk/hipr/html/gsmooth.html) using Konqueror (Print to PDF). 
It shows a message displaying a gs command line and saying there was an error.

After investigating, it turns out that the problem is related to gs and the 
PostScript file Konqueror produces as an intermediate step.

So I took the command line and the PostScript file and executed the following:
$ gs -q -dSAFER -dPARANOIDSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite 
-sOutputFile='GaussianSmoothing.pdf' -sPAPERSIZE=a4 -c .setpdfwrite -f 
'GaussianSmoothing.ps'
zsh: 8169 segmentation fault  gs -q -dSAFER -dPARANOIDSAFER -dNOPAUSE -dBATCH 
-sDEVICE=pdfwrite   -c  -f

gs works with other PostScript files.

I am going to attach the PostScript file.
Comment 1 Philip Belemezov 2006-04-08 05:32:30 UTC
Created attachment 2144
This is the GhostScript files gs crashes at.

This is the GhostScript I am trying to convert to PDF.
Comment 2 Philip Belemezov 2006-04-08 05:33:22 UTC
Well, I mean it's the PostScript file. Sorry.
Comment 3 Dan Coby 2006-04-08 22:30:47 UTC
I am unable to reproduce the seg fault with either 8.15 or current svn head.  
(I am using WinXP, MSVC .net 2003.)

However the PDF file which is being created has most of the text missing.
cmd line:

bin/gswin32c -sDEVICE=pdfwrite -sOutputFile=xx.pdf -c .setpdfwrite -f 
688639.ps -c quit

Changing the assignment for pdfwrite bugs.
Comment 4 Hin-Tak Leung 2006-04-09 01:15:44 UTC
ps2pdf works just fine with 8.53 x86 linux,
and running the exact same command with 8.53 as the initial bug 
report works correctly too.

Time for an upgrade or investigate if there is any vendor-applied
patch (unless the 8.15 was compiled from source), I think.

Oh, could it be x86_64 specific?
Comment 5 Dan Coby 2006-04-09 22:10:04 UTC
Hin-Tak Leung,

Did you look at the pdf file produced?  As I noted, I see most of the text 
missing. Do you see the same problem or is your PDF okay?
Comment 6 Raph Levien 2006-04-12 10:19:27 UTC
On my Ubuntu (breezy, 32-bit) install using svn GS, I get the same behavior as
comment #3 - no crash, but much text missing. I also get a few of these reports
from valgrind:

==10691==    at 0x81D91B4: process_composite_text (gdevpdtc.c:157)
==10691==    at 0x81D9092: process_composite_text (gdevpdtc.c:112)

Based on this, it's plausible that the problem is an uninitialized
pte->text.space.s_char.
Comment 7 leonardo 2006-08-11 07:55:52 UTC
The document uses a composite font with Type 3 descendent and FMapType 2. The 
related branch is not yet implemented in pdfwrite.
Comment 8 leonardo 2006-08-11 07:59:02 UTC
Changing the bug title to better reflect the problem. The old title 
was "gs crashes when trying to convert a ps to pdf". The uninitialized 
pte->text.space.s_char also needs attention.
Comment 9 leonardo 2006-08-11 08:01:47 UTC
*** Bug 688760 has been marked as a duplicate of this bug. ***
Comment 10 leonardo 2007-01-21 03:05:46 UTC
Bumping priority - need to retest with recent fixes.
Comment 11 leonardo 2007-01-22 02:42:35 UTC
Still not working - some text is missing. We're sorry. Downgrading the priority 
to P3 for free user bugs.
Comment 12 Hin-Tak Leung 2007-01-22 07:56:50 UTC
Created attachment 2703
"good" pdf from 8.15.3 shipped by redhat fc6

8.54 shows similarly missing characters, but strangely enough
with ESP 8.15.3 on x86_64 (fc6) all the characters are there.

Attached for reference, in case somebody can analyse the file
and work out what ESP 8.15.3 is doing right.
Comment 13 SaGS 2007-03-11 16:03:22 UTC
Created attachment 2832
Pack of 2 suggested patchs (ZIP file).

The attached patches are meant to fix the following bugs, which I found 
to be duplicates of each other:

    Bug #688639 "pdfwrite: a composite font with a Type 3 descendent 
		 and FMapType 2"
    Bug #688760 "ps2pdf loses text in postscript figure when 
		 converted to pdf"
    Bug #688954 "Text disappears when converting some ps files
		 with ps2pdf"
    Bug #689001 "Characters lost converting PS to PDF"
    Bug #689041 "Japanese Font Display Problem is ps2pdf"
    Bug #689105 "Invalid fonts error during converting from PS to PDF"

List of sample files:

    attachment #2144, attachment #2290, attachment #2549, attachment #2550, 
    attachment #2591, attachment #2617 (identical to the preceding one), 
    attachment #2685 (Type 0 font with multiple Type 0/1/3 descendents, see 
    bug #689041 comment #12 on how to use it), attachment #2793.

For testing I compared the output from unpatched Ghostscript PS->PPM with 
the one from patched Ghostscript PS->PDF->PPM. (There are some slight 
differences from color and font conversions, but all the text is there.)


FMapType 2 -------

I haven't found a reason for this particular value to make a difference, 
at least not for the attached samples.


Regression from GS 8.15 -------

I have not checked with a "genuine" GS8.15.

The PDFs in attachment #2703 and attachment #2594 do indeed display 
all the text. But that text is converted to bitmaps, and copying it 
produces much garbage (not because of the encoding, but because of 
metrics that make Reader think characters in different rows overlap). 
The current TRUNK tries to create Type 3 fonts containing outlines. It 
succeeds with simple Type 3 fonts, and after the attached fix it will 
also succeed with Type 3 fonts that are descendants of Type 0 fonts.


Bug and patch details ------

(A) Many glyphs skipped if Type 0 font with Type 3 descendant
--- (patch: Bug688639-r7777-to-r7777A.diff.txt)

    Function gdevpdtc.c::process_composite_text() essentially 
    transforms a single "show" operation that uses a Type 0 font into 
    one or more "show" operations of substrings that use the Type 0 
    font's leaf fonts. Functions at lower levels 
    (pdf_process_string[_aux]()) do properly detect glyphs that are 
    not accumulated and process the string up to such a glyph, leaving 
    the text enum they receive pointing to the not-yet-accumulated 
    glyph. Functions at higher levels do accumulate glyphs when 
    process_composite_text() returns with gs_error_undefined or with 
    gs_text_enum_t.index < gs_text_enum_t.text.size (meaning not all of 
    the text has been processed), and continue the "show" operation 
    after doing so. So, the mechanism that accumulates descendent 
    Type 3 glyphs on an as-needed basis exists and almost works.

    The bug appears because process_composite_text() does not check 
    whether pdf_process_string_aux() processed the whole string it 
    received or only part of it. There are 3 cases:
    (i)   If the 1st glyph in the substring is not accumulated, then 
	  pdf_process_string_aux() and then process_composite_text() 
	  return gs_error_undefined (-21) with *pte pointing to this 
	  1st glyph. This triggers the glyph accumulation, and this 
	  glyph will be displayed. It is the only case in which 
	  glyphs get accumulated, so only a few glyphs (those at the 
	  beginning of substrings) are ever shown.
    (ii)  If all chars are already accumulated, then the substring 
	  is shown normally. (But only a few chars are accumulated...)
    (iii) If the substring starts with some already-accumulated 
	  glyphs, but also contains one that is not accumulated, then:
	  - pdf_process_string_aux() processes the prefix, and these 
	    chars are displayed;
	  - it returns 0 for success;
	  - process_composite_text() does not check that only part of 
	    the string was processed, and behaves as if the whole 
	    substring got displayed; "prev" has already been advanced 
	    past the substring, so the not-yet-accumulated glyph and 
	    those after it are skipped and never accumulated.

    The fix: when pdf_process_string_aux() returns success, 
    process_composite_text() checks whether the whole substring was 
    processed or only a part of it. In the 2nd case it explicitly 
    advances *pte past the chars that were effectively "consumed" 
    (leaving it pointing to the to-be-accumulated glyph) and returns. 
    The caller will take care of accumulating it and continuing 
    displaying the rest of the string.

    Notes:
    - I preferred to advance *pte in a loop. "prev" is already past 
      the whole substring, and I think it cannot be moved "backwards".
      out.index cannot be simply added to pte->index because
      (1) out.index refers to a Type 3 font, thus with 1 byte/glyph 
      and (2) pte->index must be incremented by the number of bytes 
      used to encode those out.index glyphs in the Type 0 font, and 
      this font uses multiple/variable number of bytes per glyph.
    - A comment just before gdevpdte.c::pdf_process_string() states 
      that it "Doesn't use or set pte->{data,size,index}". This is
      not completely true: while pte->index is not used to index into 
      the string, it is incremented. The patch initialises out.index 
      to 0, so pdf_process_string_aux() returns the count of chars 
      that it actually processed in out.index.
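
    A toy model of case (iii) and of the fix, for illustration only. 
    This is not Ghostscript code and not the attached patch: 
    toy_state, code_len(), code_glyph(), leaf_show(), 
    composite_show(), show_all() and the 1-or-2-byte encoding are 
    all hypothetical stand-ins, and the model assumes a single leaf 
    font and well-formed input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_UNDEFINED = -21 };   /* mirrors the -21 quoted above */

typedef struct {
    bool accumulated[256];      /* leaf glyphs already captured */
    size_t shown;               /* glyphs actually written out */
} toy_state;

/* Hypothetical composite encoding: a first byte >= 0x80 starts a 2-byte code. */
static size_t code_len(unsigned char first) { return first >= 0x80 ? 2 : 1; }

/* Leaf glyph selected by the code at p (its last byte); text must be well formed. */
static unsigned char code_glyph(const unsigned char *p) { return p[code_len(p[0]) - 1]; }

/* Stand-in for the lower level: shows glyphs until it meets one that is not
   accumulated; *out_index receives the number of glyphs it processed. */
static int leaf_show(toy_state *st, const unsigned char *glyphs, size_t n,
                     size_t *out_index)
{
    size_t i = 0;

    while (i < n && st->accumulated[glyphs[i]]) {
        st->shown++;
        i++;
    }
    *out_index = i;
    return (n > 0 && i == 0) ? TOY_UNDEFINED : 0;
}

/* Stand-in for the composite level; *index is a BYTE offset into text. */
static int composite_show(toy_state *st, const unsigned char *text, size_t size,
                          size_t *index, bool fixed)
{
    unsigned char glyphs[64];
    size_t n = 0, pos, out_index = 0;
    int code;

    /* Decode the composite codes into 1-byte leaf glyphs. */
    for (pos = *index; pos < size && n < sizeof glyphs; pos += code_len(text[pos]))
        glyphs[n++] = code_glyph(text + pos);
    code = leaf_show(st, glyphs, n, &out_index);
    if (code < 0)
        return code;            /* case (i): *index still at the missing glyph */
    if (!fixed) {
        *index = size;          /* old behaviour: assume everything was shown */
        return 0;
    }
    /* Fix: step over exactly out_index codes, one variable-length code at a time. */
    while (out_index-- > 0)
        *index += code_len(text[*index]);
    return 0;
}

/* Stand-in for the caller: accumulates on demand, then resumes the show. */
static size_t show_all(const unsigned char *text, size_t size, bool fixed)
{
    toy_state st = { {false}, 0 };
    size_t index = 0;

    while (index < size) {
        if (composite_show(&st, text, size, &index, fixed) == TOY_UNDEFINED)
            st.accumulated[code_glyph(text + index)] = true;
    }
    return st.shown;
}
```

    With the hypothetical string { 'a', 0x81, 'b', 'a', 0x82, 'c' } 
    (four glyphs in six bytes), show_all() shows all four glyphs 
    when "fixed" is true and only the first one when it is false. 
    Note that the fixed branch walks the codes instead of adding 
    out_index to the byte offset, for the reason given in the first 
    note above.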

(B) xyshow/etc: Wrong spacing if Type 0 font with Type 3 descendent
--- (patch: Bug688639-r7777A-to-r7777AB.diff.txt)

    When gdevpdtc.c::process_composite_text() returns, pte->xy_index 
    is incorrect. In gdevpdtc.c -r7777 line #142
    "gs_text_enum_copy_dynamic(pte, (gs_text_enum_t *)&prev, true);":
    - "out.xy_index" has been correctly advanced past the substring
      (or the part of it that was actually processed);
    - "curr.xy_index" has been updated from out.xy_index;
    - "prev.xy_index" still corresponds to the beginning of the 
      substring, and so will "pte->xy_index"; "prev" will be 
      updated from "curr" at the beginning of the next loop, if any, 
      but this update does not touch "pte" until an additional 
      substring is processed successfully;
    - When the function returns, "pte->xy_index" remains "one 
      substring behind". If the "xyshow" operation is complete, the 
      widths array won't be needed anymore, so this inconsistency 
      won't matter. But if the return is caused by the need to 
      accumulate a glyph, when the "xyshow" operation is continued it 
      will reuse widths of the last successful substring for the one 
      that's restarted, so some glyphs end up with incorrect widths.

    Fix: Explicitly update pte->xy_index from out.xy_index.
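
    A toy model of the xy_index bookkeeping, again for illustration 
    only and not the attached patch. toy_enum, SUB (a made-up 
    substring length), composite_show() and show_all() are 
    hypothetical; used[i] records which slot of the widths array 
    ends up paired with glyph i.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_UNDEFINED = -21, SUB = 2 /* hypothetical substring length */ };

/* index: position in the text; xy_index: position in the widths array. */
typedef struct { size_t index, xy_index; } toy_enum;

/* Shows text from pte->index onwards in substrings of SUB glyphs and stops
   at the first glyph that is not in accumulated[]. */
static int composite_show(toy_enum *pte, const unsigned char *text, size_t size,
                          const bool *accumulated, size_t *used, bool fixed)
{
    toy_enum curr = *pte;

    while (curr.index < size) {
        toy_enum prev = curr, out = curr;
        size_t end = prev.index + SUB < size ? prev.index + SUB : size;

        while (out.index < end && accumulated[text[out.index]]) {
            used[out.index] = out.xy_index;   /* width slot taken by this glyph */
            out.index++;
            out.xy_index++;
        }
        if (out.index == prev.index)
            return TOY_UNDEFINED;             /* first glyph missing, *pte untouched */
        curr = out;
        pte->index = out.index;
        /* Without the fix, *pte keeps the xy_index of the substring start. */
        pte->xy_index = fixed ? out.xy_index : prev.xy_index;
        if (out.index < end)
            return 0;                         /* partial: the caller must accumulate */
    }
    return 0;
}

/* Stand-in for the caller: accumulates on demand, then resumes from *pte. */
static void show_all(const unsigned char *text, size_t size, size_t *used, bool fixed)
{
    bool accumulated[256] = { false };
    toy_enum e = { 0, 0 };

    while (e.index < size) {
        if (composite_show(&e, text, size, accumulated, used, fixed) == TOY_UNDEFINED)
            accumulated[text[e.index]] = true;
    }
}
```

    For the four glyphs "abab", the fixed variant pairs the glyphs 
    with width slots 0, 1, 2, 3, while the unfixed one gives 
    0, 0, 1, 2: every restart after an accumulation reuses the 
    slots of the previous substring.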
Comment 14 leonardo 2007-04-01 13:56:15 UTC
*** Bug 688954 has been marked as a duplicate of this bug. ***
Comment 15 leonardo 2007-04-01 13:56:32 UTC
*** Bug 689001 has been marked as a duplicate of this bug. ***
Comment 16 leonardo 2007-04-01 13:56:46 UTC
*** Bug 689041 has been marked as a duplicate of this bug. ***
Comment 17 leonardo 2007-04-01 13:57:02 UTC
*** Bug 689105 has been marked as a duplicate of this bug. ***
Comment 18 leonardo 2007-04-01 14:48:57 UTC
Patch to HEAD :

http://ghostscript.com/pipermail/gs-cvs/2007-April/007395.html
Comment 19 leonardo 2007-04-02 11:29:06 UTC
One more patch :
http://ghostscript.com/pipermail/gs-cvs/2007-April/007397.html