Bug 689963

Summary:	Infinite loop converting PostScript to PDF
Product:	Ghostscript	Reporter:	Marcos H. Woehrmann <marcos.woehrmann>
Component:	PDF Writer	Assignee:	Ken Sharp <ken.sharp>
Status:	NOTIFIED FIXED
Severity:	normal
Priority:	P2
Version:	master
Hardware:	All
OS:	All
Customer:	210	Word Size:	---
Attachments:	689963.patch

Description Marcos H. Woehrmann 2008-07-10 12:40:45 UTC

The customer reports and I've verified an infinite loop when converting the attached PostScript file to 
PDF; the customer reported it with 8.62 but head (r8827) fails the same way.

The command line I'm using:

  bin/gs -sDEVICE=pdfwrite -o test.pdf ./p4_x1.ps

Note that when converting the file to a bitmap image (such as tiff24nc) the infinite loop does not happen, 
but Ghostscript reports: 

  A Marker caused a PostScript error, continuing processing...

Additional information from the customer:

In my investigation, if I replace the string "SUFQQV+CourierNewPSMT" with "MDYOKA+ArialMT" in the 
file, then the problem goes away.
Both "SUFQQV+CourierNewPSMT" and "MDYOKA+ArialMT" are CID-fonts embedded in the ps file.

Comment 1 Marcos H. Woehrmann 2008-07-10 12:41:04 UTC

Created attachment 4210
p4_x1.ps

Comment 2 Ken Sharp 2008-07-11 03:29:00 UTC

This doesn't look like a reasonable file to me. It's not a single job; it's two
which have been 'cat'ed together. 

The second job's comments state that it has eight pages, but it contains only 3
page headers; page 2 of the second job doesn't seem to contain a showpage, nor
does page 3.

Still, there is some kind of problem with the CourierNew font: it's adding a full
32 bits' worth of glyphs to the font when gathering the font info for pdfwrite. 

If you change the font usage as described, then pdfwrite doesn't embed it, and
so you don't get the problem. I don't think the rendering devices will access
the font info, and so they also don't experience the problem.

I've reduced the job to a single font (the offending CourierNew) and removed all
the extraneous pages and other stuff; I'll carry on looking into it.

Comment 3 Ken Sharp 2008-07-11 04:06:34 UTC

I don't think this is an infinite loop, though it will take a very long time
to complete.

The font info code is trying to determine whether this is a fixed pitch font,
and it does it by starting at glyph index 0, extracting each glyph in turn and
checking against the last width. If they differ the font isn't fixed pitch. If
they are the same, increment the glyph index, get the next glyph and repeat.
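
To make the check described above concrete, here is a minimal sketch in Python. It is not the Ghostscript code; looks_fixed_pitch and the get_width callback are hypothetical names invented for illustration, and the width values are made up.

```python
from typing import Callable, Optional, Tuple

def looks_fixed_pitch(get_width: Callable[[int], Optional[int]]) -> Tuple[bool, int]:
    """Return (is_fixed_pitch, number_of_glyphs_probed).

    get_width(index) is a hypothetical callback: it returns the width of
    the glyph at that index, or None once there are no more glyphs.
    """
    last_width = None
    index = 0
    while True:
        width = get_width(index)
        if width is None:
            # Ran out of glyphs without seeing a different width.
            return True, index
        if last_width is not None and width != last_width:
            # First differing width: the font is not fixed pitch.
            return False, index + 1
        last_width = width
        index += 1

# Illustrative width tables (made-up values, not taken from the real fonts).
proportional = [278, 278, 333]
monospaced = [600] * 5

print(looks_fixed_pitch(lambda i: proportional[i] if i < len(proportional) else None))
print(looks_fixed_pitch(lambda i: monospaced[i] if i < len(monospaced) else None))
```

The point of the sketch is that a proportional font exits after a handful of probes, while a monospaced font only stops when the enumeration itself reports that there are no more glyphs.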

It seems that for TrueType fonts at least if the glyph is not present in the
metrics table then we use the width of the last glyph defined in the metrics table.

In the case of Arial we rather quickly find glyphs with different widths and
exit. Courier, of course, *is* a fixed pitch font, so we don't find different
widths, and the 'default' width is the same as all the real glyphs.

As a result we keep on going, which means that we try to probe the entire
Unicode glyph space, 0 -> 0xffffffff, which takes rather a long time....
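
As a rough illustration of why this takes so long, the sketch below models the fallback width behaviour described above and compares the size of the two index ranges. The truetype_width helper and the metrics list are assumptions for illustration only, not Ghostscript functions or data.

```python
def truetype_width(metrics, index):
    """Hypothetical width lookup: indices past the end of the metrics
    table reuse the last width defined in the table."""
    return metrics[index] if index < len(metrics) else metrics[-1]

# Made-up metrics for a fixed pitch font: every real glyph is 600 units wide.
courier_like = [600, 600, 600]

# A probe far beyond the table still reports 600, so the width comparison
# never finds a mismatch and never ends the scan early.
far_probe = truetype_width(courier_like, 1_000_000)

FULL_RANGE = 0xFFFFFFFF + 1   # indices 0 .. 0xffffffff
CMAP_RANGE = 0xFFFF + 1       # indices 0 .. 0xffff

print(far_probe)                 # 600
print(FULL_RANGE)                # 4294967296
print(CMAP_RANGE)                # 65536
print(FULL_RANGE // CMAP_RANGE)  # 65536
```

So probing the full 32-bit range is 65536 times more work than stopping at 0xffff.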

We ought to stop when we reach the end of the code space range defined in the
CMap (0xffff in this case), but we don't test for this and the information isn't
obviously present, so I'll have to go digging.

Comment 4 Ken Sharp 2008-07-11 07:57:42 UTC

Created attachment 4212
689963.patch

Proposed patch to resolve this. In z11_enumerate_glyph, check the requested
index against the number of glyphs in the CMap. If the requested glyph index
exceeds the count of glyphs, return a rangecheck error.
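
The idea behind the patch can be sketched as follows. This is a Python model, not the contents of 689963.patch: enumerate_glyph, RangeCheckError and count_probes are hypothetical stand-ins, and treating index >= glyph_count as out of range is an assumption about where the boundary falls.

```python
class RangeCheckError(Exception):
    """Hypothetical stand-in for a rangecheck error."""

def enumerate_glyph(index, glyph_count):
    """Return the glyph index if it is in range, otherwise raise.

    Assumption: valid indices are 0 .. glyph_count - 1.
    """
    if index >= glyph_count:
        raise RangeCheckError(f"glyph index {index} out of range")
    return index

def count_probes(glyph_count):
    """Probe glyphs from index 0 until the enumerator reports rangecheck."""
    index = 0
    try:
        while True:
            enumerate_glyph(index, glyph_count)
            index += 1
    except RangeCheckError:
        return index

print(count_probes(0x10000))
```

With a guard like this the caller's scan ends after glyph_count probes, even for a font whose widths never differ.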

I'm still sorting out getting my regression test working on the cluster (it's
all my fault), but a local regression test shows no differences.

Comment 5 Ken Sharp 2008-07-22 07:25:46 UTC

Since nobody complained about my proposed patch, and regression testing showed
no problems, I've committed it as:

http://ghostscript.com/pipermail/gs-cvs/2008-July/008442.html

Problem resolved for me.