Bug 688971 - huge performace problem (with large TT font?)
Summary: huge performace problem (with large TT font?)
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: General (show other bugs)
Version: 8.54
Hardware: PC Linux
: P4 normal
Assignee: leonardo
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2006-11-02 05:53 UTC by Harald Koenig
Modified: 2008-12-19 08:31 UTC (History)
1 user (show)

See Also:
Customer:
Word Size: ---


Attachments
neckar.ps.bz2 (2.95 MB, application/octet-stream)
2006-11-02 08:20 UTC, Marcos H. Woehrmann
Details
Pack of 6 suggested patches. (9.23 KB, patch)
2006-11-16 14:24 UTC, SaGS
Details | Diff
The suggested patches, relative to TRUNK -r7842. (9.23 KB, application/x-zip-compressed)
2007-04-11 12:58 UTC, SaGS
Details
Test cases for the new .sort operator. (5.19 KB, application/postscript)
2007-04-11 12:59 UTC, SaGS
Details
rasters.zip (345.28 KB, application/octet-stream)
2007-04-20 00:24 UTC, leonardo
Details
Corrected patch for (A). (6.65 KB, patch)
2007-04-21 10:23 UTC, SaGS
Details | Diff
Modified patch for (E). (3.62 KB, patch)
2007-04-23 03:20 UTC, SaGS
Details | Diff
E.txt (5.84 KB, patch)
2007-04-23 13:32 UTC, leonardo
Details | Diff
F.txt (2.42 KB, patch)
2007-04-23 15:02 UTC, leonardo
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Harald Koenig 2006-11-02 05:53:05 UTC
trying to display this PS file 

   ftp://ftp.science-computing.de/outgoing/koenig/gs/neckar.ps.gz

with gs 8.54 takes >5 minutes CPU time on an AMD64 3200+ (both 32bit
and 64bit binary) while gs 8.50 renders the same file immediately.

neckar.ps was created with xpdf (print to file) and embeds the full MS-PGothic
font  (don't ask me why it includes the full font, only one single char -- the
space character(!) -- was used in the originating PDF).


shall I attach the example file (see URL above) for reference ?  
if yes, compressed (3.7 MB) or uncompressed (11 MB) ?
Comment 1 Marcos H. Woehrmann 2006-11-02 08:20:48 UTC
Created attachment 2573 [details]
neckar.ps.bz2
Comment 2 Marcos H. Woehrmann 2006-11-02 08:32:40 UTC
I've attached the test file so that if the if the file is removed from the ftp site we'll have an archive copy.

I've also verified the problem: gs850 takes <1 seconds to convert the file to ppm @ 72 DPI whereas gs851 
through gshead take minutes.  I'll run search-svn-revs to find what change is responsible for this 
behaviour.
Comment 3 Marcos H. Woehrmann 2006-11-02 10:48:32 UTC
5707 is the first revision that exhibits this problem.  The files that changed
in 5707 were: lib/gs_ttf.ps, src/gstype42.c, src/gstype42.h 
Comment 4 Marcos H. Woehrmann 2006-11-02 10:51:40 UTC
The associate 
Comment 5 Marcos H. Woehrmann 2006-11-02 10:52:23 UTC
The comments associated with revision 5707:

2005-02-27 05:56 Ray Johnston

    Fix handling of broken TrueType fonts that have a loca table that is not
    in order. Bug 687889 for customer 670.

    DETAILS:

    This is clearly a questionable TT font since the Apple TT font reference
    manual clearly states that the length of glyphs can be inferred from
    the difference in successive offsets.

    Note that the PostScript code only needs to deal with the sorted 'locatable'
    in order to decide where to split the glyf strings in the sfnts array
    (in the .dividesfnts proc). Otherwise the PostScript code doesn't care.

    If the .dividesfnts uses a sorted locatable, but doesn't replace the 'loca'
    table written into the sfnts, then the VMerror occurs because the C code
    in 'default_get_outline' calculated the glyph_length for the gs_alloc
    using loca table entries that might be out of order.

    In order to effect the entire fix, I sorted the 'locatable' for the PS
    code in .dividesfnts as well as creating a new 'len_glyphs' array in
    the gs_type42_data structure that holds the actual glyph lengths.
    The glyph lengths are calculated once when the font is initialized in
    gs_type42_font_init. If the 'loca' table is in order, then nothing
    changes, but if it is out of order, we get the lengths by finding the
    next higher glyph offset anywhere in the loca table.

    These changes let the file run and show the same glyphs as Adobe Acrobat
    Reader.

    [lib/gs_ttf.ps 1.41, src/gstype42.c 1.49, src/gxfont42.h 1.21]
Comment 6 Marcos H. Woehrmann 2006-11-07 11:29:14 UTC
The ideal solution would be to implement the sort (currently in lib/gs_init.ps as .sort) in C, calling qsort(), 
converting the <lt-proc> to a function type 4. 
Comment 7 SaGS 2006-11-16 14:24:41 UTC
Created attachment 2618 [details]
Pack of 6 suggested patches.

General description of the problem -------

The performance problem with attachment #2573 [details] comes from C code, all 
the 15min (on an Intel P4 2.4MHz, debug version of gswin32c.exe) 
necessary to process it are spent in a cycle that verifies the 'loca' 
table to be sorted, more specifically in SVN TRUNK -r7203 
src\gstype42.c::gs_type42_font_init() lines #235..246.

Note that the 'loca' table is sorted, so the sample is a perfectly 
valid PostScript file, containing perfectly valid fonts.

Here's what happens:

- the file defines 61 Type42 fonts that share a huge 'sfnts' array;
- this 'sfnts' appears to contain a separate string for each glyph;
- there are almost 20400 glyphs, so approx 20400 strings;
- the 'loca' is placed AFTER the 'glyph';
- the cycle mentioned above calls gstype42.c::get_glyph_offset() once 
  per glyph, to fetch the corresponding 'loca' entry;
- get_glyph_offset() ends up calling zfont42.c::string_array_access_proc();
- to find a specific place in the whole 'sfnts', 
  string_array_access_proc() starts each time from the beginning of 'sfnts';
- since the 'loca' is after 'glyph', each call to get a glyph offset has 
  to skip approx 20400 strings.
- doing the math, the body of the "for (;; ++index)" cycle in 
  string_array_access_proc() is executed
  61 fonts x 20400 glyphs each x 20400 strings to skip = 25.4 billion times.

Attached are 6 patches that solve this and related problems.


Suggested patches -------


(A) Faster "seeks" into 'sfnts'.
--- Patch: file Bug688971-r7203-to-r7203A.diff.txt in attached ZIP.

    Details:

    src\zfont42.c::string_array_access_proc() is now able to cache 
    info about the string is used last, and start the next search from 
    there. This info in located together with the ref to 'sfnts', and 
    consists of index of the string (.mru_sfnts_index) and total number 
    of data bytes that precede this string (.mru_sfnts_pos).

    If passing NULL for the 2 new parameters, string_array_access_proc() 
    beheaves at it did before and starts the search from the beginning.

    Otherwise it does one of:
    - search forward from the mru string (possibly returning data from 
      it, without actually skipping anything), if new position >= 
      current's string starting position;
    - search backwards from current string if requested position is in 
      the 2nd half of the fragment before the current string;
    - search forward from the beginning if in 1st half.


Patch (A) alone reduces the execution time for the sample file from 
15min to 2sec. The other ones do not affect this sample (or at least 
not in a significant way), but fix similar problems in other parts.


(B) Faster "seeks" into array-of-strings CMaps.
--- Patch: file Bug688971-r7203A-to-r7203AB.diff.txt in attached ZIP.
    Depends on, and to be applied after, the one for (A).

    Details:

    This comes almost for free, since "seeks" into array-of-strings 
    CMaps are handled by the same src\zfont42.c::string_array_access_proc().
    We only need to provide and initialize the 2 fields for storing 
    the info on the most recently used string.


(C) Faster FAPI seeks.
--- Patch: file Bug688971-r7203-to-r7203C.diff.txt in attached ZIP.

    Details:

    This optimization is similar to the one in (A), but done for 
    src\zfapi.c::sfnts_reader_seek() (which begins with a comment 
    "/* fixme : optimize */"; well, not it's better.) This time, only 
    the "total data bytes before current string" needed to be added
    (new field .pos_index), because the string index was already 
    there (.index).

    Note: It seems to me some sort of plug-in is needed in order to 
	  exercice this code. Is this for "UFST"? I don't have this, 
	  so I could not test this part.


The following 3 patches are for dealing with invalid TTFs, namely ones 
with a 'loca' table that is not in increasing order.


(D) Reimplement the ".sort" operator in C.
--- Patch: file Bug688971-r7203-to-r7203D.diff.txt in attached ZIP.

    Details:

    The current PS implementation of lib\gs_init.ps::.sort uses a 
    slow O(n^2) algorithm. Plus, array indexing operations are lenghty 
    in PostScript. It seems to me this was initially written to sort 
    the very few "%disk*%" names, so speed was irrelevant. Now, it is 
    used to sort 'loca' tables (in some damaged TTFs). I estimate that 
    if it were to be applied to the 61 fonts in the sample file, it 
    would take 7 months to finish.

    The patch implements the heap sort algorithm from the programmer's 
    bible. This algorithm is guaranteed to be O(n log n) in the worst 
    case (unlike quick sort which is O(n log n) only on average, its 
    worst case being O(n^2)), and uses a constant amount of extra 
    memory (quick sort uses O(log n) for its stack).

    Variable's names are those from Knuth's book. Labels denote the 
    algorithm's steps. (Hope unreferenced labels are not signaled as 
    warnings. VS2003 doesn't complain. If you find some other compiler 
    that does, I'll change these into comments.)

    The implementation retains the maximum generality for the predicate, 
    i.e. it can be anything that is able to compare 2 objects on the 
    o-stack and leave a bool in their place. The predicate is not 
    restricted to what's available to a FunctionType 4 function.

    The new code sorts an array of 65534 random numbers in 0.5 sec.


(E) Faster glyph lengths calculation in gs_type42_font_init(),
--- for the case of unsorted 'loca'.
    Patch: file Bug688971-r7203-to-r7203E.diff.txt in attached ZIP.

    Details:

    If 'loca' is not sorted, glyph lengths are computed in 
    src\gstye42.c::gs_type42_font_init() lines #250..267. The method 
    used there is inefficient: 2 nested loops, each iterating through 
    the whole 'loca'. I estimate that, if the huge 'sfnts' in the 
    sample file were not sorted already, these loops would need 
    7 months to complete. (The inner cycle is approx equivalent to 
    the one in lines #235..246, so it needs 15min; it is executed 20400 
    times by the outer "for (i = 0; i < loca_size - 1; i++)"; 
    20400 x 15min = 306000min = 7months.)

    The new code first sorts the table by glyph offset, and only then 
    computes lengths. For a file of the sample's size, the new code 
    needs approx 0.6 seconds to complete.

    Note that the sort is "stable" (relative order of equal entries is 
    preserved), because the [original] index into the table is used 
    as a 2nd key. This is important for (F) below.


(F) Fix behaviour if unsorted 'loca' and empty glyphs
--- Patch: file Bug688971-r7203E-to-r7203EF.diff.txt in attached ZIP.
    Is an extra for, and to be applied after, the patch for (E).

    Details:

    Old code in src\gstye42.c::gs_type42_font_init() lines #250..267 
    has an unpleasant effect, mentioned in the comment: empty glyphs, 
    except if they have the highest offset, suddenly become non empty, 
    because they are considered duplicates of some other glyphs.

    The patch for (E) above reproduces this behaviour.

    The patch for (F) lets empty glyphs as empty. Given the new method 
    in (E) and the fact the sorting is stable, patch for (F) allows to 
    observe the following rule:

	glyphs corresponding to 'loca' fragments that are
	"in good shape" are not affected, even if they are empty

    In this context, "in good shape" means it is sorted and no other 
    glyphs belong inside the fragment (= their offsets are either 
    less than the 1st, or grater than the last). Otherwise put, a 
    change in some part of the 'loca' doesn't affect the other parts.

    Why is this related to the sample file:

    - If the 'loca' in the sample were not sorted (no matter where), 
      then two exclamation points get added to the output, one before 
      the word "Neckar" and one after. These don't appear in the 
      original file; there are 2 blanks instead.
      (More details about this on request.)
Comment 8 SaGS 2007-04-11 12:58:40 UTC
Created attachment 2878 [details]
The suggested patches, relative to TRUNK -r7842.

I verified the patches from comment #7, and they are applicable to 
the current TRUNK. However, I regenerated them because some became 
"imperfect patches" (apply with offsets or a "fuzz factor"). The 
functionality is the same, see comment #7.


About point (C): I still cannot exercice the code for the "sfnts 
reader" in zfapi.c. If it is used only for UFST, then I don't have 
that. If it needs "/FontType 42" in FAPIfontmap, then this doesn't 
work at all (rangecheck), at least not with FreeType. From the code, 
I see there aren't so many seeks and reading is mostly sequential, 
so maybe there's no real performance problem here. Feel free to 
ignore this patch.


About point (D): I'll attach a file I used to check the behaviour of 
the new .sort operator in various cases. It may help with testing.


About point (E): To exercice the affected code while using a good 
Type 42 font (with 'loca' already sorted, so you can easily compare 
with unpatched GS), I used a hack like the following:

Index: src/gstype42.c
===================================================================
--- src/gstype42.c ("normal" version)
+++ src/gstype42.c (hacked to beheave as if 'loca' not sorted)
@@ -260,7 +260,7 @@
     for (i = 1; i < loca_size; i++) {
	glyph_offset = get_glyph_offset(pfont, i);
	glyph_length = glyph_offset - glyph_start;
-	if (glyph_length > 0x80000000)
+	if (true || glyph_length > 0x80000000)
	    break;				/* out of order loca */
	pfont->data.len_glyphs[i - 1] = glyph_length;
	glyph_start = glyph_offset;

(don't do this with attachment #2573 [details], it would take ages to run.)


About point (F): This is not strictly part of this bug, although 
it does affect the sample file, and now it has a bug report of its 
own - bug #689038 "GS doesn't display T42 font correctly". If you 
commit this, please take a look and close that bug too.
Comment 9 SaGS 2007-04-11 12:59:16 UTC
Created attachment 2879 [details]
Test cases for the new .sort operator.
Comment 10 SaGS 2007-04-11 23:54:02 UTC
> About point (F): ... although it does affect the sample file ...

This came out wrong, sorry. To clarify: the 'loca' is sorted, so the sample is 
not affected as-is. But if you force sorting the [already sorted] 'loca' as in 
the "hacked to beheave as if 'loca' not sorted" above, it will be, with or 
without the patch for (E). Note: to check, you need to apply (A); it's a 
matter of running 3/4 of an hour versus forever.
Comment 11 leonardo 2007-04-19 12:23:43 UTC
Dear SaGS,

Thank you for submitting patch to Ghostscript.
The patch A looks definitely useful, I'm testing it now.
I'll work on other patches soon.

I think the patch C isn't good. You say that the problem is caused by 
representing 'glyph' with numerous PS strings. The old code accesses 'glyph' in 
FAPI_FF_get_glyph lines 665 - 671. Note it constructs another sfnts_reader for 
each glyph. Your patch C does not optimize this case.

IMO the mentioned old code portion to be rewrtiien with calling 
gs_font_type42_s::data.string_proc . Note this method is optimized due to 
change to z42_string_proc in your patch A. Could you please submit a new patch 
for C ? 
Comment 12 leonardo 2007-04-19 12:35:54 UTC
Regareding patch D :

[beg quote]
 The implementation retains the maximum generality for the predicate, 
    i.e. it can be anything that is able to compare 2 objects on the 
    o-stack and leave a bool in their place. The predicate is not 
    restricted to what's available to a FunctionType 4 function.

    The new code sorts an array of 65534 random numbers in 0.5 sec.
[end quote]

Well, I would prefer to have a faster patch with less generality.
I'm not clear why we need to maintain the general one with so complicated C 
code. I would sort entire integer array is a single PS operator in 0.01 second. 
Would you like to submit another patch ?
Comment 13 Ray Johnston 2007-04-19 22:28:22 UTC
In my opinion it is well worthwhile having the generalized .sort
even if we choose to develop a less general .sortint (name is
arbitrary) to use in for TT fonts.

Regarding this bountiable bug and the contribution by SaGS it is not
reasonable to require further development that gets a small incremental
performance improvement.
Comment 14 leonardo 2007-04-19 23:51:56 UTC
I agree with Ray's concern, except opne point. The main reason for .sortint is 
simpler code rather than performance.
Comment 15 leonardo 2007-04-20 00:14:33 UTC
The patch A causes a regression with MSVC8 debug build with following cases :

ppmraw -r72 Bug687603.ps : (in page 2) wrong glyph placement
ppmraw -r72 korea.ps : wrong glyph placement
ppmraw -r300 Bug687603.ps : (in page 2) wrong glyph placement
ppmraw -r300 Bug688793.ps : A small glyph displacement
ppmraw -r300 korea.ps : wrong glyph placement

The file Bug688793.ps is publically available as the attachment "vertical.ps" 
to the bug 687603. Others are propertary. Once again I suggest to SaGS to make 
a contract agreement with Artifex, at least for getting access to our testbase.

Command line used for this testing :

..\..\gs-hd\bin\gswin32c.exe -IF:/AFPL/gs-hd/lib;f:\afpl\fonts -r%res% -
dNOPAUSE  -dBATCH  -dNOOUTERSAVE -sDEVICE=ppmraw -sOutputFile=%dir%\cur.ppm -
dLastPage=1 -c false 0 startjob pop -f - <%1 1>>cur-1.log 2>>cur-2.log
Comment 16 leonardo 2007-04-20 00:24:47 UTC
Created attachment 2904 [details]
rasters.zip

Attaching rasters for Comment #15 made with korea.ps . 'hdr' is the revision
7865, 'cur' is 7865 with patch A.
Comment 17 SaGS 2007-04-21 10:23:51 UTC
Created attachment 2909 [details]
Corrected patch for (A).

About patch (A) and comment #15:

The problem in comment #15 was caused by non-standard treatement of 
ifont.h::font_data struct in the garbage collector. Normally, adding 
non-relocatable fields to a struct does not change it's GC procedures. 
Here, however, a complicated struc is treated as an array of refs, the 
# of elements being computed as sizeof (font_data) / sizeof (ref). 
The new fields were considered an extra ref, and while resetting 
l_mark the GC was actually clearing bit 0 of mru_index. Attached patch 
rectifies the situation by adding specific GC procedures that compute 
the number of refs correctly.

Note on struct members alignement and padding:

- a ref has a field of type long, so its alignement requirement is 
  at least that of a long;
- ushort is at most as restrictive as u/long;
- so the inter-field padding and alignement requirements for 
  struct _f42, union _fs, and the whole font_data, don't change.

---

About patch (B), which depends on patch (A):

   The diff is still applicable as-is (with an offset, but it's OK).

---

About patch (C):

You have a point with comment #11, and I may look into using 
string_array_access_proc() (via data.string_proc()) here.

But, as I mentioned, I NEVER, EVER, managed to find a case 
when sfnts_reader.seek() gets called, so not sure it's even worth 
changing it. I did build GS with FreeType, and tried various use 
scenarions to no avail. The change needed to use 
string_array_access_proc() is not trivial, and I have to test them.

From another point of view, the time spent in the [unoptimized] 
seek() has to be compared to the time needed by other actions 
performed by its callers, for example compare 2 x seek() with 
rendering of one glyph. Of course, optimizing seek() means an 
overall performance improvement, but this improvement may be 
insignificant. The situation is different with the other patches: 
patch (A) reduces a O(n^2) check for a sorted 'loca' to O(n), and 
patch (E) reduces a O(n^3) operation to O(n log n).
Comment 18 leonardo 2007-04-22 07:01:41 UTC
The patch B is not a correct architecture. A CMap is not a property of a Type 
42 or Type 11 font. It is a property of a Type 0 font. Many Type 0 fonts may 
share share same Type 11 font and have different CMap tables. In this case the 
suggested patch will retirn wrong strings. Due to that the patch B must not be 
committed.
Comment 19 leonardo 2007-04-22 07:03:50 UTC
The patch A has been committed to HEAD as rev 7867 :
http://ghostscript.com/pipermail/gs-cvs/2007-April/007450.html
Comment 20 leonardo 2007-04-22 07:28:32 UTC
The patch C must not be committed due to 2 reasons :
1. (See Comment #11) FAPI_FF_get_glyph creates another instance of sfnts_reader 
for each glyph. Therefore the cache info is lost between glyphs.

2. Even if we change it with using a single instance for all glyphs in a font, 
immediately before accessing a glyph it accesses a metrics table with 
sfnt_get_glyph_offset, so caching a single pointer won't give the wanted 
optimization. Need to cache 2 pointers at least - one for 'glyph' and another 
for vmtx/hmtx .

Besides that, a simpler way to optimize FAPI_FF_get_glyph is a direct call 
to 'string_proc' instead sfnts_reader : 

     pfont42->data.string_proc(pfont42, offset0, offset1 - offset0, &data);

In this case the patch A needs the improvement with caching 2 pointers. Other 
calls to sfnts_reader::seek don't worth to optimize.
Comment 21 SaGS 2007-04-22 08:32:50 UTC
I don't understand how would the objection in comment #18 apply, but 
I understand where the confusion comes from. In comment #6 point (B), 
I have incorrectly written "CMap". I should have typed "CIDMap", the 
ones for CIDFontType 2. The .mru_CIDMap_* (note these names are 
correct, "_CIDMap_" not "_CMap_") added by that patch pair with the 
existing .CIDMap field. The latter corresponds to the "CIDMap" entry 
described in PLRM LL3 TABLE 5.17 "Additional entries specific to 
Type 2 CIDFont dictionaries", and not to the "CMap" entry described 
in TABLE 5.8 "Additional entries specific to Type 0 fonts", which 
indeed is a completely different thing. 
Comment 22 leonardo 2007-04-22 11:12:38 UTC
The patch D has been committed as rev 7868 :
http://ghostscript.com/pipermail/gs-cvs/2007-April/007451.html
Comment 23 leonardo 2007-04-22 12:25:39 UTC
Regarding Comment #21 : After the correction CMap/CIDMap, the comment #18 to be 
withdrown. When CIDMap is an array of strings, the number of strings is always 
less than 4 (because a CID is not greater than 65535), so caching the pointer 
won't give a sensible speed up. So we still have no reason to commit B.
Comment 24 leonardo 2007-04-22 12:59:57 UTC
The patch E causes a regression with this command line :

gswin32.exe -IF:/AFPL/gs-hd/lib;f:\afpl\fonts   -dNOPAUSE  -dBATCH -
sDEVICE=pdfwrite -sOutputFile=z:\t1\cur.pdf -
dLastPage=1 "H:\AuxFiles\CompareFiles\Bug687889.pdf"  -c quit 

It fails with VMerror in gxfcopy.c ln 516 : in function copy_glyph_data the 
variables 'size', 'str_size', pgdata->bits.size equal to 0xfffffe44.


Comment 25 SaGS 2007-04-23 03:20:10 UTC
Created attachment 2913 [details]
Modified patch for (E).

I found a case that old patch (E) does not handle, and which could 
explain the crash in comment #24. After clarifying (E), I'll 
regenerate and attach (F), which depends on (E).

Details:

    Last entry in the 'loca' table is an "offset past last glyph", so
    must be greater than any other entry. If it is not, then old (E) 
    does not function correctly, namely:

    - The psortary element that corresponds to this entry moves, after 
      the sort, somewhere inside the array; this also pushes an element 
      (corresponding to a glyph with the highest offset) to the very 
      end of the array.
    - normally there's no length to be derived form the last element 
      of psortary because it does not correspond to a real glyph. 
      However, if a glyph has been "pushed" here because of a too-low 
      "offset past last glyph", the length of that glyph remained 
      uninitialized. This could explain the crash mentioned in comment #24, 
      but I cannot test with that file.
    - Finding, in the middle of psortary, the element with .glyph_num == 
      loca_size - 1, the line "pfont->data.len_glyphs[psort->glyph_num] 
      = glyph_length" accesses an element past the end of len_glyphs[].

Note on the length of the glyph with the highest offset, if the last 
'loca' element does not have the highest value:

  - In this case, the length of this glyph cannot be determined from 
    'loca'.
  - I'm doing what the old code did, that is put length = 0.
  - This can be improved by taking into consideration the length of 
    the 'glyf' table, and assuming this glyph extends from it's 
    starting offset given in the 'loca' to the end of 'glyf'.

For testing: I don't have files with this kind of damage. To exercice 
this code, I used good files and compared the output of:

  - rev 7872 as-is;
  - rev 7872 + hack to force sorting the 'loca' (see end of comment #8)
    + new (E) + (F). (F) is necessary because the sorting in (E) 
    replaces empty glyphs with non-empty ones, as the old sorting does, 
    so in many cases spaces are displayed as "!". (F) and the "hack" 
    are not necessary when comparing the output from damaged files 
    (= with unsorted 'loca'), but I don't have such files.
Comment 26 leonardo 2007-04-23 11:16:55 UTC
The Comment #25 patch doesn not fix the regression explained in Comment #24.
Comment 27 leonardo 2007-04-23 11:23:43 UTC
Sorry the last comment is not fully correct.
With the Comment #25 patch and the Comment #24 test case the failure VMerror 
happens in gstype42.c ln 525. The variable glyph_length equals to 0xfffeacde.
Comment 28 leonardo 2007-04-23 12:29:36 UTC
I believe that the patch E has never been tested (in attachment 2618 [details] and in 
attachment 2913 [details]) : qsort(,,, gs_type42_font_init_compare) is an identity action 
equal to noop.
Comment 29 leonardo 2007-04-23 12:42:36 UTC
Also the expression for glyph_length appears to be the loop invariant, so I 
think the bounty program works here against Artifex.
Comment 30 leonardo 2007-04-23 13:32:48 UTC
Created attachment 2915 [details]
E.txt 

A better patch for E being tested now.
Comment 31 leonardo 2007-04-23 15:01:05 UTC
Patch E (from attachment 2915 [details]) has been committed to HEAD  as rev 7877 :
http://ghostscript.com/pipermail/gs-cvs/2007-April/007460.html
Comment 32 leonardo 2007-04-23 15:02:54 UTC
Created attachment 2916 [details]
F.txt

The adopted patch F being tested now. I'm not sure that dropping overlapped
glyphs will work fine. Will see from the regression test.
Comment 33 leonardo 2007-04-23 17:18:54 UTC
The patch 2916 has been committed as rev 7879.
Comment 35 SaGS 2007-04-24 02:51:58 UTC
About comment #23 and patch (B):

    The worst case is still a CIDMap array of length 64K. After all, 
    100 strings would be more than enough for the sfnts in the sample 
    attached to this report, but it uses 24500+.

    However, I don't mind not commiting that patch, as I don't expect 
    a spectacular increase in performance from it (could be a 2x, but 
    not a O(n^2)->O(n) or O(n^3)->O(n log n) as the others). It all 
    depends on the access pattern to this array, and I don't have 
    statistics to rely upon.

    You may wonder why I posted it. Because (1) it has the potential 
    to improve speed, and (2) it comes for free.


About patch (C):

    Using FAPI=FreeType, Type 1 fonts are passed to FAPI but not 
    TTFs/ Type 42 fonts. From what I see, the affected code could be 
    very well dead code, so doesn't need any intervention.


About comment #26 and comment #27:

    The objection is invalid. gstype42.c rev 
7872 lines 523..528 

    compute a glyph length using 'loca' directly, not by accesing the 
    precomputed lengths from .data.len_glyphs. The object of the 
    current bug is the performance of pre-computing these lengths, 
    not physically rearranging the 'loca'. Any failure at the place 
    mentioned has nothing to do with this report or the attached 
    patches. It just shows there are other places in GS that don't 
    deal well with unsorted 'loca', and an old patch needs a 
    continuation.


About comment #28:

    "qsort(,,, gs_type42_font_init_compare)" sorts by glyph offset 
    (as a primary key, the second key being the glyph index). If 
    'loca' is not in increasing order, it's not a noop. And an 
    unsorted 'loca' is what we are talking about, right?


About comment #29:

    This one is really puzzeling. Did you notice the "psort--" in the 
    "for"? glyph_length gets the length of the current glyph, 
    computed as number of bytes from it's own offset (from 'loca', 
    copied into psort[0]) to the next higher offset (found in 
    psort[1] after the sort).


About comment #30 and comment #31:

    E.txt is my attachment #2913 [details] with (a) an irrelevant rewrite, 
    (b) implemented my suggestion from comment #25 paragraph "This 
    can be improved by taking into consideration ..." and (c) 2 bugs 
    added.

    The irrelevant rewrite: my "psort[0].glyph_offset" at the end of 
    one iteration is the same as "psort[1].glyph_offset" at the 
    beginning  of the next one (see the "psort--"). E.txt saves this 
    value in "last_glyph_offset".

    The "This can be improved by ...": this change is OK, but has 
    nothing to do with this bug report, plus the implementation is 
    buggy (see below). If final elem in 'loca' is < highest glyph 
    offset, my code gives the same 0-length final glyph as the old 
    code.

    The "gs_alloc_bytes" -> "gs_alloc_byte_array": maybe this change 
    was intended, maybe it's necessary, don't know, but it has no 
    connection with this report or any of the patches.

    1st bug, newly added: the code that takes into account the length 
    of 'glyf' for computing the final glyph's length malfunctions 
    when there are 'loca' entries higher than the 'glypf' length. In 
    such a case, the final glyph gets a huge (= negative cast to 
    uint) length.

    2nd bug, originally present in my (E) but fixed in the 2nd version:
    one of the elements of psortary has .glyph_index == loca_size - 1 
    == numGlyphs. Thus, E.txt will write past the end of .len_glyphs.

    The code I added for dealing with the final glyph's length was 
    merged into the 'for'. Given that it needs special treatement (see 
    "1st bug" above), it's not necessarily a good idea.


About comment #32 and comment #33:

    F.txt is my patch for (F), regenerated taking into account the 
    changes made for (E). Nothing more, nothing less.
Comment 36 SaGS 2007-04-24 02:52:21 UTC
I am reopening this bug report due to the 2 bugs contained in E.txt.
See previous comment for details.
Comment 38 leonardo 2007-04-24 09:47:15 UTC
Well I need to respond to some issues mentioned in Comment #35.

1. The statement ""qsort(,,, gs_type42_font_init_compare)" sorts by glyph 
offset " is wrong. In the submitted patch the function 
gs_type42_font_init_compare does not compare glyph offsets. It compares 
addresses of elements :

+gs_type42_font_init_compare (const void *a, const void *b)
+{
+    if (a < b)
+	return -1;
+    else if (a > b)
+	return +1;
[...]
Since addresses are always in the increasing order, the overal action is noop. 

2. The statement "1st bug, newly added: .... malfunctions when there are 'loca' 
entries higher than the 'glypf' length". This statement is wrong due to the 
following code fragment :

	qsort(psortary, loca_size, sizeof(gs_type42_font_init_sort_t), 
gs_type42_font_init_compare);
	if (psortary[loca_size - 1].glyph_offset > glyph_size)
	    return_error(gs_error_invalidfont);

Entries higher than the 'glypf' length is an invalid font data.

3. "F.txt is my patch for (F), regenerated taking into account the changes made 
for (E). Nothing more, nothing less." This statement is true, the log message 
reflects that.

I'm closing this bug now because we're satisfied with performance. Please do 
not reopen unless you attach another test case with a poor performance : 
minutes or longer. Artifex will pay bounty to SaGS for this bug, subtracting a 
rebate for the untested patches submitted. We're in progress with improving our 
bounty policy against untested patches, so the amount of the rebate is not 
determined yet. We'll send another notice when it is.