Bug 692500 - too long to open and operate with http://www.dante.de/events/dante2011/programm/vortraege/folien-ts.pdf
Summary: too long to open and operate with http://www.dante.de/events/dante2011/progra...
Status: RESOLVED WONTFIX
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: unspecified
Hardware: PC Linux
Importance: P4 normal
Assignee: Tor Andersson
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-14 18:16 UTC by Pablo Rodríguez
Modified: 2012-07-20 18:43 UTC
1 user

See Also:
Customer:
Word Size: ---


Description Pablo Rodríguez 2011-09-14 18:16:18 UTC
Hi there,

mupdf needs more than 12 secs to open http://www.dante.de/events/dante2011/programm/vortraege/folien-ts.pdf:

$ time mupdf folien-ts.pdf 
real	0m13.898s
user	0m12.437s
sys	0m0.018s

evince doesn't need that much:

$ time evince folien-ts.pdf 
real	0m2.913s
user	0m2.043s
sys	0m0.159s

Running "pdfinfo-mupdf folien-ts.pdf" does not finish even after a minute.


Listing fonts also ends up at 100% CPU load or leaking memory (as pdfinfo-mupdf itself probably does with this file): I get no results after 4 min 30 secs.

pdffonts from poppler does the job in half a second, although it warns about illegal entries:

$ time pdffonts folien-ts.pdf
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
HRWFPP+SyntaxLTStd-Roman             CID Type 0C       yes yes yes     25  0
Error: Illegal entry in ToUnicode CMap
Error: Illegal entry in ToUnicode CMap
Error: Illegal entry in ToUnicode CMap
Error: Illegal entry in ToUnicode CMap
Error: Illegal entry in ToUnicode CMap
Error: Illegal entry in ToUnicode CMap
WFVGOV+Times-Roman                   TrueType          yes yes yes   1014  0
QQBMAS+TimesNewRomanPS-BoldMT        TrueType          yes yes yes   1015  0
Oxoniensis                           Type 1C           yes no  yes   1016  0
PJGCZF+TimesNewRomanPSMT             TrueType          yes yes yes   1017  0
UVEVMP+txsy                          Type 1            yes yes yes   1018  0
NewBaskervilleStd-Roman              Type 1C           yes no  yes   1019  0
CWFZTP+Times-Bold                    TrueType          yes yes yes   1020  0
UNVBXX+SyntaxLTStd-Italic            CID Type 0C       yes yes yes   1471  0
QXXFRD+Korinthia                     CID Type 0C       yes yes yes   1959  0
UFRZKQ+ItalianOldStyleMT-Bold-2      CID TrueType      yes yes yes   1960  0
BCJKNN+ItalianOldStyleMT-2           CID TrueType      yes yes yes   1961  0
XIZTVL+ItalianOldStyleMT-Italic-2    CID TrueType      yes yes yes   1962  0
GJCGIZ+LMMathSymbols10-Regular       Type 1            yes yes no    1963  0
QXXFRD+Korinthia                     CID Type 0C       yes yes yes   2544  0
UFRZKQ+ItalianOldStyleMT-Bold-2      CID TrueType      yes yes yes   2545  0
BCJKNN+ItalianOldStyleMT-2           CID TrueType      yes yes yes   2546  0
XIZTVL+ItalianOldStyleMT-Italic-2    CID TrueType      yes yes yes   2547  0
GJCGIZ+LMMathSymbols10-Regular       Type 1            yes yes no    2548  0
KGPKXY+DejaVuSansMono                CID TrueType      yes yes yes   5480  0
FQJROF+DejaVuSansMono-Bold           CID TrueType      yes yes yes   5481  0
VOJKBU+SyntaxLTStd-Bold              CID Type 0C       yes yes yes   8377  0
WIVFPI+TimesNewRomanPSMT             CID TrueType      yes yes no   19132  0
SLNTCV+TimesNewRomanPS-BoldMT        CID TrueType      yes yes no   19133  0
PXRGMD+TimesNewRomanPS-BoldMT        CID TrueType      yes yes no   26172  0
NZVHZT+TimesNewRomanPSMT             CID TrueType      yes yes no   26173  0
FPQVPQ+TimesNewRomanPSMT             CID TrueType      yes yes no   36511  0
VZCDAP+TimesNewRomanPS-ItalicMT      CID TrueType      yes yes no   36512  0

real	0m0.564s
user	0m0.466s
sys	0m0.030s

Just in case it helps,


Pablo
Comment 1 Sebastian Rasmussen 2011-09-23 23:27:33 UTC
I did a very quick analysis of this PDF. Each page refers to 4 0 R as its Shading resource dictionary. It contains 20199 radial shadings, most of which are identical, so I think it is safe to say that the fonts are not the problem here.

Page two of the PDF has a line of blue/teal dots towards the bottom of the page, presumably indicating the slide number (the number of dots increases with the slide number). On page two, each of the two dots is drawn 34 times, and each drawing of a dot has its own (often identical) shading pattern, so page two has 68 shadings. Note that there are 34 pages in the PDF. Computing 34 * 2 + 34 * 3 + 34 * 4 + ... + 34 * 33 + 34 * 34 = 20196 gives roughly the number of shadings.
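The estimate above can be checked with a few lines of Python. This is a sketch that assumes, as the comment suggests, that page k (for k = 2..34) contains 34 * k shadings; the constant and function names are hypothetical.

```python
# Sketch: reproduce the shading-count estimate, assuming page k (k = 2..34)
# holds k dots, each drawn 34 times with its own shading.
PAGES = 34
DRAWS_PER_DOT = 34  # assumption: consistent with 68 = 2 * 34 shadings on page two


def estimated_shadings(pages: int = PAGES, draws: int = DRAWS_PER_DOT) -> int:
    """Return the sum of draws * k over pages k = 2..pages."""
    return sum(draws * k for k in range(2, pages + 1))


if __name__ == "__main__":
    total = estimated_shadings()
    print(total)          # 20196
    print(20199 - total)  # 3 shadings not covered by the estimate
```

The sum factors as 34 * (2 + 3 + ... + 34) = 34 * 594, which leaves only 3 of the 20199 shadings unexplained.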

pdfinfo is probably extremely slow on this file because I believe it contains several O(N^2) loops. Finally, I believe that once pdfclean -dggg has been fixed, it may (when run repeatedly) be able to optimize this file quite significantly, since the shading objects are identical, although doing so would take a very long time.
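To illustrate why merging identical objects need not be quadratic, here is a minimal Python sketch that finds byte-identical objects with a single hash-table pass instead of comparing every pair. It assumes a hypothetical model where the file is a mapping from object number to serialized bytes; it is not MuPDF's data structure or the actual pdfclean algorithm.

```python
# Sketch: detect byte-identical PDF objects with one hash-table pass.
# Hypothetical model: {object_number: serialized_bytes}; not MuPDF's code.
from typing import Dict


def find_duplicates(objects: Dict[int, bytes]) -> Dict[int, int]:
    """Map each duplicate object number to the lowest-numbered identical object.

    Keying a dict by the object bytes (which Python hashes) makes this O(N)
    on average, whereas comparing every pair of objects would be O(N^2).
    """
    canonical_by_content: Dict[bytes, int] = {}
    remap: Dict[int, int] = {}
    for num in sorted(objects):
        first = canonical_by_content.setdefault(objects[num], num)
        if first != num:
            remap[num] = first  # num duplicates an earlier object
    return remap


if __name__ == "__main__":
    objs = {
        4: b"<< /Sh1 5 0 R /Sh2 6 0 R >>",
        5: b"<< /ShadingType 3 /Coords [0 0 0 0 0 5] >>",
        6: b"<< /ShadingType 3 /Coords [0 0 0 0 0 5] >>",
    }
    print(find_duplicates(objs))  # {6: 5}
```

This may also hint at why repeated runs could help: after references to duplicates are rewritten to their canonical objects, objects that previously differed only in those references can themselves become identical.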

This file was generated by LuaTeX/ConTeXt:
    Info object (41169 0 R):
    <<
      /Producer (LuaTeX-0.65.0)
      /Creator (ConTeXt - 2011.03.30 11:21)
    >>

I doubt that any PDF viewer can render this file quickly, given the enormous number of objects to draw. In short, my conclusion is that this file is extremely unoptimized, which results in abnormally long rendering times.
Comment 2 Tor Andersson 2012-07-20 11:26:59 UTC
The file opens quickly on my slow machine. It's probably been fixed as a side effect of commits to optimize large dictionary objects by sorting.
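The general idea behind sorting large dictionaries can be sketched in Python: if the keys are kept sorted, a lookup in a 20199-entry shading dictionary needs at most about 15 comparisons (log2 20199 is roughly 14.3) instead of a linear scan over up to 20199 entries. This is an illustration of the technique under that assumption, not MuPDF's C implementation; the class name SortedDict is hypothetical.

```python
# Sketch: dictionary with sorted keys, so lookups use binary search.
# Illustrative only; this is not MuPDF's pdf_obj dictionary code.
import bisect
from typing import Any, List, Optional


class SortedDict:
    def __init__(self) -> None:
        self._keys: List[str] = []    # always kept in sorted order
        self._values: List[Any] = []  # values parallel to _keys

    def put(self, key: str, value: Any) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # replace an existing entry
        else:
            self._keys.insert(i, key)  # insert at the sorted position
            self._values.insert(i, value)

    def get(self, key: str) -> Optional[Any]:
        # O(log N) comparisons rather than scanning all N entries.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def __len__(self) -> int:
        return len(self._keys)


if __name__ == "__main__":
    shadings = SortedDict()
    for n in range(20199):
        shadings.put(f"Sh{n}", n)
    print(shadings.get("Sh12345"))  # 12345
```

Inserting into a Python list still shifts elements, so a parser reading a whole dictionary at once might instead append every entry and sort a single time at the end.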
Comment 3 Pablo Rodríguez 2012-07-20 18:43:22 UTC
(In reply to comment #2)
> The file opens quickly on my slow machine. It's probably been fixed as a side
> effect of commits to optimize large dictionary objects by sorting.

Many thanks for your reply, Tor.

I guess that there are different issues here.

Opening the file has been improved a lot:

$ time mupdf folien-ts.pdf 
real	0m1.841s
user	0m0.451s
sys	0m0.023s

But mupdfinfo takes far too long with this file:

$ time mupdfinfo folien-ts.pdf 
real	12m48.983s
user	12m45.930s
sys	0m0.234s

pdfinfo and pdffonts are way faster:

$ time pdfinfo folien-ts.pdf 
real	0m0.415s
user	0m0.066s
sys	0m0.013s

$ time pdffonts folien-ts.pdf 
real	0m0.179s
user	0m0.166s
sys	0m0.010s

Do you really think it isn't worth taking a look at this?

Many thanks for your help,


Pablo