Bug 690579 - PDF file cannot be opened
Status: RESOLVED DUPLICATE of bug 691060
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: unspecified
Hardware: PC All
Importance: P4 enhancement
Assignee: Tor Andersson
Reported: 2009-06-26 22:33 UTC by Krzysztof Kowalczyk
Modified: 2010-05-21 01:42 UTC


Attachments
bug-543.pdf (177.80 KB, application/pdf)
2009-06-26 22:34 UTC, Krzysztof Kowalczyk
patch-mupdf_pdf_open_c (1.32 KB, patch)
2009-07-10 08:03 UTC, Roberto Fernandez

Description Krzysztof Kowalczyk 2009-06-26 22:33:31 UTC
This is from http://code.google.com/p/sumatrapdf/issues/detail?id=543

The attached PDF file cannot be opened. It appears to be malformed, but both
Adobe Reader and Foxit manage to open it anyway.
Comment 1 Krzysztof Kowalczyk 2009-06-26 22:34:11 UTC
Created attachment 5169
bug-543.pdf

PDF file that cannot be opened.
Comment 2 Marcos H. Woehrmann 2009-06-29 03:54:49 UTC
With recent versions of Ghostscript (8.62 and later) this file is processed correctly if the appropriate CID 
font is installed and made available to Ghostscript.

Complete details can be found in doc/Use.htm#CIDFonts, but in summary, for 8.64 do the following:

1.  Download the WadaMin-Regular font from http://www.artifex.com/AsianFonts/ and copy it to 
Resource/CIDFont/WadaMin-Regular (you will have to create the CIDFont directory).

2.  Modify the Resource/Init/cidfmap file by adding the following line at the end (note that the space 
before the ';' is required):

/Adobe-GB1 /WadaMin-Regular ;

Ghostscript will now substitute the WadaMin-Regular font for the missing font. 
Comment 3 Krzysztof Kowalczyk 2009-06-29 13:05:40 UTC
Which version of mupdf is used in Ghostscript 8.62?

This is not what I see with the latest mupdf sources. The problem is not caused by a
missing font but by mupdf not being able to read the xref of the PDF:

(gdb) break fz_throwimp
Breakpoint 1 at 0x11965: file fitz/base_error.c, line 16.
(gdb) run
Starting program: /Users/kkowalczyk/src/sumatrapdf/mupdf/obj-dbg/pdfbench
/Users/kkowalczyk/Downloads/bug-543.pdf
Reading symbols for shared libraries ++++. done
Starting: /Users/kkowalczyk/Downloads/bug-543.pdf

Breakpoint 1, fz_throwimp (func=0x4ec253 "readoldtrailer", file=0x4ec181
"mupdf/pdf_open.c", line=138, fmt=0x4ec314 "expected trailer marker") at
fitz/base_error.c:16
16          fprintf(stderr, "+ %s:%d: %s(): ", file, line, func);
(gdb) bt
#0  fz_throwimp (func=0x4ec253 "readoldtrailer", file=0x4ec181
"mupdf/pdf_open.c", line=138, fmt=0x4ec314 "expected trailer marker") at
fitz/base_error.c:16
#1  0x0004d8b8 in readoldtrailer (xref=0x600360, buf=0xbffef584 "n", cap=65536)
at mupdf/pdf_open.c:138
#2  0x0004dbd4 in readtrailer (xref=0x600360, buf=0xbffef584 "n", cap=65536) at
mupdf/pdf_open.c:185
#3  0x0004f64c in pdf_loadxref2 (xref=0x600360) at mupdf/pdf_open.c:671
#4  0x0004f98d in pdf_loadxref (xref=0x600360, filename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf") at mupdf/pdf_open.c:735
#5  0x00063cb5 in opensrc (filename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf", password=0x4d8172 "") at
apps/pdfbench.c:124
#6  0x000643f3 in benchfile (pdffilename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf") at apps/pdfbench.c:248
#7  0x000646b1 in main (argc=2, argv=0xbffff6c4) at apps/pdfbench.c:318

Continuing.
+ mupdf/pdf_open.c:138: readoldtrailer(): expected trailer marker
| mupdf/pdf_open.c:187: readtrailer(): cannot read trailer
| mupdf/pdf_open.c:674: pdf_loadxref2(): cannot read trailer
Warning: pdf_loadxref() failed, trying to repair

Breakpoint 1, fz_throwimp (func=0x4ed4f4 "parseobj", file=0x4ed4fd
"mupdf/pdf_repair.c", line=52, fmt=0x4ed529 "cannot repair encrypted files") at
fitz/base_error.c:16
16          fprintf(stderr, "+ %s:%d: %s(): ", file, line, func);
(gdb) bt
#0  fz_throwimp (func=0x4ed4f4 "parseobj", file=0x4ed4fd "mupdf/pdf_repair.c",
line=52, fmt=0x4ed529 "cannot repair encrypted files") at fitz/base_error.c:16
#1  0x00053f6d in parseobj (file=0x6003a0, buf=0xbffef54c "3", cap=65536,
stmofs=0xbffef53c, stmlen=0xbffef540, isroot=0xbffef548, isinfo=0xbffef544) at
mupdf/pdf_repair.c:52
#2  0x00054728 in pdf_repairxref2 (xref=0x600360, file=0x6003a0) at
mupdf/pdf_repair.c:201
#3  0x00054fc6 in pdf_repairxref (xref=0x600360, filename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf") at mupdf/pdf_repair.c:369
#4  0x00063cf5 in opensrc (filename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf", password=0x4d8172 "") at
apps/pdfbench.c:128
#5  0x000643f3 in benchfile (pdffilename=0xbffff7bc
"/Users/kkowalczyk/Downloads/bug-543.pdf") at apps/pdfbench.c:248
#6  0x000646b1 in main (argc=2, argv=0xbffff6c4) at apps/pdfbench.c:318
(gdb) c
Continuing.
+ mupdf/pdf_repair.c:52: parseobj(): cannot repair encrypted files
| mupdf/pdf_repair.c:204: pdf_repairxref2(): cannot parse object (15 0 R)
Error: pdf_repairxref() failed
Comment 4 Ray Johnston 2009-06-29 13:18:59 UTC
Sorry for the confusion. Marcos tested this with Ghostscript, not with mupdf.

This is assigned to Tor who has responsibility for mupdf. The fact that the
'repair' works with Ghostscript is just a hint for Tor in case he wants help
from Alex or other Artifex staff familiar with the Ghostscript repair facility.
Comment 5 Tor Andersson 2009-07-02 14:29:15 UTC
It's a known limitation of MuPDF that it doesn't recover the encryption dictionary 
when repairing broken files.
Comment 6 Roberto Fernandez 2009-07-10 08:03:46 UTC
Created attachment 5205
patch-mupdf_pdf_open_c
Comment 7 Roberto Fernandez 2009-07-10 08:09:11 UTC
The problem was that the file contains:
...
xref 0 20
...
when it should be:
...
xref
0 20
...

The attached patch does the trick for such broken PDFs.
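
For illustration only, the two layouts above can be told apart with a check along the following lines. This is a minimal sketch, not the attached patch and not code from pdf_open.c; the helper name parse_xref_marker and its return convention are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not MuPDF code.
 * Returns  1 if the line is "xref" followed by "start count" on the same line,
 *          0 if the line is a bare "xref" keyword (subsection header follows),
 *         -1 if the line is not a usable xref marker. */
static int parse_xref_marker(const char *line, int *start, int *count)
{
    const char *rest;

    assert(line != NULL && start != NULL && count != NULL);

    if (strncmp(line, "xref", 4) != 0)
        return -1;

    rest = line + 4;

    /* Reject tokens that merely begin with "xref", e.g. "xrefs". */
    if (*rest != '\0' && !isspace((unsigned char)*rest))
        return -1;

    /* Skip spaces and any end-of-line characters after the keyword. */
    while (isspace((unsigned char)*rest))
        rest++;

    if (*rest == '\0')
        return 0; /* well-formed: keyword alone on its line */

    /* Broken but recoverable: "xref 0 20" on a single line. */
    if (sscanf(rest, "%d %d", start, count) == 2 && *start >= 0 && *count >= 0)
        return 1;

    return -1;
}
```

A caller would read the subsection header from the next line when the result is 0, and use the returned start and count directly when the result is 1.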
Comment 8 zeniko 2009-08-01 21:00:59 UTC
This bug hasn't been properly fixed: The original document still fails to load in 
MuPDF.

While pdf_open.c does contain "broken pdfs where the section is not on a separate 
line" blocks, they are actually broken (they try to seek back at a point where the 
information about how much to seek has already been lost). Roberto's patch looks 
saner in this regard and actually fixes the issue.
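
To make the seek-back point concrete, here is a rough sketch of the safe ordering: record the offset before reading the line, then seek relative to that saved offset. It is written against plain stdio rather than MuPDF's own stream API, and skip_xref_keyword is a hypothetical name, so treat it as an illustration of the idea and not as the actual fix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not MuPDF code: leaves the stream positioned at the
 * first byte after the "xref" keyword and any whitespace that follows it.
 * Returns 0 on success, -1 if the keyword is missing or an I/O call fails. */
static int skip_xref_keyword(FILE *f)
{
    char buf[64];
    long line_start;
    size_t used;

    assert(f != NULL);

    /* Remember where the line begins BEFORE reading it. After fgets()
     * returns, the stream position alone no longer says how far back
     * the keyword started. */
    line_start = ftell(f);
    if (line_start < 0)
        return -1;

    if (fgets(buf, sizeof buf, f) == NULL)
        return -1;
    if (strncmp(buf, "xref", 4) != 0)
        return -1;

    /* Keyword length plus trailing whitespace (spaces, tabs, CR, LF). */
    used = 4 + strspn(buf + 4, " \t\r\n");

    /* Seek forward from the saved offset instead of guessing backwards. */
    return fseek(f, line_start + (long)used, SEEK_SET) == 0 ? 0 : -1;
}
```

With this ordering the stream ends up at "0 20" for both "xref 0 20" and the well-formed two-line form, so the subsection parser that follows does not need to know which layout it was given. The sketch assumes the file is opened in binary mode, so that ftell() offsets are plain byte counts.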
Comment 9 Tor Andersson 2010-05-21 01:42:21 UTC
I'm going to let this case of broken PDFs fall back to repair mode, which should be made capable of repairing encrypted files.

*** This bug has been marked as a duplicate of bug 691060 ***