691860 – pdf_repair ignores encryption for documents using PDF 1.5's new xref format

Status:	RESOLVED FIXED

Alias:	None

Product:	MuPDF
Classification:	Unclassified
Component:	mupdf
Version:	unspecified
Hardware:	PC Windows 7

Importance:	P4 normal
Assignee:	Tor Andersson

URL:	http://code.google.com/p/sumatrapdf/i...
Keywords:

Depends on:
Blocks:

Reported:	2011-01-01 18:25 UTC by zeniko
Modified:	2011-01-06 19:15 UTC
CC List:	1 user

See Also:
Customer:
Word Size:	---

Description zeniko 2011-01-01 18:25:45 UTC

E.g. http://www.papersnake.de/millimeterpapier/millimeterpapier_grau.pdf used to load but now fails to open with:

+ mupdf\mupdf\pdf_xref.c:513: pdf_loadxref(): first object in xref is not free
\ mupdf\mupdf\pdf_xref.c:550: pdf_openxrefwithstream(): trying to repair
+ mupdf\fitz\filt_flate.c:59: readflated(): zlib error: incorrect header check
| mupdf\fitz\stm_read.c:29: fz_read(): read error
| mupdf\fitz\stm_read.c:84: fz_readall(): read error
| mupdf\mupdf\pdf_stream.c:391: pdf_loadstream(): cannot read raw stream (2 0 R)
| mupdf\mupdf\pdf_page.c:58: pdf_loadpagecontents(): cannot load content stream (2 0 R)
| mupdf\mupdf\pdf_page.c:223: pdf_loadpage(): cannot load page contents (2 0 R)

The code change that broke this for us made pdf_loadxref fail, rather than merely warn, on invalid object references.

Comment 1 zeniko 2011-01-01 19:54:27 UTC

Looks like pdf_repair just ignores the Encrypt dict that would be needed to load the document.

Comment 2 Sebastian Rasmussen 2011-01-01 23:11:41 UTC

Actually, pdf_repair does catch the Encrypt dict and inserts it into the repaired trailer dictionary. Even so, arc4 is not used when decoding stream data.

Comment 3 Sebastian Rasmussen 2011-01-01 23:19:15 UTC

Aha, got it! The cause is that we rebuild the trailer dictionary. Since /Encrypt is usually an indirect reference, it contains a pointer to the xref object. This is fine under normal circumstances, but in this case we create a _new_ xref object, hence all pointers ought to be updated...

Comment 4 Sebastian Rasmussen 2011-01-01 23:46:25 UTC

The previous comment might not have made sense.

When repairing a PDF, the file is scanned through to locate all objects. While this takes place, xref is nil so that no indirect references are dereferenced. As of recently, the /Encrypt dictionary object reference is grabbed while scanning, so its xref pointer is nil. As a next step, the reference is inserted into the newly rebuilt xref trailer. After the repair has completed, the /Encrypt entry in the trailer is checked to see whether the file is encrypted. This is done by calling fz_isdict() on the entry, which in turn calls fz_resolveindirect(), where the indirect object's xref pointer is checked. If the xref pointer is nil, fz_resolveindirect() returns nil, so fz_isdict() reports that the entry is not a dictionary, and the file is treated as non-encrypted.
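
To make the failure mode concrete, here is a minimal, self-contained sketch of the mechanism described above. It does not use MuPDF's real types or functions: struct obj, struct xref, xref_put(), resolve_indirect(), is_dict() and demo_repair_check() are all hypothetical stand-ins invented for illustration. It only shows why a reference captured while xref is nil fails a dictionary check, and that pointing it at the rebuilt xref makes the check succeed. It is not the proposed fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT MuPDF's real data structures. */
#define XREF_LEN 8

enum kind { KIND_DICT, KIND_INDIRECT };

struct xref;

struct obj {
    enum kind kind;
    int num;            /* object number, meaningful for KIND_INDIRECT */
    struct xref *xref;  /* table used to resolve an indirect reference */
};

struct xref {
    struct obj *table[XREF_LEN]; /* object number -> object */
};

static void xref_put(struct xref *x, int num, struct obj *o)
{
    assert(num >= 0 && num < XREF_LEN);
    x->table[num] = o;
}

/* Models the behaviour described for fz_resolveindirect():
 * an indirect reference without an xref pointer resolves to nothing. */
static struct obj *resolve_indirect(struct obj *o)
{
    if (o == NULL || o->kind != KIND_INDIRECT)
        return o;
    if (o->xref == NULL)
        return NULL;
    if (o->num < 0 || o->num >= XREF_LEN)
        return NULL;
    return o->xref->table[o->num];
}

/* Models the behaviour described for fz_isdict(): resolve first, then test. */
static int is_dict(struct obj *o)
{
    o = resolve_indirect(o);
    return o != NULL && o->kind == KIND_DICT;
}

/* Returns nonzero if the "repaired" file would be treated as encrypted.
 * rebind != 0 points the /Encrypt reference at the rebuilt xref. */
int demo_repair_check(int rebind)
{
    struct obj encrypt_dict = { KIND_DICT, 0, NULL };
    struct obj encrypt_ref = { KIND_INDIRECT, 5, NULL }; /* grabbed while xref was nil */
    struct xref repaired = { { NULL } };

    xref_put(&repaired, 5, &encrypt_dict);
    if (rebind)
        encrypt_ref.xref = &repaired;

    return is_dict(&encrypt_ref);
}
```

In this model, demo_repair_check(0) reports the file as non-encrypted even though the rebuilt table holds the dictionary, while demo_repair_check(1) finds it once the reference points at the new xref.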

MuPDF avoided this in the past because, when it encountered PDFs with issues such as the first xref object not being free (or object numbers being one higher than the xref claims), it simply resized the xref without running pdf_repairxref() at all. That kept the trailer intact, so its /Encrypt reference still contained valid pointers to the xref.

I have a proposed fix, and it's awaiting Tor's approval.