693675 – Different password encoding schemes for different security versions


Summary: Different password encoding schemes for different security versions

Status:	RESOLVED FIXED

Alias:	None

Product:	MuPDF
Classification:	Unclassified
Component:	mupdf
Version:	master
Hardware:	All All

Importance:	P4 normal
Assignee:	MuPDF bugs


Reported:	2013-03-01 13:05 UTC by Robin Watts
Modified:	2014-05-15 07:11 UTC
CC List:	1 user



Description Robin Watts 2013-03-01 13:05:02 UTC

In bug 693624, zeniko points out that:

> Adobe has actually managed to
> specify password encodings in the PDF 1.7 ExtensionLevel 3 spec: For crypt
> revisions 1 to 4, the password is in PdfDocEncoding (and not ISO-8859-1 as
> your code handles it) likely with unsupported characters just dropped, and
> for revisions 5 and 6 it's full UTF-8 (with prior SASLprep normalization and
> then truncated to a maximum of 127 bytes). The handling for these cases
> should then belong in pdf_crypt.c so that callers of
> fz_authenticate_password don't have to do the conversion themselves.

At the moment (or at least as soon as the commit now in review goes in), our command-line arguments arrive at the processing phase as UTF-8. We convert these down to raw bytes (giving an error for any character outside the 0..255 range) and feed them into the authentication function.

Arguably we should always pass passwords UTF-8 encoded, and let the authentication routine convert as appropriate.
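
A minimal sketch of what such a conversion inside the authentication routine might look like, based only on the scheme quoted above. The function name encode_password_for_revision and its helpers are hypothetical and are not MuPDF API. The PDFDocEncoding table is deliberately partial (printable ASCII, the Latin-1 range 0xA1..0xFF, plus the Euro sign and bullet as examples), and SASLprep normalization for revisions 5 and 6 is assumed to have been applied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one UTF-8 sequence from a NUL-terminated string.
 * Returns the number of bytes consumed and stores the code point in *cp.
 * Malformed input yields U+FFFD and consumes one byte. Overlong forms and
 * surrogates are not rejected; this is a sketch, not a validator. */
static size_t decode_utf8(const unsigned char *s, unsigned int *cp)
{
	if (s[0] < 0x80) {
		*cp = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
		*cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
		return 2;
	}
	if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80) {
		*cp = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6) | (s[2] & 0x3Fu);
		return 3;
	}
	if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
	    (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
		*cp = ((s[0] & 0x07u) << 18) | ((s[1] & 0x3Fu) << 12) |
		      ((s[2] & 0x3Fu) << 6) | (s[3] & 0x3Fu);
		return 4;
	}
	*cp = 0xFFFD;
	return 1;
}

/* Partial Unicode -> PDFDocEncoding mapping (assumption: only a subset of
 * the real table is shown). Returns -1 for characters not covered here. */
static int pdfdoc_from_unicode(unsigned int cp)
{
	if (cp >= 0x20 && cp <= 0x7E)
		return (int)cp;              /* printable ASCII maps to itself */
	if (cp >= 0xA1 && cp <= 0xFF && cp != 0xAD)
		return (int)cp;              /* same positions as Latin-1 */
	if (cp == 0x20AC)
		return 0xA0;                 /* Euro sign */
	if (cp == 0x2022)
		return 0x80;                 /* bullet */
	return -1;
}

/* Convert a UTF-8 password to the byte string for a given crypt revision.
 * Revisions <= 4: PDFDocEncoding, unsupported characters dropped.
 * Revisions 5 and 6: UTF-8 bytes, truncated to at most 127 bytes
 * (SASLprep is NOT performed here).
 * Returns the number of bytes written to out. */
size_t encode_password_for_revision(int revision, const char *utf8,
                                    unsigned char *out, size_t out_size)
{
	const unsigned char *s = (const unsigned char *)utf8;
	size_t n = 0;

	assert(utf8 != NULL && out != NULL);

	if (revision >= 5) {
		size_t len = strlen(utf8);
		if (len > 127)
			len = 127;
		if (len > out_size)
			len = out_size;
		memcpy(out, utf8, len);
		return len;
	}

	while (*s && n < out_size) {
		unsigned int cp;
		int c;
		s += decode_utf8(s, &cp);
		c = pdfdoc_from_unicode(cp);
		if (c >= 0)
			out[n++] = (unsigned char)c;
	}
	return n;
}
```

With a helper along these lines, callers would hand over the UTF-8 password unchanged and the routine would pick the encoding from the document's crypt revision, so the command-line tools would no longer need to reject characters outside the 0..255 range themselves.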

Comment 1 Tor Andersson 2014-05-15 07:11:57 UTC

This was fixed in commit b302892a8c302aee9ca6d2abab2f32afbee3a8a5
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Mon Mar 4 13:56:54 2013 +0100

    Convert UTF-8 passwords to correct encoding.
    
    PDFDocEncoding for crypt revisions <= 4, UTF-8 for newer.