Bug 705640 - Fails to parse epubs
Summary: Fails to parse epubs
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: epub
Version: 1.19.0
Hardware: PC Windows 7
Importance: P4 normal
Assignee: MuPDF bugs
URL:
Keywords:
Duplicates: 697619
Depends on:
Blocks:
 
Reported: 2022-07-06 08:29 UTC by mrx23
Modified: 2023-05-09 10:56 UTC



Description mrx23 2022-07-06 08:29:28 UTC
I tried to extract text from these EPUBs; all of them fail.

zipped: https://drive.google.com/file/d/1AtZbc9V7E0IVDDBWw-Eo7DQzQbV9QQA9/view?usp=sharing

(please ignore mobi file)

But the files themselves open correctly in the SumatraPDF reader.
The lib could parse other EPUBs, just not these.
It seems to me that the lib is too strict about the standard or something.
Comment 1 mrx23 2022-07-06 09:01:12 UTC
Oh yeah, the one named "Tor.." also worked with the lib.

I renamed "Pandora.." and "The Bear" to .zip and could easily read the XML text by hand.
So if I can do it by hand, why can't the lib?
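
The manual check described above works because an EPUB is a ZIP container. Here is a minimal sketch of the same check, using only Python's standard zipfile module. The helper names and the in-memory sample archive are hypothetical and stand in for the real files, which are not reproduced here.

```python
import io
import zipfile


def make_sample_epub() -> bytes:
    """Build a tiny, hypothetical EPUB-like ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr(
            "OEBPS/chapter1.xhtml",
            "<html><body><p>Hello</p></body></html>",
        )
    return buffer.getvalue()


def list_entries(epub_bytes: bytes) -> list[str]:
    """Return the entry names stored in the ZIP container."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.namelist()


def read_entry(epub_bytes: bytes, name: str) -> str:
    """Return one entry decoded as UTF-8 text (assumes a text entry)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        return archive.read(name).decode("utf-8")
```

If the listing includes META-INF/encryption.xml, at least some entries in the container are marked as encrypted or obfuscated, even when the XHTML content itself is readable as plain text.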
Comment 3 Tor Andersson 2023-05-09 10:54:11 UTC
commit b9c7145c8d3f972b2880dfa799f1d1129b802dde
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Mon May 8 17:46:47 2023 +0200

    Bug 705640: Don't always fail early on encrypted EPUB files.
    
    Check individual parts of the file for encryption by parsing out the
    encryption.xml file and storing the CipherReference URIs in a list.
    
    For now, we just throw an error whenever we try to open an encrypted
    entry in the EPUB. Eventually this can be extended to support decrypting
    the data, given access to the appropriate DRM keys.
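
    The approach in this commit message can be sketched roughly as follows.
    This is an illustration in Python, not MuPDF's C implementation: the
    function names, the exception class, and the sample XML are assumptions.
    Only the CipherReference element and its XML Encryption namespace come
    from the EPUB container format.

```python
import xml.etree.ElementTree as ET

# Namespace of W3C XML Encryption elements used inside encryption.xml.
XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"

# Hypothetical sample: a single obfuscated font entry.
SAMPLE_ENCRYPTION_XML = """\
<encryption xmlns="urn:oasis:names:tc:opendocument:xmlns:container"
            xmlns:enc="http://www.w3.org/2001/04/xmlenc#">
  <enc:EncryptedData>
    <enc:EncryptionMethod Algorithm="http://www.idpf.org/2008/embedding"/>
    <enc:CipherData>
      <enc:CipherReference URI="OEBPS/fonts/body.otf"/>
    </enc:CipherData>
  </enc:EncryptedData>
</encryption>
"""


class EncryptedEntryError(Exception):
    """Raised when an entry listed in encryption.xml is opened."""


def encrypted_uris(encryption_xml: str) -> set[str]:
    """Collect the URI of every CipherReference element."""
    root = ET.fromstring(encryption_xml)
    tag = f"{{{XMLENC_NS}}}CipherReference"
    return {ref.get("URI") for ref in root.iter(tag) if ref.get("URI")}


def open_entry(name: str, encrypted: set[str], entries: dict[str, bytes]) -> bytes:
    """Fail only for encrypted entries instead of rejecting the whole file."""
    if name in encrypted:
        raise EncryptedEntryError(f"cannot open encrypted entry: {name}")
    return entries[name]
```

    The sketch compares names literally. Real files may percent-encode the
    URIs, so a fuller version would normalize them before the lookup.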
Comment 4 Tor Andersson 2023-05-09 10:56:12 UTC
*** Bug 697619 has been marked as a duplicate of this bug. ***