Summary: | cannot read >2GB pdf file | |
---|---|---|---
Product: | MuPDF | Reporter: | Hin-Tak Leung <htl10>
Component: | mupdf | Assignee: | MuPDF bugs <mupdf-bugs>
Status: | RESOLVED FIXED | |
Severity: | enhancement | CC: | bruce.edge, robin.watts, tor.andersson
Priority: | P4 | |
Version: | unspecified | |
Hardware: | PC | |
OS: | Linux | |
Customer: | | Word Size: | ---
Description
Hin-Tak Leung 2011-12-23 01:00:09 UTC
I am sure somebody at Artifex has some big PDFs over 2GB, but here is a recipe for creating one on a typical Linux/Unix system: use the media-embedding feature of PDF to embed some big movies and push the file over 2GB: https://bugs.freedesktop.org/show_bug.cgi?id=44085#c6

For the record, I spent some time looking into this recently. The 'easy' route is to move from 32-bit to 64-bit offset values within the code. This would allow us to access (effectively) unlimited-size documents. The downsides are that the standard file-access functions can no longer be used (they operate on ints/longs), and that memory usage grows because every object carries larger offsets. The 'hard' route would be to change to unsigned offsets within the code; this would only get us from 2GB to 4GB, and would cause significant pain in certain functions. I suspect that if we do this, we'll pick the 'easy' route. But I can't see us doing it until we actually see such a file (or have a report of a customer/potential customer using such a file). Downgrading to enhancement.

(In reply to comment #2)
> But I can't see us doing this until we actually see such a file

Granted it is rare, but I have such a file: an encyclopedia-like document with figures, etc. I cannot share it (and it would also be technically painful to do so, bandwidth- and size-wise), hence I looked into the LaTeX-based recipe for making one on a typical Linux box, or wherever LaTeX runs.

Note that Ghostscript has 'gp_*64' functions that use the platform's 64-bit file functions when it supports them. These work in 32-bit builds on Linux, Mac OS X and Windows. Doing something similar in MuPDF builds would probably be fairly easy, given that the platform-specific functions are "known".
I suggest an approach similar to gs: the code always calls the 64-bit functions, but these may be hooked to 32-bit wrappers that return errors for unsupportable values if 64-bit is not supported on that platform.

Any update on this bug? I'm running into a similar problem with gs, in that it also fails on >2GB files:

```
%> ls -l /tmp/giant.pdf
-rw-r--r-- 1 qa staff 2328769430 2012-09-11 17:27 /tmp/giant.pdf
%> gs -dNOPAUSE -sDEVICE=jpeg -r144 -sOutputFile=giant-p%03d.jpg /tmp/giant.pdf
GPL Ghostscript 9.06 (2012-08-08)
Copyright (C) 2012 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
**** Error: Cannot find a 'startxref' anywhere in the file.
**** Warning: An error occurred while reading an XREF table.
**** The file has been damaged.  This may have been caused
**** by a problem while converting or transfering the file.
**** Ghostscript will attempt to recover the data.
Error: /rangecheck in --run--
Operand stack:
   post_eof_count  -1966197866
Execution stack:
   %interp_exit  .runexec2  --nostringval--  --nostringval--  --nostringval--
   2  %stopped_push  --nostringval--  --nostringval--  --nostringval--
   false  1  %stopped_push  1910  1  3  %oparray_pop  1909  1  3
   %oparray_pop  1893  1  3  %oparray_pop  --nostringval--  --nostringval--
   --nostringval--  --nostringval--  --nostringval--  --nostringval--
   --nostringval--
Dictionary stack:
   --dict:1169/1684(ro)(G)--  --dict:1/20(G)--  --dict:82/200(L)--
   --dict:82/200(L)--  --dict:109/127(ro)(G)--  --dict:293/300(ro)(G)--
   --dict:20/31(L)--
Current allocation mode is local
GPL Ghostscript 9.06: Unrecoverable error, exit code 1
```

2GB isn't as big as it used to be.
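As an aside, the negative `post_eof_count` operand in the trace is exactly what one gets by stuffing the 2,328,769,430-byte file size into a signed 32-bit integer (2328769430 - 2^32 = -1966197866). A minimal sketch of that truncation (`as_int32` is a hypothetical helper, not gs code):

```c
#include <stdint.h>

/* Truncate a 64-bit byte count to a signed 32-bit int, as code that
 * stores file offsets in a plain 32-bit int effectively does.
 * (Pre-C23 the out-of-range conversion is implementation-defined,
 * but it wraps modulo 2^32 on common compilers.) */
static int32_t as_int32(int64_t n)
{
    return (int32_t)n;
}

/* as_int32(2328769430LL) == -1966197866: the size of /tmp/giant.pdf
 * above, wrapped past INT32_MAX into a negative value. */
```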