PyMuPDF issue: https://github.com/pymupdf/PyMuPDF/issues/4466
Problem file (1 page): https://github.com/user-attachments/files/19865143/page0.pdf

The page object is defined as follows:

    <<
      /Type /Page
      /Contents [ 16 0 R 17 0 R 18 0 R 19 0 R 26 0 R 27 0 R 49 0 R 50 0 R ]
      /CropBox [ -21 -21 616.276 862.89 ]   # <== this is wrong: larger than MediaBox!
      /Group 61 0 R
      /MediaBox [ 0 0 595.28 841.89 ]
      /Parent 1 0 R
      /Resources <<
        /ColorSpace 5 0 R
        /ExtGState 6 0 R
        /Font 7 0 R
        /Pattern 8 0 R
        /ProcSet [ /PDF /ImageC /Text ]
        /Shading 9 0 R
        /XObject 10 0 R
      >>
      /Rotate 0
    >>

When executing fz_bound_page, the result is x0=21, y0=21, x1=616.28, y1=862.89, which is wrong. In cases like this, the MediaBox values should be returned instead.
The coordinates are funky because of the interaction between the bad values in the file and the backwards compatibility logic in MuPDF's Fitz coordinate space computations.

When calculating the page bounds, we intersect the requested box (in this case the CropBox) with the MediaBox, so that the coordinates stay within the MediaBox. The origin of the page coordinate space, however, is based on the original CropBox coordinates.

The box returned for this file is the intersection in PDF space of the CropBox [ -21 -21 616 862 ] and the MediaBox [ 0 0 595 841 ]. The top left of the intersected box is [ 0 842 ] in PDF space. This is then transformed so that the original CropBox top left coordinate [ -21 862 ] lands at [ 0 0 ] in Fitz space, which shifts the box by +21 in both x and y and gives the reported [ 21 21 616 862 ].

Code that crashes when it encounters unexpected coordinates is not our problem, but we could discuss changing the definition of the Fitz coordinate space to use the intersection of the CropBox and MediaBox as the origin point, rather than the raw CropBox.
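To make the arithmetic concrete, here is a rough Python sketch of the computation described above, using the box values from the problem file. It is not MuPDF's actual code: the `Rect` type and the `intersect` and `to_fitz` helpers are hypothetical names introduced for illustration, and rotation and user units are ignored.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned box (illustrative type, not MuPDF's fz_rect)."""
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Rect:
    """Intersection of two boxes in PDF space (y axis points up)."""
    return Rect(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1))


def to_fitz(box: Rect, origin_box: Rect) -> Rect:
    """Map a PDF-space box into a top-left-origin space (y axis points down).

    The top left corner of origin_box becomes (0, 0). This is a simplified
    model of the transform discussed in this bug (assumes /Rotate 0).
    """
    return Rect(box.x0 - origin_box.x0, origin_box.y1 - box.y1,
                box.x1 - origin_box.x0, origin_box.y1 - box.y0)


mediabox = Rect(0, 0, 595.28, 841.89)
cropbox = Rect(-21, -21, 616.276, 862.89)  # out of bounds: larger than MediaBox

# Page bounds are clipped to the MediaBox in PDF space.
visible = intersect(cropbox, mediabox)

# Behaviour described above: origin taken from the raw CropBox.
old_bounds = to_fitz(visible, cropbox)

# Proposed alternative: origin taken from the intersected box.
new_bounds = to_fitz(visible, visible)

print([round(v, 2) for v in old_bounds])
print([round(v, 2) for v in new_bounds])
```

With the raw CropBox as origin, the clipped box ends up offset by 21 units in both directions. With the intersected box as origin, the same clipped box starts at the origin and has the MediaBox width and height.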
Fixed in commit 80645b4b00f7d5df25f8cb40385d37c5b95b2b46

Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Wed Apr 23 16:14:04 2025 +0200

    Bug 708497: Compute Fitz space origin from adjusted CropBox.

    Use the intersection of the CropBox and MediaBox instead of the raw
    CropBox when deciding the origin of the Fitz coordinate space.

    This way FZ_PAGE_BOX is always [0, 0, w, h] even for files where the
    CropBox is out of bounds.