Bug 708497 - Wrong computation of fz_bound_page
Summary: Wrong computation of fz_bound_page
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: unspecified
Hardware: All All
Importance: P2 major
Assignee: MuPDF bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2025-04-23 10:08 UTC by Jorj
Modified: 2025-05-02 19:13 UTC

See Also:
Customer:
Word Size: ---


Description Jorj 2025-04-23 10:08:25 UTC
PyMuPDF issue https://github.com/pymupdf/PyMuPDF/issues/4466
Problem file https://github.com/user-attachments/files/19865143/page0.pdf
(1 page)

The page object is defined as follows:
<<
  /Type /Page
  /Contents [ 16 0 R 17 0 R 18 0 R 19 0 R 26 0 R 27 0 R 49 0 R
      50 0 R ]
  /CropBox [ -21 -21 616.276 862.89 ]  # <== this is wrong: larger than MediaBox!
  /Group 61 0 R
  /MediaBox [ 0 0 595.28 841.89 ]
  /Parent 1 0 R
  /Resources <<
    /ColorSpace 5 0 R
    /ExtGState 6 0 R
    /Font 7 0 R
    /Pattern 8 0 R
    /ProcSet [ /PDF /ImageC /Text ]
    /Shading 9 0 R
    /XObject 10 0 R
  >>
  /Rotate 0
>>

For this page, fz_bound_page returns x0=21, y0=21, x1=616.28, y1=862.89,
which is wrong.
When the CropBox extends beyond the MediaBox like this, the MediaBox values should be returned instead.
Comment 1 Tor Andersson 2025-04-23 13:39:45 UTC
The coordinates are funky because of the interaction between the bad values in the file and the backwards-compatibility logic in MuPDF's Fitz coordinate space computations.

When calculating the page bounds we intersect the requested box (in this case CropBox) with the MediaBox, so that the coordinates stay within the MediaBox.

The origin of the page coordinate space is based on the original CropBox coordinates.

The box returned for this file is the intersection in PDF space of the CropBox [ -21 -21 616 862 ] and the MediaBox [ 0 0 595 841 ]. The top left of the intersected box is [ 0 842 ] in PDF space. This is then transformed to put the CropBox original top left coordinate [ -21 862 ] at [ 0 0 ] in the Fitz space (so +21 +21 in both x and y).
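
The arithmetic above can be reproduced with a small standalone sketch. This is not MuPDF code: `Rect`, `intersect` and `to_fitz` are hypothetical helpers written for illustration, and the sketch assumes only what is described here (intersect in PDF space, flip the y axis, place the top left of the chosen origin box at [ 0 0 ]).

```python
from typing import NamedTuple


class Rect(NamedTuple):
    # Hypothetical helper type, not a MuPDF structure.
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Rect:
    """Intersection of two rectangles in PDF space (y axis points up)."""
    return Rect(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1))


def to_fitz(box: Rect, origin_box: Rect) -> Rect:
    """Map a PDF-space box into a top-left-origin space (y axis points down).

    The top left corner of origin_box (x0, y1 in PDF space) becomes (0, 0).
    """
    ox, oy = origin_box.x0, origin_box.y1
    return Rect(box.x0 - ox, oy - box.y1, box.x1 - ox, oy - box.y0)


# Values taken from the page object in the report.
cropbox = Rect(-21, -21, 616.276, 862.89)
mediabox = Rect(0, 0, 595.28, 841.89)

clipped = intersect(cropbox, mediabox)

# Behaviour described in this comment: origin taken from the raw CropBox.
old_bounds = to_fitz(clipped, cropbox)
print([round(v, 2) for v in old_bounds])
```

Up to floating-point rounding, `old_bounds` matches the values in the report: the clipped box is shifted by 21 units in both x and y because the origin still comes from the unclipped CropBox.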

Code that crashes when it encounters unexpected coordinates is not our problem, but we could discuss changing the definition of the Fitz coordinate space so that its origin is taken from the intersection of the CropBox and MediaBox rather than from the raw CropBox.
Comment 2 Sebastian Rasmussen 2025-05-02 19:13:57 UTC
Fixed in

commit 80645b4b00f7d5df25f8cb40385d37c5b95b2b46
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Wed Apr 23 16:14:04 2025 +0200

    Bug 708497: Compute Fitz space origin from adjusted CropBox.
    
    Use the intersection of the CropBox and MediaBox instead of the raw
    CropBox when deciding the origin of the Fitz coordinate space.
    
    This way FZ_PAGE_BOX is always [0, 0, w, h] even for files where the
    CropBox is out of bounds.
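
The effect of the change described in the commit message can be illustrated with a minimal sketch. It is not the actual MuPDF implementation: `Rect`, `intersect` and `bound_page_sketch` are hypothetical names, and the sketch assumes only that the origin is now the top left corner of the CropBox after intersection with the MediaBox.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    # Hypothetical helper type, not a MuPDF structure.
    x0: float
    y0: float
    x1: float
    y1: float


def intersect(a: Rect, b: Rect) -> Rect:
    """Intersection of two rectangles in PDF space (y axis points up)."""
    return Rect(max(a.x0, b.x0), max(a.y0, b.y0),
                min(a.x1, b.x1), min(a.y1, b.y1))


def bound_page_sketch(cropbox: Rect, mediabox: Rect) -> Rect:
    """Page bounds with the origin taken from the clipped CropBox.

    The clipped box is flipped to a y-down space whose (0, 0) is its own
    top left corner, so the result has the form [0, 0, w, h].
    """
    clipped = intersect(cropbox, mediabox)
    ox, oy = clipped.x0, clipped.y1
    return Rect(clipped.x0 - ox, oy - clipped.y1,
                clipped.x1 - ox, oy - clipped.y0)


mediabox = Rect(0, 0, 595.28, 841.89)

# Out-of-bounds CropBox from the problem file.
bad = bound_page_sketch(Rect(-21, -21, 616.276, 862.89), mediabox)

# An in-bounds CropBox (made-up values) for comparison.
good = bound_page_sketch(Rect(10, 10, 500, 800), mediabox)

print(bad)
print(good)
```

In both cases the box starts at the origin, and for the problem file its width and height equal those of the MediaBox, which is the result the report asked for.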