Bug 705994

Summary: Strange two-page layout for certain PDF thumbnails
Product: Ghostscript
Reporter: Alan Orth <bugs.ghostscript.com>
Component: PDF Interpreter
Assignee: Default assignee <ghostpdl-bugs>
Status: RESOLVED INVALID    
Severity: minor    
Priority: P4    
Version: 10.0.0   
Hardware: PC   
OS: Linux   
Customer:
Word Size: ---
Attachments: PDF document, Thumbnail, libvips thumbnail

Description Alan Orth 2022-10-19 06:47:04 UTC
Created attachment 23325
PDF document

I'm not sure how to describe this bug. I have unexpected results from Ghostscript on certain thumbnails where the resulting image is somehow in a two-page portrait layout, with a blank page on the left side. I am using Ghostscript via ImageMagick, but this command reproduces it:

$ gs -sDEVICE=jpeg -dPDFFitPage=true -dDEVICEWIDTHPOINTS=640 -dDEVICEHEIGHTPOINTS=640 -sPageList=1 -sOutputFile=10568-116598.pdf.jpg 10568-116598.pdf

I have noticed this happening on a handful of PDFs over the years with different versions of Ghostscript (currently version 10.0.0), but haven't yet figured out a pattern. If need be I can find more PDFs to inspect.

Thank you!
Comment 1 Alan Orth 2022-10-19 07:01:04 UTC
Created attachment 23326
Thumbnail

Thumbnail generated with gs. Notice there is essentially a blank white page in portrait layout to the left of the correct thumbnail of the first page of the PDF.
Comment 2 Alan Orth 2022-10-19 07:03:29 UTC
Created attachment 23327
libvips thumbnail

libvips creates a more reasonable thumbnail here.
Comment 3 Ken Sharp 2022-10-19 07:27:42 UTC
PDF files can have multiple different 'Box' values: ArtBox, BleedBox, CropBox, MediaBox and TrimBox. The MediaBox is required and the other boxes are optional; a given PDF page description must contain the MediaBox and may contain any or all of the others.

By default Ghostscript uses the MediaBox to determine the size of the media. Other PDF consumers may exhibit other behaviours.

The pages in your PDF file contain all of the Boxes. In the majority of cases the Boxes all contain the same values (which makes their inclusion pointless of course). But for page 1 they differ:

/CropBox[594.375 0.0 1190.55 839.176]
/MediaBox[0.0 0.0 1190.55 841.89]
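
As a rough illustration of why these two boxes produce the "blank page on the left" effect, the arithmetic can be sketched in Python. This is not part of Ghostscript; the helper name `box_size` is made up for the example, and the only inputs are the two box arrays quoted above, read as [llx lly urx ury] in points.

```python
# Box values copied from page 1 of the PDF in this report: [llx, lly, urx, ury] in points.
MEDIA_BOX = (0.0, 0.0, 1190.55, 841.89)
CROP_BOX = (594.375, 0.0, 1190.55, 839.176)


def box_size(box):
    """Hypothetical helper: return (width, height) of a PDF box in points."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly


media_w, media_h = box_size(MEDIA_BOX)
crop_w, crop_h = box_size(CROP_BOX)

# Width of the strip that lies inside the MediaBox but to the left of the CropBox.
left_strip = CROP_BOX[0] - MEDIA_BOX[0]
left_fraction = left_strip / media_w

print(f"MediaBox: {media_w:.3f} x {media_h:.3f} pt (landscape)")
print(f"CropBox:  {crop_w:.3f} x {crop_h:.3f} pt (portrait)")
print(f"Outside the CropBox on the left: {left_strip:.3f} pt ({left_fraction:.1%} of the width)")
```

The MediaBox is a landscape sheet about twice as wide as the portrait CropBox, and the CropBox covers only its right half. Rendering the full MediaBox therefore shows the left half as well, which matches the thumbnail attached in comment 1.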

You can tell Ghostscript to use a different Box value for the media by using one of -dUseArtBox, -dUseBleedBox, -dUseCropBox or -dUseTrimBox. If I specify -dUseCropBox then the file is rendered as you expect.
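
For anyone scripting this, here is a minimal sketch of the reporter's original command with -dUseCropBox added, written as a Python helper that only builds the argument list. The function name `build_gs_thumbnail_command` and its parameters are invented for this example; the switches themselves are the ones already quoted in this report. It does not run gs, so pass the list to something like `subprocess.run` if you want to execute it.

```python
import shlex


def build_gs_thumbnail_command(pdf_path, jpg_path, size_pt=640, use_crop_box=True):
    """Hypothetical helper: build (but do not run) the gs thumbnail command.

    The switches mirror the command from the bug description; -dUseCropBox
    is the only addition, as suggested in comment 3.
    """
    cmd = [
        "gs",
        "-sDEVICE=jpeg",
        "-dPDFFitPage=true",
        f"-dDEVICEWIDTHPOINTS={size_pt}",
        f"-dDEVICEHEIGHTPOINTS={size_pt}",
        "-sPageList=1",
    ]
    if use_crop_box:
        # Size the page from the CropBox instead of the default MediaBox.
        cmd.append("-dUseCropBox")
    cmd += [f"-sOutputFile={jpg_path}", pdf_path]
    return cmd


cmd = build_gs_thumbnail_command("10568-116598.pdf", "10568-116598.pdf.jpg")
print(shlex.join(cmd))
```

Keeping the switch behind a `use_crop_box` parameter makes it easy to compare the MediaBox and CropBox renderings of the same file.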
Comment 4 Alan Orth 2022-10-19 08:32:43 UTC
Thank you, Ken! That is a perfect explanation. Just a note for future travelers that you can select the CropBox in ImageMagick using:

-define pdf:use-cropbox=true