Bug 704742 - Provide capability to crop images to their frame to optimize PDF file size
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: unspecified
Hardware: All All
Importance: P4 enhancement
Assignee: Default assignee
Depends on:
Reported: 2021-11-30 15:43 UTC by Max
Modified: 2021-11-30 15:49 UTC

Description Max 2021-11-30 15:43:28 UTC
There are often PDF files where images are embedded but cropped to a small part of the full image, so only a small part is visible in the PDF. This is a huge waste in terms of file size and it may also be a security issue if a part of the image was cropped out on purpose but can be extracted from the PDF.

Some DTP programs like Adobe InDesign allow the automatic cropping to image frames on PDF export to optimize the file size.

Scribus, for example, does not have this capability, and it would be preferable to have this functionality in Ghostscript, so it can be applied to any PDF, even those for which the source format is not available.

The implementation may not be trivial for cases where the frame is not rectangular or the image is inserted at an angle into a frame.
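
For the rotated or non-rectangular case, one conservative approach would be to map the frame outline back into image space and keep its bounding box. The following is only a sketch in Python under stated assumptions: the function name `visible_pixel_bbox` and its parameters are hypothetical and not part of Ghostscript, the frame is a single straight-edged clip polygon in user space, and the image is painted into the unit square by the matrix `[a b c d e f]`, with the top image row at the top of that square. It ignores everything else on the page, including transparency and objects drawn later.

```python
import math


def visible_pixel_bbox(ctm, clip_polygon, img_w, img_h):
    """Return (left, top, right, bottom) in image pixels covering the clip.

    ctm          -- (a, b, c, d, e, f) mapping the unit square to user space
    clip_polygon -- list of (x, y) user-space vertices of the frame
    img_w, img_h -- image size in pixels
    """
    a, b, c, d, e, f = ctm
    det = a * d - b * c
    if det == 0:
        raise ValueError("degenerate image matrix")

    cols, rows = [], []
    for x, y in clip_polygon:
        dx, dy = x - e, y - f
        # Invert the affine map to get unit-square coordinates (u, v).
        u = (d * dx - c * dy) / det
        v = (-b * dx + a * dy) / det
        cols.append(u * img_w)
        # Image rows count downward from the top of the unit square.
        rows.append((1.0 - v) * img_h)

    # Round outward, then clamp to the image extent.
    left = max(0, math.floor(min(cols)))
    top = max(0, math.floor(min(rows)))
    right = min(img_w, math.ceil(max(cols)))
    bottom = min(img_h, math.ceil(max(rows)))
    return left, top, right, bottom


# A 2000x1000 px image painted 200x100 units at (50, 50), clipped to a
# rectangle from (100, 75) to (150, 100) in user space.
bbox = visible_pixel_bbox(
    (200, 0, 0, 100, 50, 50),
    [(100, 75), (150, 75), (150, 100), (100, 100)],
    2000,
    1000,
)
```

Because an affine map sends straight edges to straight edges, the bounding box of the mapped vertices covers the whole mapped polygon, so the same code handles an image placed at an angle. Curved frame outlines would have to be flattened into line segments first.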

If the embedded images are JPEG-compressed, either the cropped image would have to be re-compressed, or the crop would have to be placed on compression block boundaries.
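
To illustrate the block-boundary option: JPEG data is coded in blocks (8x8 pixels without chroma subsampling, commonly 16x16 with 4:2:0 subsampling), so a crop that avoids re-compression has to be grown outward to the enclosing block grid. A minimal sketch in Python, where `align_crop_to_mcu` and its parameters are hypothetical names used for illustration and the 16x16 default is an assumption about the image's subsampling:

```python
def align_crop_to_mcu(x0, y0, x1, y1, width, height, mcu_w=16, mcu_h=16):
    """Expand the pixel box (x0, y0, x1, y1) outward to block boundaries.

    The box is half-open: x1 and y1 are one past the last visible pixel.
    The result is clamped to the image size, since the last block row or
    column of an image may be partial.
    """
    ax0 = (x0 // mcu_w) * mcu_w            # round left edge down
    ay0 = (y0 // mcu_h) * mcu_h            # round top edge down
    ax1 = min(width, -(-x1 // mcu_w) * mcu_w)   # round right edge up
    ay1 = min(height, -(-y1 // mcu_h) * mcu_h)  # round bottom edge up
    return ax0, ay0, ax1, ay1


# Visible region 100..300 x 50..210 inside a 4000x3000 px image.
aligned = align_crop_to_mcu(100, 50, 300, 210, 4000, 3000)
```

The aligned box is at most one block larger on each side than the visible region, so the extra data kept this way is small compared with the full image.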

In any case, when a PDF has a huge, high-res JPEG embedded of which only a tiny part is visible, the user currently has only one option: reducing the resolution of the embedded images, which will degrade the visual result significantly more than the re-compression artifacts added by a cropping operation.
Comment 1 Ken Sharp 2021-11-30 15:49:48 UTC
This isn't really feasible. There is no way to know, at the time the image is encountered, what portion of it will eventually be visible.

This problem is compounded by the PDF transparency model: we really can't tell whether the portion of the image lying under another object is obscured or contributes to the rendering of that object.

With a great deal of analysis I'm sure this is technically possible, but it would almost certainly require a two-pass approach (at least) to identify the Z order of each object, and to determine whether the transparency in force at the time obscured the underlying image or not.

I'm afraid this is far too much work for what I would have to say is a minority feature.