702096 – opj_tcd_decode_tile() allocates wastefully compared to luratech.

Status:	CONFIRMED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	JPX/JBIG2 encode/decode
Version:	master
Hardware:	All All

Importance:	P2 enhancement
Assignee:	Sebastian Rasmussen

URL:
Keywords:

Duplicates (5):	702091 702092 702093 702094 702095
Depends on:
Blocks:

Reported:	2020-02-05 17:59 UTC by Julian Smith
Modified:	2020-06-03 17:13 UTC
CC List:	1 user

See Also:
Customer:
Word Size:	---

Description Julian Smith 2020-02-05 17:59:56 UTC

Memory use of this command is much larger for normal builds than luratech builds:

    gs -sOutputFile=/dev/null -dBATCH -dNOPAUSE -r300 -sDEVICE=ppmraw ../tests_private/comparefiles/Bug694621.pdf

In a normal build, this has maximum heap usage (from valgrind --tool=massif) of 150MB, of which 109,511,619B is from opj_tcd_decode_tile() where it allocates a UINT32 for each pixel.

Luratech's maximum heap usage for the same input file is 40MB. It looks like luratech's equivalent to opj_tcd_decode_tile() is s_jpxd_process(), which allocates 27,377,904B, almost exactly a quarter of what opj_tcd_decode_tile() allocates in the normal build.

Luratech's code does not default to 32 bits per pixel. Instead it appears to explicitly look at the colorspace and allocate accordingly. In the above case, we end up with 8 bits per pixel, hence reducing memory use by a factor of 4. The callback s_jpxd_write_data() uses colorspace info when generating the output data.

So it might be worth improving opj_tcd_decode_tile() to use a similar technique to luratech, in order to avoid the excessive memory use for this sort of input file.
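
To make the factor of 4 concrete, here is a minimal sketch of sizing a decode buffer from the sample bit depth instead of always using 32 bits. The helper names bytes_per_sample() and tile_buffer_bytes() are hypothetical and are not part of OpenJPEG, luratech or Ghostscript. The sample count is an assumption inferred from the luratech figure above (27,377,904 bytes at 8 bits per sample).

```c
#include <stddef.h>

/* Hypothetical helper: smallest whole number of bytes that can hold
 * one stored sample of the given bit depth (1, 2 or 4). */
static size_t bytes_per_sample(unsigned bits)
{
    if (bits <= 8)
        return 1;
    if (bits <= 16)
        return 2;
    return 4;
}

/* Hypothetical helper: bytes needed to store `samples` decoded samples
 * at the given bit depth. Overflow checking is omitted for brevity. */
static size_t tile_buffer_bytes(size_t samples, unsigned bits)
{
    return samples * bytes_per_sample(bits);
}
```

With 27,377,904 samples, tile_buffer_bytes(samples, 32) gives 109,511,616 bytes, which is within 3 bytes of the 109,511,619B that massif attributes to opj_tcd_decode_tile(), while tile_buffer_bytes(samples, 8) gives the 27,377,904B seen in the luratech build. This only covers storage of the final samples, not any intermediate precision the decoder needs.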

Comment 1 Julian Smith 2020-02-07 18:07:49 UTC

*** Bug 702095 has been marked as a duplicate of this bug. ***

Comment 2 Julian Smith 2020-02-07 18:08:03 UTC

*** Bug 702094 has been marked as a duplicate of this bug. ***

Comment 3 Julian Smith 2020-02-07 18:08:27 UTC

*** Bug 702093 has been marked as a duplicate of this bug. ***

Comment 4 Julian Smith 2020-02-07 18:08:37 UTC

*** Bug 702092 has been marked as a duplicate of this bug. ***

Comment 5 Julian Smith 2020-02-07 18:08:50 UTC

*** Bug 702091 has been marked as a duplicate of this bug. ***

Comment 6 Ray Johnston 2020-05-15 02:25:11 UTC

Assigning to Henry to determine who should do this (and when), possibly after
discussion at an engineering meeting.

Comment 7 Henry Stiles 2020-05-19 01:28:52 UTC

Why not request an enhancement upstream?

Comment 8 Sebastian Rasmussen 2020-05-19 20:53:38 UTC

For us to do this optimization would be a fair bit of work. I have reported this upstream, let's see what they say:

https://github.com/uclouvain/openjpeg/issues/1252

Comment 9 Sebastian Rasmussen 2020-06-03 17:13:48 UTC

> let's see what they say:

Their immediate response was as follows:

"Intermediate computation need more than 8 bits. Probably 16 bits is enough, but some research in JPEG2000 reference literature should be done to confirm it. And OpenJPEG supports also 16-bit images, which would thus needs 32 bits. Not to mention irreversible which use float32 intermediate computations. So we'd need to have two code paths, or use some templating code (switching to C++) to avoid hugly macros. That would be a really non trivial effort !"

Julian remarked that it is surprising that they don't think more than 32bpp intermediate results are necessary for 32bpp images.
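
One way to see the upstream point that intermediate values need more than 8 bits is the reversible colour transform (RCT) used by JPEG 2000, sketched below. This is only an illustration: the upstream response does not name the RCT specifically, and the rct_t type and rct_forward() function are hypothetical names, not OpenJPEG API. Inputs are assumed to be unshifted 8-bit samples in the range 0..255.

```c
#include <stdint.h>

/* Hypothetical container for one transformed pixel. */
typedef struct {
    int32_t y;
    int32_t cb;
    int32_t cr;
} rct_t;

/* Forward reversible colour transform:
 *   Y  = floor((R + 2G + B) / 4)
 *   Cb = B - G
 *   Cr = R - G
 * Inputs are assumed non-negative, so integer division by 4 is
 * the same as taking the floor. */
static rct_t rct_forward(int32_t r, int32_t g, int32_t b)
{
    rct_t out;
    out.y = (r + 2 * g + b) / 4;
    out.cb = b - g;
    out.cr = r - g;
    return out;
}
```

For 8-bit inputs the chroma terms span -255..255: rct_forward(255, 0, 255) gives Cb = Cr = 255 and rct_forward(0, 255, 0) gives Cb = Cr = -255. That range needs 9 bits including the sign, so an 8-bit buffer cannot hold these intermediates even when the final output is 8 bits per sample.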