GS needs to be udated to use openjpeg2 to include the fixes that have been added to openjpeg2 in mupdf. I hope we will not lose fixes from the old openjpeg by updating, a review of the logs should be done to verify. This will likely end up being a multiple bounty job which we'll negotiate as we go along. Assigning directly to Shelly who will collect the bounty. This isn't an "open" problem.
*** Bug 694306 has been marked as a duplicate of this bug. ***
Created attachment 10009 [details] eYCC colour space patch The latest OpenJPEG2 code is now committed on the OpenJPEG2 branch of user/shailesh/ghostpdl.git. There are still a few tasks to complete this work :- 1) Look into all the regressions highlighted by the cluster push 2) Find all patches added to MuPDF for the OpenJPEG V1.5 to V2.0 upgrade Note : OpenJPEG2 does not currently support eYCC colour spaces but it does decode them correctly and the attached patch will enable this feature now if we want to add it.
The test files in 689362 and 691537 are failing in the GS/OpenJPEG2 regression but work fine on the mupdf/OpenJPEG2 branch. Quick analysis shows that the failure is smask related. In mupdf the jpx stream is decoded and the smask is applied but in GS it seems to go through the jpx stream twice and fails to apply the smask. This is almost as if it is looking for the wrong object and retrieving the jpx stream again instead of fetching the smask. Any thoughts anyone?
Regression testing of the latest OpenJPEG2 branch shows that fts_17_1716 fails to decode correctly and Altona_Technical_v20_x4.pdf still has an error on page 13. Both of these errors are visible in mupdf and would be suitable for Simon to look into.
(In reply to comment #4) > Regression testing of the latest OpenJPEG2 branch shows that fts_17_1716 > fails to decode correctly and Altona_Technical_v20_x4.pdf still has an error > on page 13. > > Both of these errors are visible in mupdf and would be suitable for Simon to > look into. Just to clarify, neither of these are 'regressions' as such. The fts_17_1716.pdf file looks identical with both 1.5 and 2.0. Altona_Technical_v20_x4.pdf decodes with an error on 1.5, but dies with an 'assertion' (even in release modes) on 2.0. These would indeed be good candidates for Simon, I think.
I've included one of Shelly's fixes into my openjpeg patch queue at http://git.ghostscript.com/?p=user/zeniko/openjpeg.git;a=shortlog;h=refs/heads/zeniko_fixes . All of these fixes are either for outstanding fuzzing issues or cause progressions. Shelly: Are you sure that your other fix (for j2kp4-file3-ycc-8bpc.pdf) doesn't undo part of the original bug fix (so that a malformed document couldn't just set mct to 0 and skip the required sanity check)? (In reply to comment #5) > Altona_Technical_v20_x4.pdf decodes with an error on 1.5, but dies with an > 'assertion' (even in release modes) on 2.0. Not sure where the assertion is from (I assume it happens with Ghostscript since it doesn't with MuPDF), but http://git.ghostscript.com/?p=user/zeniko/openjpeg.git;a=commitdiff;h=05d48b81f53328ef5c8fc4a2f1d4df22cf163a98 should fix the missing tile.
(In reply to comment #6) Zeniko: The changes were made for a few reasons, firstly the original logic was (A && B || C) which is compiler specific so I reordered this to be (A && (B || C)). Secondly, the tiles only have to have the same dimensions when mct is 1, otherwise the tiles can have different dimensions and still be decoded correctly.
(In reply to comment #7) > Zeniko: The changes were made for a few reasons, firstly the original logic > was (A && B || C) which is compiler specific so I reordered this to be (A && > (B || C)). That's not even compiler specific but really just a (harmless) typo in my code. Thanks for spotting it. > Secondly, the tiles only have to have the same dimensions when > mct is 1, otherwise the tiles can have different dimensions and still be > decoded correctly. If I break 1336.pdf.asan.47.376 further so that the value for mct is read as 127 instead of 1, I get a crashes with your change. If there are indeed cases where my check is too strict, they have to be detected more reliably in order not to regress the fix.
Zeniko: I agree that the patch definitely needs another look. We need to ensure that it works correctly with j2kp4-file3-ycc-8bpc.pdf and does not crash on 1336.pdf.asan.47.376 when mct is 1 or 127.
The integration of OpenJPEG2 into GS has been stalled for a while due to the lack of information about the alpha channel in the stream data. There was a patch in the original code to store the alpha channel but this was never pushed back upstream. I have been discussing this in the forums and have been told that the alpha channel is usually the last component in the image. However this is not documented anywhere and cannot be officially confirmed. A better solution would be to avoid assuming anything and to add a new API call to extract the channel types. This must be done in such a way to leave the ABI intact and hence have a better chance of the patch being accepted upstream. At this point in the development the remaining problem files (some FTS and Altona_Technical_v20_x4.pdf) appear to be alpha channel related. It should be noted that given the last influx of files into the regression cluster it has become very difficult to detect real problem files due to the large number of false positives in bmpcmp which limits itself to only showing the first 1000 tests.
Shailesh: You can use the -filter options on the cluster to restrict the number of results you get. git cluster gs -filter=pgmraw.200.0 -filter=ppmraw.200.0 or clusterpush.pl gs -filter=PDF_1.7_FTS will restrict the tests in the obvious ways. This might help get the numbers down?
(In reply to comment #9) > Zeniko: I agree that the patch definitely needs another look. We need to > ensure that it works correctly with j2kp4-file3-ycc-8bpc.pdf and does not > crash on 1336.pdf.asan.47.376 when mct is 1 or 127. ... or apparently any other value than 0 (and maybe 2). What about http://git.ghostscript.com/?p=user/zeniko/openjpeg.git;a=commitdiff;h=258b6a3d8dd98538cebbe5c53ecfbc51cf8ed3e2;hp=2d56139cb256a99fcc2165229da2ed56040c0666 ? That doesn't crash while preventing the early bail-out for j2kp4-*.pdf. BTW: MuPDF later checks itself for consistent dimensions and rejects the image after all so that the page remains blank. I assume that Ghostscript somehow manages to handle that case more gracefully.
Zeniko: The updated patch works great with the listed test files and with varying mct values. I will merge this into GS once I have a clean regression run. Please note, all of the j2kp4-*.pdf files decode correctly on the GS branch and produce a valid image.
Created attachment 10129 [details] List of all OpenJPEG test files The current cluster run produces a lot of noise unrelated to the OpenJPEG2 code. So I made a build that seg faults only in the OpenJPEG code and the attached file lists all the tests identified by this build.
The final GS patch for the alpha channel has been submitted on the OpenJPEG2 branch. All files decode correctly but there is quite a bit of noise in the bmpcmp so it is not possible to get a clear run. Is it possible to get this work reviewed and cleared off so that I can use this as a baseline for the fuzzing bugs? A number of the fuzzing files do not seg fault using this branch code so I would prefer to close off this work first.
Hello Shelly, please go ahead and merge or rebase your openjpeg2 branch into master. Open bugs for all regressions, assign them to me, and close off this report. I will probably make the regressions bountiable as well. Also it would be useful if you could email any correspondence you had with the openjpeg group to tech@artifex.com that might be a good future reference for us.
Henry, I have forwarded the latest OpenJPEG email to the tech address as requested. All the files listed in comment #14 have been tested using the filter option in clusterpush and there are no regressions. With regards to merging back from the branch, I have never tried this before and could use some assistance. Do you have some specific instructions to follow or someone I could liaise with while doing this?
(In reply to comment #17) > Henry, I have forwarded the latest OpenJPEG email to the tech address as > requested. > I didn't see it. I'll check with other folks during business hours tomorrow. > All the files listed in comment #14 have been tested using the filter option > in clusterpush and there are no regressions. > > With regards to merging back from the branch, I have never tried this before > and could use some assistance. Do you have some specific instructions to > follow or someone I could liaise with while doing this? It looks like you have a topic branch called OpenJPEG2, so: git rebase master openJPEG2 fix up conflicts manually in files with git add and commit, and use rebase --continue, follow git's instructions. then git push everything just like a normal commit. I assume there will be conflict because your merge of master into you local branch had conflicts you did not resolve. Robin or Chris can help more if you run into problems. It looks like you haven't configured your system since the merge because configure.ac is conflicted according the logs.
(In reply to comment #18) > (In reply to comment #17) > > Henry, I have forwarded the latest OpenJPEG email to the tech address as > > requested. > > > > I didn't see it. I'll check with other folks during business hours tomorrow. I think only Artifex employees can post to tech@artifex.com
(In reply to comment #19) > (In reply to comment #18) > > (In reply to comment #17) > > > Henry, I have forwarded the latest OpenJPEG email to the tech address as > > > requested. > > > > > > > I didn't see it. I'll check with other folks during business hours tomorrow. > > > I think only Artifex employees can post to tech@artifex.com Oh maybe so, an internal alias? support@artifex.com is probably a better place to send it. Anyway, I requested this because I thought there was push back from the openjpeg group about integrating changes and they wanted us to do more than just simple bug fixing - api changes and such. Perhaps they had that discussion with Zeniko. I was hoping to collect that correspondence and use it as context for a discussion with them, but the email I've seen doesn't seem to contain anything I was expecting.
(In reply to comment #20) > Perhaps they had that discussion with Zeniko. I've only had a single e-mail exchange with Mathieu Malaterre WRT https://code.google.com/p/openjpeg/issues/detail?id=225 in which I offered to share confidential testcases and eventually was told where to upload them. That was back in June; nothing has happened since. BTW: There are some fixes on http://git.ghostscript.com/?p=user/zeniko/openjpeg.git;a=shortlog;h=refs/heads/zeniko_fixes which I'd recommend merging into ghostpdl's openjpeg as well.
Is all of this bug addressed by 66d9c0aa17a5abcecd6590e63f0620f7aa51634c or are there residual issues? Shelly?
Henry, all OpenJPEG files in the regression test pass through correctly and there are no residual issues. There is an ongoing task to ensure that all fuzzing updates in the OpenJPEG code is merged between GS and mupdf, as well as sent upstream.
Marking as fixed see comments 22 and 23.