The attached patch adds support for deduplicating identical streams. It does not decompress streams; instead, the two objects must be exactly the same (same filter, same length) before the compressed streams are compared. If the streams are identical, one of them is removed. Please let me know what you think about this enhancement. Thanks!
Created attachment 9203 [details] deduplicate identical streams
The patch as supplied does not work: it deals with fz_buffers incorrectly and memcmps the wrong length. There are also potential problems in the error handling. I have a fix based on the same idea going through review now, though, and will update the bug here when it is committed. http://git.ghostscript.com/?p=user/robin/mupdf.git;a=commitdiff;h=6bc1ca3cfc19440b99c2efc919c2ec607fa51666 Many thanks!
Fixed in:

commit e145b71a5a7462660e210d40ada498e01c7407a3
Author: Robin Watts <robin.watts@artifex.com>
Date:   Fri Jan 11 16:18:05 2013 +0000

    Bug 693545: Extend pdfwrite to remove identical streams.

    When writing pdf files, we currently have the option to remove
    duplicate copies of objects; all streams are treated as being
    different though. Here we add the option to spot duplicate
    streams too.

    Based on a patch submitted by Heng Liu. Many thanks!
Thanks for landing this. Should pdfclean be modified to add something like the following to its usage message?

"\t-gggg\tin addition to -ggg merge duplicate objects with streams\n"
(In reply to comment #3)
Shouldn't it be

> if (lena == lenb && memcmp(dataa, datab, lena) == 0)
>     differ = 0;

i.e. the two lengths must match? Otherwise, with lena > lenb this will cause a read access violation (and with lena < lenb this might consider streams identical which aren't).
Fixed in:

commit 7231417c1e4cf1c8a5601a54a24e6366bee3a8c9
Author: Robin Watts <robin.watts@artifex.com>
Date:   Sat Jan 12 11:49:25 2013 +0000

    Bug 693545: Fix typo in previous commit.

    When adding code to spot identical streams, I got the logic in a
    test reversed as a result of a last minute change. Corrected here.

    Thanks to zeniko for pointing this out.