I ran a comparison of the output from Ghostscript head (r9862) on a variety of machines using 32-bit and 64-bit binaries on the nightly regression files. There were 1193 files in the test, each of which was run 15 times with different combinations of output format, resolution, and banding, for a total of 17895 tests (the run time for these varied from ~40 minutes on my i7 to ~40 hours on my g5). I'll attach spreadsheets with the complete results and with the results from the x86_64 machines that we are planning to use for the cluster regressions (not including the MacPro, which I don't have access to). For each output file the md5sum is calculated, and this is available in the spreadsheet. In addition, I generate a profile for each output file, summarizing the differences in md5sums. This profile is generated by sorting the list of machines in alphabetical order and assigning a different letter of the alphabet to each unique md5sum. For example, if there were 4 machines, called M1, M2, M3, and M4, and all had the same md5sum for a particular file, the profile for that file would be 'aaaa'. If M1 and M3 had the same checksum as each other but M2 and M4 had unique checksums, the profile would be 'abac'. Note that one or two files in the batch appear to be non-deterministic, generating a different checksum on each machine; I haven't removed these from the results but probably should.
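For reference, the profile construction described above can be sketched in a few lines of Python (a minimal sketch; the machine names and md5sum values below are illustrative, not taken from the actual run):

```python
import string

def profile(md5_by_machine):
    """Build a profile string: machines sorted alphabetically, with the
    next unused lowercase letter assigned to each new unique md5sum."""
    letters = {}
    out = []
    for machine in sorted(md5_by_machine):
        md5 = md5_by_machine[machine]
        if md5 not in letters:
            letters[md5] = string.ascii_lowercase[len(letters)]
        out.append(letters[md5])
    return "".join(out)

# Example from the description: M1 and M3 share a checksum,
# M2 and M4 each have a unique one.
print(profile({"M1": "x", "M2": "y", "M3": "x", "M4": "z"}))  # abac
```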
Created attachment 5235 [details]
all.csv.gz

Comparison of all 6 machines with 32-bit and 64-bit binaries. The machines are:

amd64:  AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
g5:     PowerPC 970 (G5) @ 1.8GHz
i7:     Intel Core i7 CPU 920 @ 2.67GHz
i7a:    Intel Core CPU 965 @ 3.20GHz
imac:   Intel Core 2 Extreme X7800 @ 2.80GHz
peeves: Intel Core 2 Extreme CPU X9770 @ 3.20GHz

The order of the machines/binary word sizes is:

amd64_32 amd64_64 g5_32 g5_64 i7_32 i7_64 i7a_32 i7a_64 imac_32 imac_64 peeves_32 peeves_64

Here are the top profiles:

aabbbbbbbbbb 9008
aaaaaaaaaaaa 5270
abccdcdcccdc 1239
abbbababbbab  625
abccdedeeede  385
aabbbbbbbbaa  252
abcccccccccc  251
aabbbbbbbbcc   99
abcdababbbab   76
abababababab   74
abccababbbab   49
aabbcccccccc   47
abcdcdcdcdcd   45
abcbababcbab   41
abbbcbcbbbcb   34
aabcbbbbbbbb   32
abcdededcded   31
aabbcbcbbbcb   31
abbbbbbbbbbb   29
abcccdcdddcd   25
abcdefefefef   24
abbcababbbab   22
abcdefefgfef   21
abccdcdcccab   20
aaaababaaaba   20
aabbaaaabbbb   15
abcdcececece   15
abccacacccac   12
abcdefefffef   12
abccdcdccceb   10
.
.
.

Recall that aabbbbbbbbbb means that the amd64_32 and amd64_64 md5sums matched each other and the other 10 md5sums matched each other (but differed from the amd64 pair). This is the most common profile, occurring 9008 times out of the 17895 tests. The third most common profile, abccdcdcccdc, which occurs 1239 times, indicates the following machine groupings:

amd64_32
amd64_64
g5_32, g5_64, i7_64, i7a_64, imac_32, imac_64, peeves_64
i7_32, i7a_32, peeves_32
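Reading a profile back into machine groupings, as done by hand above for abccdcdcccdc, can be automated by inverting the letter assignment (a sketch; the machine order is the one listed in this comment):

```python
from collections import defaultdict

# Machine/word-size order as listed above.
MACHINES = ["amd64_32", "amd64_64", "g5_32", "g5_64", "i7_32", "i7_64",
            "i7a_32", "i7a_64", "imac_32", "imac_64", "peeves_32", "peeves_64"]

def groups(profile, machines=MACHINES):
    """Map each profile letter back to the machines sharing that md5sum."""
    by_letter = defaultdict(list)
    for letter, machine in zip(profile, machines):
        by_letter[letter].append(machine)
    return [by_letter[letter] for letter in sorted(by_letter)]

for group in groups("abccdcdcccdc"):
    print(", ".join(group))
```

Running this reproduces the four groupings listed above for the abccdcdcccdc profile.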
Created attachment 5236 [details]
i7_and_peeves_64.csv

Comparison of i7, i7a, and peeves in 64-bit mode. Profile frequencies:

aaa 17463
aab   399
abb    22
abc    11
Created attachment 5237 [details]
i7_and_peeves_32.csv

Comparison of i7, i7a, and peeves in 32-bit mode. Profile frequencies:

aaa 17478
aab   399
abc    12
aba     6
I thought I'd give one of these a go... I picked 30-11.PS because it had differences; other than that, the choice was random. Anyway, there is no rendering difference. The cash column row 6 number is different because the test enumerates all the fonts on the machine (different on my two test machines), then prints out vmstatus results. In short, we would expect different answers... I wonder how many of the postscript files are of this nature?
There are several of the CET files, named like ##-##.PS, that do really funky stuff like printing out 'vmstatus' results that will be VERY platform/implementation dependent. FWIW, I think that ALL of the CET files can be ignored for the purposes of compiler/platform differences. IMHO, the CET is a really Adobe-specific (and ancient) PS test suite that doesn't really serve Artifex, which is why we stuck with just the FTS (names like ###-##.ps) until recently.
Regarding comment #5, the postscript group (yourself included) worked really hard to get the CET working well, so I think something more surgical, like revision 3211, which simply substitutes show for print where we expect differences, would be better.
Alex and I did go through a number of the indeterminacies some time back, and basically it's more effort than it's worth to try and 'fix' these files. There are a variety of issues, including the file which draws a line whose length depends on the amount of 'usertime' spent processing the file, which isn't always the same even on the same machine...
Regarding comment #7, Alex, Ray, and Ken are the most active with the postscript language, so can the three of you decide what to do with the device-dependent and non-deterministic files? I thought just removing all the CET files was extreme, but I guess we could do that. Marcos does need a set of stable files to do the regression tests.
I thought Ralph had simply removed the most troublesome files from the regression run, and I'd vote in favour of doing that with any files we identify as inherently non-deterministic. We did (I think) consider modifying the files to work and giving them slightly different names so we would know they weren't the real CET files, but I don't think we ever went anywhere with that. While the CET is very useful for testing all sorts of things, and many files can be configured not to print things like timings, I still think it's more effort than it's worth to try and fix the files. Willing to be out-voted, though.
I agree with Ray's change to 30-11.PS, but we need to start looking into stdout files. Comparing the 32- and 64-bit architectures, the test shows that the latter has higher VM usage, but the halftone-related tests show the opposite. This is a surprising result that should be reviewed for possible inefficiencies in memory allocation. I compare filtered CET stdout before and after a prospective fix; this has saved my bacon several times. I vote to keep the CET files in the test suite and fix individual files.
> I agree with Ray's change to 30-11.PS but we need to start looking
> into stdout files.

I can't seem to get any credit around here ;-). Anyway, it is rather difficult for us to move over to a heterogeneous cluster setup without "fixing" these. How long do you think it will take to look at these problems?
I'm fine with patching PS files. We can also have Masaki do some of these, since he is also proficient in PS. IIRC, there are also some that print addresses, which will be platform (and build) specific. The others that print various dict contents without sorting first can all be addressed the same way. It makes our test suite different from the "real CET", but it will definitely be more useful for regression runs. I, for one, don't think there is any advantage to us limiting ourselves to the "real" CET. BTW, it was Henry who patched the 30-11.ps file.
> IIRC, there are some that print addresses

AFAIK all such cases have been fixed.

> The others that print various dict contents that don't sort
> first can all be addressed the same way.

All such cases are sorted already. Some files still depend on the directory enumeration order, but the %rom% device should be deterministic and platform-independent.
We need to get similar reports for all languages.
Marcos will try a statically linked executable on the i7a to see if that resolves some of the ab* cases mentioned in comments 2 and 3. Also, Marcos is to add tests of other languages.
I tried a statically linked x86_64 gs r9906, built on my i7, on Alex's i7a and on peeves, and all of the checksums agree except for 09_47N.PDF and 23-33.PS. The odd thing is that i7_dynamic, i7_static, i7a_dynamic, and i7a_static agree on 09_47N.PDF, and peeves_dynamic and peeves_static agree with each other, but the two groups don't agree with each other. OTOH, for 23-33.PS, i7_dynamic and i7_static agree with each other, i7a_dynamic agrees with peeves_dynamic, and i7a_static agrees with peeves_static, but the three groups usually don't agree with each other. I don't know why I'm explaining this in text; the results are easier to interpret directly:

09_47N.PDF.pbmraw.300.0,aaaabb
09_47N.PDF.pbmraw.300.1,aaaabb
09_47N.PDF.pbmraw.72.0,aaaabb
09_47N.PDF.pdf.pkmraw.300.0,aaaabb
09_47N.PDF.pdf.ppmraw.300.0,aaaabb
09_47N.PDF.pdf.ppmraw.72.0,aaaabb
09_47N.PDF.pgmraw.300.0,aaaabb
09_47N.PDF.pgmraw.300.1,aaaabb
09_47N.PDF.pgmraw.72.0,aaaabb
09_47N.PDF.pkmraw.300.0,aaaabb
09_47N.PDF.pkmraw.300.1,aaaabb
09_47N.PDF.pkmraw.72.0,aaaabb
09_47N.PDF.ppmraw.300.0,aaaabb
09_47N.PDF.ppmraw.300.1,aaaabb
09_47N.PDF.ppmraw.72.0,aaaabb
23-33.PS.pbmraw.300.0,aabcbc
23-33.PS.pbmraw.300.1,aabcbc
23-33.PS.pbmraw.72.0,aabbbb
23-33.PS.pgmraw.300.0,aabbbb
23-33.PS.pgmraw.300.1,aabbbb
23-33.PS.pgmraw.72.0,aabbbb
23-33.PS.pkmraw.300.0,aabcbc
23-33.PS.pkmraw.300.1,aabcbc
23-33.PS.pkmraw.72.0,aabbbb
23-33.PS.ppmraw.300.0,aababa
23-33.PS.ppmraw.300.1,aababa
23-33.PS.ppmraw.72.0,aabcbc

The order is i7_dynamic.tab,i7_static.tab,i7a_dynamic.tab,i7a_static.tab,peeves_dynamic.tab,peeves_static.tab

Unless anyone has an objection, I'm going to go through all of the 'buggy' nightly regression files, make sure we have a bug report filed on each, and then rename them to .disabled. As we figure out why they are either non-deterministic or produce different results on different computers, we can fix them and re-enable them.
Created attachment 5282 [details]
x86_64.csv

After eliminating all of the nightly regression files that are known to be non-deterministic, here are the files that continue to show differences on the various machines (i7, i7a, macpro, and peeves). The good news is that the only remaining differences are in psdcmyk output (which may be related to the other psdcmyk issue that already has a bug open) and three files: 09_47N.PDF, 23-33.PS, and 29-07C.PS. I'll attach rasters of these three, but I suspect we'll discover that the files are badly constructed.
BTW, I haven't been able to finish a 32 bit build comparison, due to bug 690638.
The complete raster files are large, so I've put them on casper in /home/support/690650. The command lines used to generate the files are:

/home/marcos/artifex/nightly/gs/bin/gs \
  -sOutputFile=/dev/shm/temp/23-33.PS.pbmraw.72.0 \
  -dMaxBitmap=30000000 \
  -sDEVICE=pbmraw \
  -r72 \
  -q \
  -dNOPAUSE \
  -dBATCH \
  -K1000000 \
  -dNOOUTERSAVE \
  -dJOBSERVER \
  -c false 0 startjob pop \
  -f %rom%Resource/Init/gs_cet.ps \
  - < /home/marcos/artifex/nightly/testfiles/23-33.PS

/home/marcos/artifex/nightly/gs/bin/gs \
  -sOutputFile=/dev/shm/temp/23-33.PS.pbmraw.300.0 \
  -dMaxBitmap=30000000 \
  -sDEVICE=pbmraw \
  -r300 \
  -q \
  -dNOPAUSE \
  -dBATCH \
  -K1000000 \
  -dNOOUTERSAVE \
  -dJOBSERVER \
  -c false 0 startjob pop \
  -f %rom%Resource/Init/gs_cet.ps \
  - < /home/marcos/artifex/nightly/testfiles/23-33.PS

/home/marcos/artifex/nightly/gs/bin/gs \
  -sOutputFile=/dev/shm/temp/23-33.PS.pbmraw.300.1 \
  -dMaxBitmap=10000 \
  -sDEVICE=pbmraw \
  -r300 \
  -q \
  -dNOPAUSE \
  -dBATCH \
  -K1000000 \
  -dNOOUTERSAVE \
  -dJOBSERVER \
  -c false 0 startjob pop \
  -f %rom%Resource/Init/gs_cet.ps \
  - < /home/marcos/artifex/nightly/testfiles/23-33.PS

The PDF file is similar, except that -f %rom%Resource/Init/gs_cet.ps is replaced by -dFirstPage=1 -dLastPage=1 and the file is not redirected.
Created attachment 5283 [details]
x86_64.csv

Due to an error on my part, the file 09_47N.PDF was included in the list of differences when it should not have been (the copy of the file on peeves was different from the one on the other machines). I've corrected this and have attached a revised .csv file.
A second screwup: I accidentally deleted the raster files from peeves. This shouldn't be an issue, since for the two files that are different, 23-33.PS and 29-07C.PS, the output from peeves matches one of the other machines for each output format (i.e. there aren't any 'aaab'-style profiles).
Files 23-33.PS and 29-07C.PS have been disabled in r3227; please re-enable them when the regression issue has been resolved.
Reassigning to me for problem isolation.
23-33.PS prints some usertime values. It should be easy to disable them. (I'm on it.) 29-07C.PS also prints usertime values. In addition, i7 is throwing an 'undefined' error where there should be a 'configurationerror'. This file is a 'setpagedevice' test, and those errors are on page 2 (/PolicyNotFound 0) and page 4 (/PolicyNotFound 2). This needs more investigation.
I need to correct my previous comment #24. The numbers are not usertime values, but are checksums of all the text printed in an area. As for 23-33.PS, the resourcestatus operator is returning different 'size' values on each machine, and this is producing different checksums. I opened bug #691057 for this. If you want, I can cut the problematic part out of 23-33.PS for temporary use. The difference in 29-07C.PS was caused by the 'undefined' error on machine i7, as described in comment #24. I was able to reproduce it on my Mac. Still looking for the reason.
This is a minimized test case for the undefined error happening on 29-07C.PS so far.

---
%!
<< /InputAttributes currentpagedevice /InputAttributes get
   dup dup length 1 sub undef >> setpagedevice
(baa)=
<< /Policies << /PolicyNotFound 0 >>
   /PageSize [ 500 700 ] >> setpagedevice
---

$ bin/gs -q -sDEVICE=nullpage spd.ps
baa
Error: /undefined in --setpagedevice--
...

Confirmed on Mac OS X and Ubuntu. Windows XP also generates an undefined error, but instead of --setpagedevice-- it was in --get--.
I created a new bug #691065 for the 29-07C.PS case. Back to Marcos.
I'm closing this on the assumption that in the last two years enough of these issues have changed that this bug is no longer relevant. I'll run a new test in due course.