New bug created from 695190 Comment 11 norbert.janssen@oce.com I added 3 test files (05_PCL, 13_PCL, cicero) and a profiledata.zip to peeves. These files are still a bit slower on 914 than on 906 (profiling shows that the extra time is in the rastergrafix area). I don't know if this can be improved much.
I can't reproduce significant timing differences for any of these files on Linux. If you want us to look further, please report timings and profiles for each release. I think you only sent profiles for 914.
Created attachment 10865 [details] 906 profiledata
Created attachment 10866 [details] golden_master profiledata
I uploaded the 906 and git-trunk (1 May 2014) profiling data for the 05_PCL5.pcl, 13_PCL6.xl and cicero.xl test files. There is indeed not much difference between 906 and git-trunk.
Created attachment 10867 [details] LeadingEdge=3 profile for cicero
Created attachment 10868 [details] LeadingEdge=3 profile for cicero
I noticed, however, that there was a difference in time (28.9s for 906, 36.3s for git-trunk) when -dLeadingEdge=3 is used.
Created attachment 10869 [details] git LeadingEdge=3 profile for cicero
Created attachment 10870 [details] 906 LeadingEdge=3 profile for cicero
Henry has bisected the problem and has discovered that it was introduced by a pair of commits:
http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=7b3a65aab20feac334cac8e5935ba5cbe310ac69
http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=b81962611a292e1b2c5306f3d5cdfea832715169
These were put in by me, and are required for correct operation of gs.

The device functions within ghostscript assume that every bitmap/pixmap they are passed has a 'raster' that is a multiple of a given value. The exact multiple depends on the host architecture, and is set to the smallest value at which the processor can access memory at its highest speed. (So on a 32-bit processor, it's generally set to 4 bytes so that we can do 'int' based copying operations for speed.)

The SSE thresholding code was breaking this rule and was sending data with a raster of 2 bytes. For many platforms this did not matter, but it is not safe in general. (It is even possible that there may be some devices where it matters on more platforms, though this is unlikely.)

For safety, therefore, we have updated LAND_BITS to always be at least as large as align_bitmap_mod (in bits). This means that for landscape thresholding we will be working in larger strips, and this may have an adverse effect on caching.

Norbert, you can probably override this code by setting LAND_BITS back to 16 for your target builds, but in general, we have to err on the side of caution.
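To make the constraint above concrete, here is a minimal sketch of the arithmetic involved. It is not the actual ghostscript code: `EXAMPLE_ALIGN_MOD_BYTES`, `example_raster` and `example_land_bits` are hypothetical names invented for illustration, and the 4-byte alignment is just the 32-bit case mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-platform alignment, in bytes.
   The real value is chosen by the ghostscript build for the host. */
#define EXAMPLE_ALIGN_MOD_BYTES 4

/* Bytes per row for a bitmap width_bits wide, padded up to a
   multiple of mod_bytes (the 'raster' the device functions expect). */
static size_t example_raster(size_t width_bits, size_t mod_bytes)
{
    size_t bytes = (width_bits + 7) / 8;   /* round bits up to whole bytes */
    assert(mod_bytes > 0);
    return (bytes + mod_bytes - 1) / mod_bytes * mod_bytes;
}

/* Strip width in bits for landscape thresholding: never narrower
   than the alignment expressed in bits. */
static size_t example_land_bits(size_t requested_bits, size_t mod_bytes)
{
    size_t min_bits = mod_bytes * 8;
    return requested_bits < min_bits ? min_bits : requested_bits;
}
```

Under these assumptions, a 16-bit strip has a natural row size of 2 bytes, which is not a multiple of the 4-byte alignment. Widening the strip to `example_land_bits(16, EXAMPLE_ALIGN_MOD_BYTES)`, i.e. 32 bits, gives a 4-byte raster that satisfies the rule, at the cost of touching twice as much data per strip.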