Bug 695195 - possible performance regression in raster graphics
Summary: possible performance regression in raster graphics
Status: RESOLVED WONTFIX
Alias: None
Product: GhostPCL
Classification: Unclassified
Component: PCL raster
Version: unspecified
Hardware: PC All
Importance: P4 enhancement
Assignee: Henry Stiles
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-30 06:20 UTC by Henry Stiles
Modified: 2014-05-02 10:54 UTC
CC List: 2 users

See Also:
Customer:
Word Size: ---


Attachments
906 profiledata (138.09 KB, application/x-zip-compressed)
2014-05-01 01:34 UTC, norbert.janssen
golden_master profiledata (136.12 KB, application/x-zip-compressed)
2014-05-01 02:20 UTC, norbert.janssen
LeadingEdge=3 profile for cicero (41.26 KB, application/x-zip-compressed)
2014-05-01 02:28 UTC, norbert.janssen
LeadingEdge=3 profile for cicero (43.81 KB, application/x-zip-compressed)
2014-05-01 02:30 UTC, norbert.janssen
git LeadingEdge=3 profile for cicero (43.81 KB, application/x-zip-compressed)
2014-05-01 02:32 UTC, norbert.janssen
906 LeadingEdge=3 profile for cicero (41.26 KB, application/x-zip-compressed)
2014-05-01 02:32 UTC, norbert.janssen

Description Henry Stiles 2014-04-30 06:20:06 UTC
New bug created from 695190 Comment 11 norbert.janssen@oce.com

I added 3 testfiles (05_PCL, 13_PCL, cicero) and a profiledata.zip to peeves.
These files are still a bit slower on 914 than on 906 (profiling shows that this is in the raster graphics area). I don't know if this can be improved much.
Comment 1 Henry Stiles 2014-04-30 17:04:29 UTC
I can't reproduce significant timing differences for any of these files on Linux.  If you want us to look further, please report timings and profiles for each release.  I think you only sent profiles for 914.
Comment 2 norbert.janssen 2014-05-01 01:34:17 UTC
Created attachment 10865 [details]
906 profiledata
Comment 3 norbert.janssen 2014-05-01 02:20:29 UTC
Created attachment 10866 [details]
golden_master profiledata

Uploaded the 906 and git-trunk (1 May 2014) profiling data for the test files 05_PCL5.pcl, 13_PCL6.xl and cicero.xl.

There is indeed not much difference between 906 and git-trunk.
Comment 4 norbert.janssen 2014-05-01 02:28:01 UTC
Created attachment 10867 [details]
LeadingEdge=3 profile for cicero
Comment 5 norbert.janssen 2014-05-01 02:30:25 UTC
Created attachment 10868 [details]
LeadingEdge=3 profile for cicero

I noticed, however, that there was a difference in run time (28.9 s for 906, 36.3 s for git-trunk) when -dLeadingEdge=3 is used.
Comment 6 norbert.janssen 2014-05-01 02:32:16 UTC
Created attachment 10869 [details]
git LeadingEdge=3 profile for cicero
Comment 7 norbert.janssen 2014-05-01 02:32:59 UTC
Created attachment 10870 [details]
906 LeadingEdge=3 profile for cicero
Comment 8 Robin Watts 2014-05-02 10:54:29 UTC
Henry has bisected the problem and has discovered that it was introduced by a pair of commits:

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=7b3a65aab20feac334cac8e5935ba5cbe310ac69

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=b81962611a292e1b2c5306f3d5cdfea832715169

These were put in by me, and are required for correct operation of gs.

The device functions within ghostscript assume that every bitmap/pixmap they are passed has a 'raster' (the number of bytes from the start of one row to the start of the next) that is a multiple of a given value. The exact multiple depends on the host architecture, and is set to the smallest value at which the processor can access memory at its highest speed.

(So on a 32-bit processor, it's generally set to 4 bytes so that we can do 'int'-based copying operations for speed.)
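
As a rough illustration of the raster rule described above (a sketch only, not the actual ghostscript code): the helper name `padded_raster` and its `align_bytes` parameter are hypothetical and stand in for whatever the build uses as its alignment modulus.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, NOT the ghostscript API: given a row width in bits,
 * return a raster (bytes per row) rounded up to a multiple of align_bytes. */
size_t padded_raster(size_t width_bits, size_t align_bytes)
{
    assert(align_bytes > 0);
    /* Bytes actually needed to hold one row of pixels. */
    size_t row_bytes = (width_bits + 7) / 8;
    /* Round up to the next multiple of the alignment. */
    return (row_bytes + align_bytes - 1) / align_bytes * align_bytes;
}
```

With a 4-byte alignment, a 16-bit-wide strip needs a 4-byte raster under this rule, even though only 2 of those bytes carry data.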

The SSE thresholding code was breaking this rule by sending data with a raster of 2 bytes. For many platforms this did not matter, but it is not safe in general. (It is even possible that there are some devices where it matters on more platforms, though this is unlikely.)

For safety, therefore, we have updated LAND_BITS to always be at least as large as align_bitmap_mod (in bits).
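
The effect of that change can be sketched as follows. This is an illustration under stated assumptions, not the committed code: `effective_land_bits` and `SSE_STRIP_BITS` are hypothetical names, and the value 16 is taken from the 2-byte raster mentioned in this comment.

```c
#include <assert.h>

/* Hypothetical constant: the 16-bit (2-byte) strip width the SSE
 * thresholding code used before the fix. */
enum { SSE_STRIP_BITS = 16 };

/* Hypothetical helper, NOT the ghostscript code: pick a landscape strip
 * width in bits that is never smaller than the bitmap alignment. */
unsigned effective_land_bits(unsigned align_bytes)
{
    assert(align_bytes > 0);
    unsigned align_bits = align_bytes * 8;   /* alignment expressed in bits */
    return align_bits > SSE_STRIP_BITS ? align_bits : SSE_STRIP_BITS;
}
```

Under these assumptions a 4-byte alignment gives 32-bit strips and a 16-byte alignment gives 128-bit strips, while an alignment of 2 bytes or less leaves the strip width at 16 bits.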

This means that for landscape thresholding we will be working in larger strips, and this may have an adverse effect on caching.

Norbert, you can probably override this code by setting LAND_BITS back to 16 for your target builds, but in general, we have to err on the side of caution.