We've got a bunch of large (1TB+) NUMA machines at work and have been getting a lot of noise in our benchmarks. Run times for our application are very consistent from one run to the next when nothing else is running on the machine. But when other jobs enter the run state, even minor, seemingly insignificant ones, they can significantly affect our application's performance. We suspect this is due to on-chip memory bus contention, and we'd like to test that hypothesis. Our goal here is not to maximize the application's performance, just to eliminate variability in it so that we can accurately measure the impact of performance optimizations.
Is there a way to tell the OS that a job should run exclusively on a single processor chip? In our configuration we have 8 processor chips, each with 10 cores (the OS sees 80 processors), so we'd like only one core to be used on each chip.
This machine is running SLES11u3, I think, but we also have access to RHEL5.8 (and SUSE10/RHEL4). We looked at taskset before, but it only sets the affinity of a given job to particular processors; it does nothing to keep other jobs off them (and unfortunately there are lots of little ones, from the OS at a minimum).
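To make the "one core per chip" layout concrete, here is a rough sketch of how the CPU set could be computed and applied. It rests on several assumptions: that the kernel exposes `physical_package_id` under `/sys/devices/system/cpu` (the older kernels on SUSE10/RHEL4 may not), that Python 3.3+ is available for `os.sched_setaffinity`, and the function names are made up for illustration. Like taskset, it only pins the calling process and does not keep other jobs off those cores.

```python
import os
from collections import defaultdict


def first_cpu_per_package(cpu_to_package):
    """Return the lowest-numbered logical CPU of each physical package (chip)."""
    by_package = defaultdict(list)
    for cpu, package in cpu_to_package.items():
        by_package[package].append(cpu)
    return sorted(min(cpus) for cpus in by_package.values())


def read_topology(sysfs_root="/sys/devices/system/cpu"):
    """Map logical CPU number -> physical package id, as reported by sysfs."""
    mapping = {}
    for entry in os.listdir(sysfs_root):
        # Only directories named cpu<N> describe a logical CPU.
        if not (entry.startswith("cpu") and entry[3:].isdigit()):
            continue
        path = os.path.join(sysfs_root, entry, "topology", "physical_package_id")
        try:
            with open(path) as f:
                mapping[int(entry[3:])] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # offline CPU, or a kernel that lacks this file
    return mapping


def pin_self_one_core_per_package():
    """Restrict the calling process to one logical CPU per package."""
    cpus = first_cpu_per_package(read_topology())
    if cpus:  # an empty set would be rejected by the kernel
        os.sched_setaffinity(0, cpus)  # pid 0 means "this process"
    return cpus
```

On a box where the 80 logical CPUs happened to be numbered contiguously per chip (an assumption, the real numbering may be interleaved), `first_cpu_per_package` would pick CPUs 0, 10, 20, and so on up to 70.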