About False Sharing

icicle · ‎03-26-2004

Does anyone know about false sharing?

Recently I have been trying to understand the relationship between Hyper-Threading and user-level optimization.
I tested the source below on an IBM xSeries 225, which has two 2.4 GHz Xeon processors.
I expected that avoiding cache false sharing would improve performance.
When I padded the data structure, that assumption held.
But when I turned on Hyper-Threading in the BIOS, performance went down.
What do I need to use or change to improve performance with Hyper-Threading?
Should I increase the number of threads?

System spec:
H/W: IBM xSeries 225
OS: Red Hat Linux 9
Compiler: icc 8.0
Reference site for the source:

http://www.intel.com/cd/ids/developer/asmo-na/eng/microprocessors/ia32/pentium4/hyperthreading/19980.htm

Source
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/time.h>

struct thread_param {
    // 4 longs * 4 bytes = 16 bytes on a 32-bit build
    unsigned long thread_id;
    unsigned long v;
    unsigned long start;
    unsigned long end;
#ifdef FALSE_SHARING_FIX
    // pad the struct to avoid false sharing
    // as posted: (4 long + 12 int padding) * 4 bytes = 64 bytes;
    // reaching 128 bytes would take padding[28]
    int padding[12];
#endif
};

// 1024*1024 elements; parenthesized so the macro expands safely
#define MAXLEN (1024*1024)
#define NUM_PROC 4

int array[MAXLEN];
int count = 0;

// example of false sharing: every thread keeps writing its loop index p->v,
// and without padding the thread_param structs can share a cache line
void* thread_fn(void* arg) {
    struct thread_param *p = (struct thread_param*)arg;
    int i;

    for (i = 0; i < count; i++) {
        for (p->v = p->start; p->v < p->end; p->v++)
            array[p->v] += 1;
    }
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t tid[NUM_PROC];
    struct thread_param thread_struct[NUM_PROC];
    int i, interval;
    struct timeval start, end, result;

    if (argc < 2) {
        printf("usage: false_none count\n");
        return 0;
    }

    count = atoi(argv[1]);

    printf("False sharing testing begin...\n");
#ifdef FALSE_SHARING_FIX
    printf("with FIX\n");
#else
    printf("without FIX\n");
#endif
    printf("total execution time: ");

    for (i = 0; i < MAXLEN; i++)
        array[i] = 1;

    interval = MAXLEN/NUM_PROC;
    for (i = 0; i < NUM_PROC-1; i++) {
        thread_struct[i].thread_id = i;
        thread_struct[i].start = i * interval;
        thread_struct[i].end = thread_struct[i].start + interval;
    }

    // the last thread takes whatever remains of the array
    thread_struct[NUM_PROC - 1].thread_id = NUM_PROC - 1;
    thread_struct[NUM_PROC - 1].start = (NUM_PROC - 1) * interval;
    thread_struct[NUM_PROC - 1].end = MAXLEN;

    // start timing before the threads are created so all of their work is counted
    gettimeofday(&start, NULL);

    for (i = 0; i < NUM_PROC; i++) {
        pthread_create(&tid[i], NULL, thread_fn, &thread_struct[i]);
    }

    for (i = 0; i < NUM_PROC; i++)
        pthread_join(tid[i], NULL);

    gettimeofday(&end, NULL);

    timersub(&end, &start, &result);
    printf("%ld sec, %ld usec\n", (long)result.tv_sec, (long)result.tv_usec);

    return 0;
}
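
One thing the padding array by itself does not guarantee is alignment: if the array of structs starts in the middle of a cache line, a padded element can still straddle two lines. Below is a minimal sketch of an aligned variant, assuming a C11 compiler. `CACHE_LINE`, `padded_param` and `on_distinct_lines` are hypothetical names that are not part of the posted source, and the 64-byte line size is an assumption (set it to 128 to match the 128-byte target in the original comment).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; not taken from the post. */
#define CACHE_LINE 64

/* Hypothetical variant of thread_param. alignas on the first member raises
   the alignment of the whole struct, and the compiler then rounds sizeof up
   to a multiple of CACHE_LINE, so no manual padding array is needed. */
struct padded_param {
    alignas(CACHE_LINE) unsigned long thread_id;
    unsigned long v;
    unsigned long start;
    unsigned long end;
};

/* Compile-time checks: each element covers whole lines and starts on one. */
static_assert(sizeof(struct padded_param) % CACHE_LINE == 0,
              "size must be a whole number of cache lines");
static_assert(alignof(struct padded_param) == CACHE_LINE,
              "struct must be aligned to a cache line");

/* Returns 1 if the two addresses fall on different cache lines. */
int on_distinct_lines(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE != (uintptr_t)b / CACHE_LINE;
}
```

With this layout, an array such as `struct padded_param thread_struct[NUM_PROC]` places every element on its own line regardless of where the array itself lands, which the padding-only version cannot promise.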

TimP · ‎03-26-2004

If turning on HT reduced performance while you ran the same test with 2 threads, it doesn't look like a false sharing issue. To get an advantage from HT, you usually do need to increase the number of threads to match the number of logical processors. A significant reduction in performance is likely to be a scheduling problem. I don't know whether schedulers that work better with HT on dual CPUs are likely to come with distros incorporating 2.6 kernels.
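
A rough sketch of matching the thread count to the logical processors, instead of hard-coding it as `NUM_PROC` does. It assumes Linux with glibc, where `sysconf(_SC_NPROCESSORS_ONLN)` reports the processors currently online. `worker_thread_count` and `chunk_bounds` are hypothetical helper names, not part of the posted source.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: one worker per logical processor currently online.
   _SC_NPROCESSORS_ONLN is a common extension, not core POSIX.
   Falls back to 1 if the query fails. */
long worker_thread_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Hypothetical helper: split len elements into nthreads contiguous chunks.
   The last chunk absorbs the remainder, as in the posted code. */
void chunk_bounds(unsigned long len, long nthreads, long t,
                  unsigned long *start, unsigned long *end) {
    unsigned long interval = len / (unsigned long)nthreads;
    assert(t >= 0 && t < nthreads);
    *start = (unsigned long)t * interval;
    *end = (t == nthreads - 1) ? len : *start + interval;
}
```

On a dual-Xeon box with HT enabled the first helper would be expected to report 4, and 2 with HT disabled, so the same binary could be used for both runs.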

TimP · ‎03-26-2004

Red Hat EL3 Update 2 is supposed to be the first stock Linux distribution with improved dual-processor HT scheduling.

ClayB · ‎04-06-2004

Persepone -

As Tim pointed out, if you kept the same two threads when running under HT, the OS may have scheduled both threads onto the same physical processor (the two logical HT processors). This would result in a performance drop compared to the dual-processor test without HT. Have you tried running this with four threads on a dual-processor, HT-enabled system?
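
One way to take the scheduler out of the picture is to pin each thread to a chosen logical processor. The sketch below assumes a glibc recent enough to provide `pthread_setaffinity_np` and `pthread_getaffinity_np`, which may not be the case on the Red Hat Linux 9 system described above. `pin_self_to_cpu` and `first_allowed_cpu` are hypothetical helper names. Which logical CPU numbers share a physical package differs between systems, so that mapping would have to be checked (for example in /proc/cpuinfo) rather than assumed.

```c
#define _GNU_SOURCE   /* must precede all includes; enables cpu_set_t and the _np calls */
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to one logical CPU.
   Returns 0 on success or the error code from pthread_setaffinity_np. */
int pin_self_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Hypothetical helper: lowest-numbered CPU the calling thread is allowed
   to run on, or -1 if the affinity mask cannot be read or is empty. */
int first_allowed_cpu(void) {
    cpu_set_t set;
    int cpu;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Each worker could call `pin_self_to_cpu` at the top of `thread_fn` with a CPU number chosen so that two threads land on different physical processors, which makes the two-thread HT run comparable to the run without HT.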

-- clay