I read the passage below in "IA-32 Intel Architecture Optimization", Chapter 7,
and tried to measure its effect on performance, but I don't know how to demonstrate the difference.
When I turned Hyper-Threading on, performance went down.
Is there anything wrong with my test?
I would also like to know which factors affect performance.
The test source below is my code. Please review it.
OS : Redhat Linux 9
compiler : icc 8.0
When two threads are executing on two physical processors and sharing
data, reading from or writing to shared data usually involves several bus
transactions (including snooping, request for ownership changes, and
sometimes fetching data across the bus). A thread accessing a large
amount of shared memory is not likely to scale with processor clock rates.
Minimize the sharing of data between threads that execute on different physical processors
sharing a common bus.
One technique to minimize sharing of data is to copy data to local stack
variables if it is to be accessed repeatedly over an extended period. If
necessary, results from multiple threads can be combined later by
writing them back to a shared memory location. This approach can also
minimize time spent to synchronize access to shared data.
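
The technique described in the quoted passage might look roughly like the sketch below. It is not taken from the manual or from the test source: the names `slice`, `sum_slice`, `parallel_sum`, and `NUM_THREADS` are made up for illustration, and the workload (summing an array) is only a stand-in. Each thread accumulates into a stack variable and writes to shared memory once, when it is done; the results are combined after the threads are joined.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 4   /* hypothetical; plays the role of NUM_PROC in the post */

/* Work description for one thread (illustrative names, not from the post). */
struct slice {
    const int *data;   /* start of this thread's part of the input */
    size_t     len;    /* number of elements in that part */
    long       sum;    /* written once, when the thread finishes */
};

static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long local = 0;                 /* stack variable, private to this thread */
    for (size_t i = 0; i < s->len; i++)
        local += s->data[i];
    s->sum = local;                 /* single write-back to shared memory */
    return NULL;
}

/* Sums data[0..n) with NUM_THREADS threads. Returns 0 on success, -1 on error. */
int parallel_sum(const int *data, size_t n, long *out)
{
    pthread_t    tid[NUM_THREADS];
    struct slice part[NUM_THREADS];
    size_t chunk = n / NUM_THREADS;
    int started = 0, rc = 0;

    for (int t = 0; t < NUM_THREADS; t++) {
        part[t].data = data + (size_t)t * chunk;
        /* the last thread also takes the remainder */
        part[t].len  = (t == NUM_THREADS - 1) ? n - (size_t)t * chunk : chunk;
        part[t].sum  = 0;
        if (pthread_create(&tid[t], NULL, sum_slice, &part[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }

    long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += part[t].sum;       /* combine results after the threads finish */
    }
    if (rc == 0)
        *out = total;
    return rc;
}
```

The `part[]` entries sit next to each other in memory, so their `sum` fields may share a cache line, but each one is written only once per thread, which keeps the bus traffic from that write small compared with updating a shared counter inside the loop. Build with `-pthread`.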
#define DPRINTF(arg) printf arg
#define NUM_PROC 4
#define MAXLEN 1024*1024
int full_cnt = 1;
int *t1, *t2, *t3;
long count = 0;
for (i=0; i
for (count=0; count
t1 = (int*)malloc(sizeof(int)*MAXLEN);
t2 = (int*)malloc(sizeof(int)*MAXLEN);
t3 = (int*)malloc(sizeof(int)*MAXLEN);
t2[count] = B[count];
t3[count] = t1[count] + t2[count];
for (count=0; count
printf("usage: false_none count ");
gettimeofday(&start, NULL);
printf("%ld sec, %ld usec ", result.tv_sec, result.tv_usec);
for (i=255; i< 275; i++)
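
The timing fragments above (`gettimeofday`, `result.tv_sec`, `result.tv_usec`) suggest a wall-clock measurement around the threaded region. A minimal sketch of that kind of measurement follows; `elapsed_usec` and `time_region` are hypothetical helper names, not part of the original test source, and the sketch assumes the clock is not adjusted while the region runs.

```c
#include <stddef.h>
#include <sys/time.h>

/* Microseconds from *start to *end; assumes end is not earlier than start. */
long long elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/* Runs work(arg) once and returns the wall-clock time it took, in microseconds. */
long long time_region(void (*work)(void *), void *arg)
{
    struct timeval start, end;
    gettimeofday(&start, NULL);
    work(arg);
    gettimeofday(&end, NULL);
    return elapsed_usec(&start, &end);
}
```

Reporting a single number in microseconds makes runs with Hyper-Threading on and off easier to compare than separate seconds and microseconds fields. Repeating the measurement several times and comparing the fastest or the median run reduces noise from the scheduler.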
Are you running the same number of threads as before? They could be scheduled on the same physical processor (on a dual-processor HT platform), which is exactly what you are trying to avoid by dividing the data. Even with four threads on a dual-processor system with HT enabled, two threads will be assigned to the same physical processor, where they share or even split the cache and other processor resources.
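
One way to take the scheduler out of the experiment is to pin each thread to a chosen logical CPU. The sketch below assumes Linux with a glibc recent enough to provide `pthread_setaffinity_np` (a GNU extension, so it may not be available on an installation as old as the one in the question); the helper names are hypothetical. Which logical CPU numbers belong to the same physical package is system specific and has to be checked separately, for example from the "physical id" field in /proc/cpuinfo.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Returns the lowest-numbered logical CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pins the calling thread to one logical CPU. Returns 0 on success,
   otherwise an error number from pthread_setaffinity_np. */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Returns 1 if the calling thread is restricted to exactly `cpu`, else 0. */
int pinned_only_to(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return 0;
    return CPU_ISSET(cpu, &set) && CPU_COUNT(&set) == 1;
}
```

Each worker thread would call `pin_self_to_cpu` at the start of its thread function. Running the test once with two threads pinned to logical CPUs on different physical processors, and once with both pinned to the two logical CPUs of one processor, separates the cost of sharing a processor from the cost of sharing data across the bus.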