I wrote the simple memory-copy program below and compiled it with the Intel C compiler using the -axN option to get vectorized code (with non-temporal writes). I ran it two ways: once with the initialization loop on A and once with that loop removed. In both cases the copy from array A to array B is repeated 10 times.
I can understand that in the first iteration the reported memory bandwidth will differ because of page switching. For the remaining iterations, the results should be the same with or without initialization. However, for iterations 2-10 I got 6.x cycles per float element with the initialization of A and 3.x cycles per float element without it.
I don't really understand this. I am using a 2-processor Xeon 2.4 GHz system with a 533 MHz front-side bus. Can anyone explain this? Thanks!
//mysecond.c
#include <sys/time.h>
double mysecond()
{
struct timeval tp;
struct timezone tzp;
int i;
i = gettimeofday(&tp,&tzp);
return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
}
//copy.c
#include <stdio.h>
#define RATE 2.4e9
#define N (1024*1024*64)
#define NTIMES 10
extern double mysecond();
float a[N], b[N];
int main () {
    int k;
    int kk;
    double t_1;
    // initialization loop (removed in one of the two runs)
    for (kk = 0; kk < N; kk++) {
        a[kk] = 1;
    }
    // timed copy from a to b, repeated NTIMES
    for (k = 0; k < NTIMES; k++) {
        t_1 = mysecond();
        for (kk = 0; kk < N; kk++)
            b[kk] = a[kk];
        t_1 = mysecond() - t_1;
        printf("cycles/element = %lf, bandwidth = %lf bytes/s\n",
               t_1*RATE/(N*1.0),
               N*1.0*4*1.0/t_1);
    }
    return 0;
}
Message Edited by bigbearking on 01-19-2006 12:10 PM
1 Reply
Try -Qvec-report3 to see whether the initialization loop makes any difference to the vectorization of the second (copy) loop. It shouldn't.
Jennifer
