Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Question regarding mem copy

I wrote the simple mem copy program below and compiled it with the Intel C compiler using the -axN option to get vectorized code (with non-temporal writes). In one run I removed the initialization loop on A, and in another run I kept it. The memory copy between the two arrays A and B is repeated 10 times.

I can understand that in the first iteration the reported memory bandwidth would be quite different because of page switching. The results of the other iterations should be the same with or without the initialization. However, for iterations 2-10 I got 6.x cycles per float element in the run with the initialization of A and 3.x cycles per float element in the run without it.

I don't really understand this. I am using a 2-processor Xeon 2.4 GHz system with a 533 MHz front-side bus. Can anyone explain this? Thanks!
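For readers who want to relate the cycles-per-element figures to a memory bandwidth, here is a minimal sketch of the conversion. The helper name copy_bandwidth is hypothetical, and the 8 bytes of traffic per element (one 4-byte float read from A plus one 4-byte float written to B) is an assumption, since the bandwidth expression in the posted program is cut off.

```c
#include <assert.h>

/* Bytes moved per second by a copy loop that spends `cycles_per_element`
 * clock cycles on each element.
 *   clock_hz:          core clock in Hz (2.4e9 for the system in the post)
 *   bytes_per_element: ASSUMED traffic per element; 8.0 counts one 4-byte
 *                      float read plus one 4-byte float written */
double copy_bandwidth(double cycles_per_element, double clock_hz,
                      double bytes_per_element)
{
    assert(cycles_per_element > 0.0);
    double elements_per_second = clock_hz / cycles_per_element;
    return elements_per_second * bytes_per_element;
}
```

Taking 3.0 and 6.0 as stand-ins for the reported 3.x and 6.x, copy_bandwidth(3.0, 2.4e9, 8.0) comes to about 6.4e9 bytes per second and copy_bandwidth(6.0, 2.4e9, 8.0) to about 3.2e9 bytes per second under that traffic assumption.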



#include <sys/time.h>

/* Wall-clock time in seconds, with microsecond resolution. */
double mysecond()
{
    struct timeval tp;
    struct timezone tzp;
    int i;
    i = gettimeofday(&tp,&tzp);
    return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
}



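#include <stdio.h>

#define RATE 2.4e9              /* clock rate in Hz */
#define N (1024*1024*64)        /* number of float elements per array */
#define NTIMES 10

extern double mysecond();

float a[N], b[N];

int main () {
    int i, j, k;
    int kk;

    double t_1;

    /* initialization loop (removed in one of the two runs) */
    for (kk = 0; kk < N; kk++) {
        a[kk] = 1;
    }

    for (k = 0; k < NTIMES; k++) {
        t_1 = mysecond();

        for (kk = 0; kk < N; kk++)
            b[kk] = a[kk];

        t_1 = mysecond() - t_1;

        /* The posted printf also printed a bandwidth value, but its
           argument is cut off in the post, so only cycles/element is
           printed here. */
        printf("cycles/element = %lf\n", t_1*RATE/(N*1.0));
    }

    return 0;
}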

Message Edited by bigbearking on 01-19-2006 12:10 PM

1 Reply
Try -Qvec-report3 to see whether the initialization loop makes any difference to the vectorization of the second (copy) loop. It shouldn't.