I can understand that in the first iteration the reported memory bandwidth would be quite different because of page switching. The results of the other iterations, however, should be the same with or without initialization. Instead, I got 6.x cycles per float element in the second run (iterations 2-10, with initialization of a) and 3.x cycles per float element in the first run (without initialization of a).

I don't really understand this. I am using a two-processor Xeon 2.4 GHz system with a 533 MHz front-side bus. Can anyone explain this? Thanks!

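//mysecond.c
#include <stddef.h>
#include <sys/time.h>

// Returns the current wall-clock time in seconds, with microsecond resolution.
double mysecond(void)
{
    struct timeval tp;

    // The timezone argument is obsolete, so NULL is passed instead.
    gettimeofday(&tp, NULL);
    return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 );
}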

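//copy.c
#include <stdio.h>

#define RATE 2.4e9          // CPU clock rate in Hz (2.4 GHz)
#define N (1024*1024*64)    // number of float elements in each array
#define NTIMES 10           // number of timed copy iterations

extern double mysecond();

float a[N], b[N];

int main () {
    int k;
    int kk;
    double t_1;

    // initialization loop (left out in the run without initialization)
    for (kk = 0; kk < N; kk++) {
        a[kk] = 1;
    }

    for (k = 0; k < NTIMES; k++) {
        t_1 = mysecond();
        for (kk = 0; kk < N; kk++)
            b[kk] = a[kk];
        t_1 = mysecond() - t_1;

        // cycles per element, and bytes copied per second (4 bytes per float)
        printf("cycles/element = %lf, bandwidth =%lf\n",
               t_1*RATE/(N*1.0),
               N*1.0*4*1.0/t_1);
    }

    return 0;
}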

Message Edited by bigbearking on 01-19-2006 12:10 PM

1 Reply

Try -Qvec-report3 to see whether the initialization loop makes any difference to the vectorization of the second loop. It shouldn't.

Jennifer
