
Yukyoung_L_ · 10-23-2018

Hello. I am working on analyzing execution time.

I have two similar programs below.

1st code:

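#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    size_t size = 256UL * 1024 * 1024;  // 256 MiB buffer
    size_t stride = 256;                // bytes between chained elements
    char *array = malloc(size);         // char * so the pointer arithmetic is standard C
    if (array == NULL) return 1;

    // Build a pointer chain: each element stores the offset of the next one.
    for (size_t off = 0; off < size; off += stride) {
        *(unsigned int *)(array + off) = (unsigned int)(off + stride);
    }
    // Close the chain: the last element points back to offset 0.
    *(unsigned int *)(array + size - stride) = 0;

    unsigned int offset = 0;
    int i = 10000000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    while (i >= 1) {
        // Each load's address depends on the previous load (pointer chasing).
        offset = *(unsigned int *)(array + offset);
        i--;
    }
    gettimeofday(&end, NULL);

    // Use the final offset through a volatile read so the loop's result is not discarded.
    (void)*(volatile unsigned int *)(array + offset);

    // Elapsed time of the while loop in microseconds.
    printf("%.2f\n", (double)((end.tv_sec - start.tv_sec) * 1000000L + (end.tv_usec - start.tv_usec)));
    free(array);
    return 0;
}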

2nd code:

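#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

// Unrolling macros: HUNDRED expands to 100 dependent loads.
#define ONE offset = *(unsigned int *)(array + offset);
#define FIVE ONE ONE ONE ONE ONE
#define TEN FIVE FIVE
#define FIFTY TEN TEN TEN TEN TEN
#define HUNDRED FIFTY FIFTY

int main(void) {
    size_t size = 256UL * 1024 * 1024;  // 256 MiB buffer
    size_t stride = 256;                // bytes between chained elements
    char *array = malloc(size);
    if (array == NULL) return 1;

    // Same pointer chain as in the first code.
    for (size_t off = 0; off < size; off += stride) {
        *(unsigned int *)(array + off) = (unsigned int)(off + stride);
    }
    *(unsigned int *)(array + size - stride) = 0;

    unsigned int offset = 0;
    int i = 10000000;
    struct timeval start, end;

    gettimeofday(&start, NULL);
    // As written, each iteration issues 100 loads but subtracts 1000 from i,
    // so this loop performs 1,000,000 loads in total (the first code performs 10,000,000).
    while (i >= 1000) {
        HUNDRED
        i -= 1000;
    }
    gettimeofday(&end, NULL);

    (void)*(volatile unsigned int *)(array + offset);

    // Elapsed time of the while loop in microseconds.
    printf("%.2f\n", (double)((end.tv_sec - start.tv_sec) * 1000000L + (end.tv_usec - start.tv_usec)));
    free(array);
    return 0;
}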

Questions

1) The only difference between the two programs is the while loop.

Both measure the elapsed time of the while loop.

When I ran the two programs on my computer (with hardware prefetching disabled), the first reported 779,851,000 ns and the second reported 1,624,344,000 ns (2.1 times longer).

I thought this difference came from L1-i cache misses, so I measured L1-i cache misses with perf.

However, the first code had 34,541 L1-i cache misses and the second had 43,078 (1.2 times as many).

This cannot fully explain the difference between the elapsed times of the two while loops.

What causes such a large difference between the elapsed times of the two programs? Is there anything I am missing?

2) When I ran VTune's General Exploration (top-down) analysis on the first code, I got a total elapsed time of 1.125 s and a DRAM Bound of 51.3%. The elapsed time measured for the while loop (i.e. the program's output) was 926,295,000 ns.

I expected the DRAM stall time (1.125*51.3/100 = 0.577 s) to be equal to or larger than the program's result (0.926295 s), since every load in the while loop misses in the LLC.

However, the measured elapsed time is about 1.6 times the DRAM stall time.

Why are the two values different?

mayer__max · 11-07-2018

The only thing I noticed when I looked at your code is that the second one is faster,

but I have no idea how to answer your question.
