Hello. I am analyzing execution time.
I have two similar pieces of code below.
1st code:
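#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    int size = 256*1024*1024;   /* 256 MiB buffer */
    int stride = 256;           /* bytes between consecutive chain elements */
    char *array = malloc(size); /* char * so that array+off is standard C */
    if (array == NULL) return 1;
    /* Build the chain: each element stores the offset of the next one. */
    for (unsigned long off = 0; off < (unsigned long)size; off += stride) {
        *(unsigned int *)(array+off) = off+stride;
    }
    *(unsigned int *)(array+size-stride) = 0; /* last element wraps to 0 */
    unsigned int offset = 0;
    int i=10000000;
    struct timeval start, end;
    gettimeofday(&start, NULL);
    while (i>=1) {
        offset = *(unsigned int *)(array+offset); /* depends on the previous load */
        i--;
    }
    gettimeofday(&end, NULL);
    *(volatile unsigned int *)(array+offset); /* keep offset live */
    /* elapsed time of the while loop, in microseconds */
    printf("%ld\n", (long)(end.tv_sec-start.tv_sec)*1000000L+(long)(end.tv_usec-start.tv_usec));
    free(array);
    return 0;
}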
2nd code:
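#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

int main(void) {
    int size = 256*1024*1024;   /* 256 MiB buffer */
    int stride = 256;           /* bytes between consecutive chain elements */
    char *array = malloc(size); /* char * so that array+off is standard C */
    if (array == NULL) return 1;
    /* Build the same chain as in the 1st code. */
    for (unsigned long off = 0; off < (unsigned long)size; off += stride) {
        *(unsigned int *)(array+off) = off+stride;
    }
    *(unsigned int *)(array+size-stride) = 0; /* last element wraps to 0 */
    unsigned int offset = 0;
    int i=10000000;
    struct timeval start, end;
    gettimeofday(&start, NULL);
    /* ONE is a single dependent load; HUNDRED expands to 100 of them. */
#define ONE offset = *(unsigned int *)(array+offset);
#define FIVE ONE ONE ONE ONE ONE
#define TEN FIVE FIVE
#define FIFTY TEN TEN TEN TEN TEN
#define HUNDRED FIFTY FIFTY
    while (i>=1000) {
        /* 10 x HUNDRED = 1000 loads per iteration */
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        HUNDRED
        i-=1000;
    }
    gettimeofday(&end, NULL);
    *(volatile unsigned int *)(array+offset); /* keep offset live */
    /* elapsed time of the while loop, in microseconds */
    printf("%ld\n", (long)(end.tv_sec-start.tv_sec)*1000000L+(long)(end.tv_usec-start.tv_usec));
    free(array);
    return 0;
}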
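Both programs build the same pointer-chasing chain and differ only in how the timed loop is written. The scaled-down sketch below illustrates that on a tiny buffer; it is not the code that produced the timings. `build_chain`, `walk_rolled`, `walk_unrolled` and `STEP` are hypothetical names, and the unrolled walk does 10 loads per iteration rather than 1,000 to keep it short. Each load's address comes from the previous load, so both walks perform the same dependent sequence and end at the same offset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: element k (at byte offset k*stride) stores the offset
   of element k+1; the last element stores 0 so the walk wraps around.
   The buffer must be suitably aligned for unsigned int. */
static void build_chain(char *array, size_t size, size_t stride)
{
    assert(stride >= sizeof(unsigned int) && size >= stride && size % stride == 0);
    for (size_t off = 0; off < size; off += stride)
        *(unsigned int *)(array + off) = (unsigned int)(off + stride);
    *(unsigned int *)(array + size - stride) = 0;
}

/* Rolled walk, like the 1st code: one dependent load per iteration. */
static unsigned int walk_rolled(const char *array, long i)
{
    unsigned int offset = 0;
    while (i >= 1) {
        offset = *(const unsigned int *)(array + offset);
        i--;
    }
    return offset;
}

/* Unrolled walk, like the 2nd code but with 10 loads per iteration
   instead of 1000 (any remainder below 10 is skipped, as in the original). */
#define STEP offset = *(const unsigned int *)(array + offset);
static unsigned int walk_unrolled(const char *array, long i)
{
    unsigned int offset = 0;
    while (i >= 10) {
        STEP STEP STEP STEP STEP STEP STEP STEP STEP STEP
        i -= 10;
    }
    return offset;
}
```

For example, with a 256-byte buffer and a 16-byte stride (16 elements), 20 steps of either walk wrap around the chain once and end at offset 64.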
Questions
1) The only difference between the two programs is the while loop.
Both measure the elapsed time of the while loop.
When I ran both programs on my computer (with hardware prefetching disabled), the first program reported 779,851,000 ns and the second reported 1,624,344,000 ns (2.1 times longer).
I thought this difference came from L1-i cache misses, so I measured L1-i cache misses with perf.
However, the first program has 34,541 L1-i cache misses and the second has 43,078 (1.2 times more).
This cannot fully explain the difference between the elapsed times of the two while loops.
What causes the large difference between the elapsed times of the two programs? Is there anything I am missing?
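The numbers in question 1 can be reduced to per-load figures. The sketch below assumes both loops execute exactly 10,000,000 loads (the unrolled loop runs 10,000 iterations of 1,000 loads) and, for the L1-i bound, assumes each extra miss costs 100 ns, which is deliberately larger than the roughly 78 ns per load measured for the first program. `ns_per_load` and `extra_miss_cost_ns` are illustrative helpers, and the 100 ns penalty is an assumption, not a measurement.

```c
#include <assert.h>

/* Average wall-clock time per dependent load, in nanoseconds. */
static double ns_per_load(double elapsed_ns, double loads)
{
    assert(loads > 0.0);
    return elapsed_ns / loads;
}

/* Rough upper bound on the time the extra L1-i misses of version b could add
   over version a. miss_penalty_ns is an assumed cost per miss, not measured. */
static double extra_miss_cost_ns(double misses_a, double misses_b,
                                 double miss_penalty_ns)
{
    return (misses_b - misses_a) * miss_penalty_ns;
}
```

With the reported values, the first loop averages about 78 ns per load and the second about 162 ns. Under the assumed penalty, the 8,537 extra L1-i misses would cost about 0.85 ms, while the two loops differ by about 844 ms.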
2) When I ran VTune's General Exploration (top-down) analysis on the first program, I got a total elapsed time of 1.125 s and a DRAM Bound value of 51.3%. The measured elapsed time of the while loop (the output of this program) was 926,295,000 ns.
I expected the DRAM stall time (1.125 * 51.3 / 100 ≈ 0.577 s) to be equal to or greater than the program's result (0.926295 s), since every load in the while loop misses the LLC.
However, the measured elapsed time is about 1.6 times the DRAM stall time.
Why do the two values differ?
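The estimate in question 2 can be written out the same way. This sketch follows the question's approximation of treating the DRAM Bound percentage as a share of the total elapsed time VTune reports; `dram_stall_s` is an illustrative helper name, not part of any VTune API.

```c
#include <assert.h>

/* Approximation used in the question: stall time = elapsed time * DRAM Bound %.
   This treats the percentage as a fraction of wall-clock time. */
static double dram_stall_s(double elapsed_s, double dram_bound_pct)
{
    assert(dram_bound_pct >= 0.0 && dram_bound_pct <= 100.0);
    return elapsed_s * dram_bound_pct / 100.0;
}
```

`dram_stall_s(1.125, 51.3)` gives about 0.577 s, and 0.926295 / 0.577 is about 1.6, matching the figures in the question.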
The only thing I noticed when I looked at your code is that the second one is faster,
but I have no idea how to answer your question.
