Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Measuring Memory Access Time

chetreb
Beginner
I have an Intel Xeon server running 64-bit Linux, on which I am trying to measure the cache miss overhead in the L1D cache. The code to do this is shown below.

The code uses two arrays: buffer, sized at twice the L1D cache size, and bufindex, sized at 2 x the associativity of the L1D cache (16 entries).

There is a function findsets which, given an index into buffer, finds all the indices in buffer that map into the same cache set. These indices (call them m0, m1, m2, ..., m15) are stored in bufindex.
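For reference, this is the set-index arithmetic findsets relies on; the sketch below is a standalone illustration (hypothetical, separate from the actual program further down), assuming the 32 KB / 8-way / 64-byte-line geometry from the code comments:

[bash]
/* Standalone sketch: with 64 sets of 64-byte lines, the set index is address
 * bits a11..a6.  Entries 1024 unsigned ints (4096 bytes = 64 sets * 64 bytes)
 * apart therefore share a set index but have different tags. */
#include <stdio.h>

#define GETSET(x) (((x) >> 6) & 0x3f)
#define GETTAG(x) ((x) >> 12)

static unsigned int buffer[8192 * 2];

int main(void)
{
    int i;
    for (i = 0; i < 16; ++i) {
        unsigned long a = (unsigned long) &buffer[1024 * i];
        /* all 16 addresses print the same set index but distinct tags */
        printf("m%-2d addr %lx set %lx tag %lx\n", i, a, GETSET(a), GETTAG(a));
    }
    return 0;
}
[/bash]

Since only 8 of these 16 lines can live in an 8-way set at once, accessing m9 to m15 should push some of m0 to m7 out of the set.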

There is also a function measureflush, which measures the
time to access elements from buffer.

I make 4 calls to measureflush as follows.
r = measureflush(0, 8);  /* First Call: measures the time to access m0 to m7  */
r ^= measureflush(0, 8);  /* Second Call: measures the time to access m0 to m7 */
r ^= measureflush(9, 16); /* Third Call: measures the time to access m9 to m15 */
r ^= measureflush(0, 8);  /* Fourth Call: measures the time to access m0 to m7 */

I would expect that, due to the associativity, the second call to measureflush would take less time than the fourth: the third call brings m9 to m15 into the same set, which should evict most of m0 to m7, so the fourth call ought to miss again. However, this does not seem to be the case, as both calls take roughly the same time (which I think means that I am getting cache hits in both cases). A sample output is shown below. Where am I going wrong? Is there something else I need to do in order to see the cache misses? Any help in this regard will be really useful.

Thanks in advance,
Chetreb


[bash]

Sample Output
-- first call-- (clearly compulsory misses)
0 270
1 4992
2 2396
3 2268
4 2282
5 2674
6 2368
7 2311
-- second call--
0 178
1 171
2 178
3 170
4 206
5 206
6 199
7 569
-- third call-- (compulsory misses again)
9 2979
10 2261
11 2382
12 2631
13 2268
14 2262
15 2290
-- fourth call--
0 184
1 213
2 477
3 178
4 178
5 178
6 185
7 221

[/bash]
[bash]
/*
 * Cache Assumption:
 * L1 D : 32 KByte, 8 way set associative, 64 byte cache line
 * This means that there are 64 sets
 *
 * Tag size : 20 bits (a31 - a12)
 * Set size : 6 bits (a11 - a6)
 * Offset : 6 bits (a5 - a0)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXSIZE (8192*2)

unsigned int buffer[MAXSIZE];
unsigned int bufindex[16];
unsigned int r;

volatile unsigned measureflush(int from, int to)
{
    unsigned int t2, t1;
    int i;

    for (i = from; i < to; ++i) {
        /* Serialize (cpuid), read the TSC, load buffer[bufindex[i]] from
         * memory, serialize and read the TSC again; the difference (in eax)
         * is written back to buffer[bufindex[i]]. */
        asm volatile (" xor %%eax, %%eax \n"
                      " cpuid            \n"
                      " rdtsc            \n"
                      " mov %%eax,%%edi  \n"
                      " mov (%%esi),%%ebx\n"
                      " xor %%eax,%%eax  \n"
                      " cpuid            \n"
                      " rdtsc            \n"
                      " sub %%edi,%%eax  \n"
                      " mov %%eax,(%%esi)\n"
                      : "=a"(t2), "=D"(t1)
                      : "S"(buffer + bufindex[i])
                      : "ebx", "ecx", "edx", "cc");
        /* t2 contains the memory access time */
        buffer[bufindex[i]] = t2;
    }

    for (i = from; i < to; ++i) {
        printf("%d %u\n", i, buffer[bufindex[i]]);
    }
    printf("\n");
    return 0;
}

#define GETSET(x)    ((x >> 6) & 0x3f)
#define GETOFFSET(x) (x & 0x3f)
#define GETTAG(x)    (x >> 12)

void findsets(int index)
{
    unsigned long bufoffset[16];
    int i;

    for (i = 0; i < 16; ++i) {
        bufoffset[i] = (unsigned long) (buffer + index + (1024 * i));
        bufindex[i] = (1024 * i) + index;
        printf("%d\t", bufindex[i]);
        printf("%lx\t", bufoffset[i]);
        printf("%lx\t", GETSET(bufoffset[i]));
        printf("%lx\t", GETOFFSET(bufoffset[i]));
        printf("%lx\n", GETTAG(bufoffset[i]));
    }
}

int main()
{
    findsets(0);
    r = measureflush(0, 8);   /* First Call  */
    r ^= measureflush(0, 8);  /* Second Call */
    r ^= measureflush(9, 16); /* Third Call  */
    r ^= measureflush(0, 8);  /* Fourth Call */
    return 0;
}
[/bash]
Hussam_Mousa__Intel_
New Contributor II
Hi,

Have you used any performance counters to profile the cache events?

You can use this tool: http://software.intel.com/en-us/articles/intel-performance-counter-monitor/
to measure your cache hits/misses at multiple levels.

This should be a useful first step in understanding your code's memory access latencies.
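
If you want a quick, programmatic check before setting up PCM, the sketch below counts L1D load misses around a region of code using the Linux perf_event_open interface (this is an illustration only, not PCM; it assumes a Linux kernel with perf support, and error handling is kept minimal):

[bash]
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Thin wrapper around the perf_event_open syscall (glibc has no wrapper). */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(void)
{
    struct perf_event_attr attr;
    uint64_t misses = 0;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HW_CACHE;
    attr.size = sizeof(attr);
    /* L1D read misses */
    attr.config = PERF_COUNT_HW_CACHE_L1D |
                  (PERF_COUNT_HW_CACHE_OP_READ << 8) |
                  (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
    attr.disabled = 1;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    fd = perf_event_open(&attr, 0, -1, -1, 0); /* this thread, any CPU */
    if (fd < 0) {
        perror("perf_event_open");
        return 1;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

    /* ... the code under test, e.g. your measureflush() calls, goes here ... */

    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
    if (read(fd, &misses, sizeof(misses)) == sizeof(misses))
        printf("L1D load misses: %llu\n", (unsigned long long) misses);
    close(fd);
    return 0;
}
[/bash]

If the perf tool is installed, the same event can also be checked from the command line with: perf stat -e L1-dcache-load-misses ./yourprogram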

-Hussam