Random and consecutive access and L2/L3 cache hit ratio

Marcin_W_1
Beginner

Dear experts,
I am trying to understand the following phenomenon. I have a plain array of integers (500000 x 4 B) which I access randomly (case 1, index generated by the system rand() call) and consecutively (case 2). I wanted to compare the L2/L3 cache access counters in these two cases. My naive expectation was that an array of that size (2 MB) would fit in the L3 cache (3 MB), so both cases would give comparable cache-miss counts. In fact, I expected getL3CacheHitRatio to be around 1.
Surprisingly, it is close to 0.8 for random access and close to 0.3 for consecutive access.
I want to repeat this with a bigger (200 MB) array, but before that step I need to know how to interpret these counters.
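
As a rough check of the sizes involved, the arithmetic can be sketched as below. The 64-byte cache-line size is an assumption (it is typical for Intel Core processors but is not stated in the post), and the helper names are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical helpers, for illustration only.
// Assumption: 64-byte cache lines (not stated in the post).
constexpr std::size_t kAssumedLineBytes = 64;

// Total bytes occupied by `count` elements of `elem_bytes` bytes each.
constexpr std::size_t workingSetBytes(std::size_t count, std::size_t elem_bytes) {
    return count * elem_bytes;
}

// Number of cache lines needed to hold `bytes`, rounded up.
constexpr std::size_t cacheLines(std::size_t bytes,
                                 std::size_t line_bytes = kAssumedLineBytes) {
    return (bytes + line_bytes - 1) / line_bytes;
}

// Number of elements that share a single cache line.
constexpr std::size_t elemsPerLine(std::size_t elem_bytes,
                                   std::size_t line_bytes = kAssumedLineBytes) {
    return line_bytes / elem_bytes;
}
```

For 500000 four-byte integers, workingSetBytes(500000, 4) gives 2000000 bytes, cacheLines(2000000) gives 31250 lines, and elemsPerLine(4) gives 16. Under this assumption a consecutive pass moves to a new cache line only on every 16th access, while random indices land on arbitrary lines, so the two patterns need not issue the same number of requests to L2 and L3 even when the whole array fits in L3.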

My hardware is:
Output of "uname -a":
Linux wm 3.5.7-gentoo #5 SMP PREEMPT Thu Dec 13 18:01:42 CET 2012 x86_64 Intel(R) Core(TM) i3-2370M CPU @ 2.40GHz GenuineIntel GNU/Linux

The results are as follows:

[Attached image: histogram.png]

The results are from 100 executions of the code attached below.
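
Since each execution appends one tab-separated line of four hit ratios to data.txt, the 100 runs could be summarized by averaging each column. The following is only a sketch of that step; the function name columnMeans is hypothetical and not part of the posted program.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: average each whitespace-separated numeric column.
// Blank lines are skipped, and so are rows whose column count differs
// from the first non-empty row (for example, rows containing "nan").
std::vector<double> columnMeans(std::istream& in) {
    std::vector<double> sums;
    std::size_t rows = 0;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::vector<double> values;
        double v;
        while (fields >> v) values.push_back(v);
        if (values.empty()) continue;                 // blank line
        if (sums.empty()) sums.assign(values.size(), 0.0);
        if (values.size() != sums.size()) continue;   // malformed row
        for (std::size_t c = 0; c < values.size(); ++c) sums[c] += values[c];
        ++rows;
    }
    // If no valid rows were read, sums is empty and this loop does nothing.
    for (double& s : sums) s /= static_cast<double>(rows);
    return sums;
}
```

Called on a std::ifstream opened on data.txt, the four returned means would correspond, in the order the program writes them, to the L2 and L3 hit ratios for consecutive access followed by the L2 and L3 hit ratios for random access.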


And the code:

#include <iostream>
#include <vector>
#include <cpucounters.h>   // Intel PCM
#include <cstdlib>         // rand, srand
#include <ctime>           // time
#include <fstream>
#define SIZE 500000

int main()
{
    // Each run appends one line of hit ratios to data.txt.
    std::ofstream out("data.txt", std::ofstream::app);

    int *coll = new int[SIZE];
    for (int i = 0; i < SIZE; ++i)
    {
        coll[i] = i;
    }
    std::cout << "Size of collection: " << SIZE*sizeof(int)/1000000 << "MB" << std::endl;

    // 64-bit accumulator: the sum 0 + 1 + ... + (SIZE-1) does not fit in a 32-bit int.
    long long tmp;
    PCM * m = PCM::getInstance();
    if (m->good()) m->program();
    else return -1;
    SystemCounterState before_sstate = getSystemCounterState();
    tmp = 0;

    // Consecutive reading
    std::cout << std::endl << "Consecutive access to collection" << std::endl;
    for (int i = 0; i < SIZE; ++i)
    {
        tmp += coll[i];
    }
    std::cout << "Sum:" << tmp << std::endl;
    SystemCounterState after_sstate = getSystemCounterState();
    std::cout << "Instructions per clock: " << getIPC(before_sstate,after_sstate)
              << "\nNumber of L2 cache hits: " << getL2CacheHits(before_sstate,after_sstate)
              << "\nNumber of L3 cache hits: " << getL3CacheHits(before_sstate,after_sstate)
              << "\nL2 cache hit ratio: " << getL2CacheHitRatio(before_sstate,after_sstate)
              << "\nL3 cache hit ratio: " << getL3CacheHitRatio(before_sstate,after_sstate)
              << "\nBytes read: " << getBytesReadFromMC(before_sstate,after_sstate) << std::endl;
    out << getL2CacheHitRatio(before_sstate,after_sstate) << "\t" << getL3CacheHitRatio(before_sstate,after_sstate) << "\t";

    SystemCounterState before_sstate2 = getSystemCounterState();
    tmp = 0;

    // Random reading
    std::cout << std::endl << "Random access to collection" << std::endl;
    srand(time(0));
    for (int i = 0; i < SIZE; ++i)
    {
        tmp += coll[rand()%SIZE];
    }
    std::cout << "Sum:" << tmp << std::endl;
    SystemCounterState after_sstate2 = getSystemCounterState();
    std::cout << "Instructions per clock: " << getIPC(before_sstate2,after_sstate2)
              << "\nNumber of L2 cache hits: " << getL2CacheHits(before_sstate2,after_sstate2)
              << "\nNumber of L3 cache hits: " << getL3CacheHits(before_sstate2,after_sstate2)
              << "\nL2 cache hit ratio: " << getL2CacheHitRatio(before_sstate2,after_sstate2)
              << "\nL3 cache hit ratio: " << getL3CacheHitRatio(before_sstate2,after_sstate2)
              << "\nBytes read: " << getBytesReadFromMC(before_sstate2,after_sstate2) << std::endl;
    out << getL2CacheHitRatio(before_sstate2,after_sstate2) << "\t" << getL3CacheHitRatio(before_sstate2,after_sstate2) << std::endl;

    delete[] coll;
    return 0;
}
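
One possible variation, offered as a sketch under assumptions rather than as part of the posted program: the random loop calls rand() inside the measured region, so the counters also include whatever rand() itself executes. The indices could instead be generated before the first counter snapshot. The function names below are hypothetical, and std::mt19937 is swapped in for rand(). Note that the index vector itself occupies cache during the measured loop, which has its own effect on the counters.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: build `count` uniformly distributed indices in [0, size).
// Requires size > 0. Uses std::mt19937 in place of rand(); this is a
// substitution for illustration, not the method used in the post.
std::vector<std::size_t> makeRandomIndices(std::size_t size, std::size_t count,
                                           unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<std::size_t> pick(0, size - 1);
    std::vector<std::size_t> idx(count);
    for (std::size_t& i : idx) i = pick(gen);
    return idx;
}

// Sum the elements selected by precomputed indices. Only this loop would sit
// between the two counter snapshots.
long long sumByIndex(const int* data, const std::vector<std::size_t>& idx) {
    long long sum = 0;
    for (std::size_t i : idx) sum += data[i];
    return sum;
}
```

In the program above this would amount to calling makeRandomIndices(SIZE, SIZE, seed) before taking before_sstate2 and replacing the body of the random loop with a call to sumByIndex(coll, indices).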

Bernard
Valued Contributor I

Try using VTune for a more comprehensive analysis.
