
Find time spent in cache for Sandy Bridge


I was trying to calculate the time my application spends in the cache by using the published latencies of 4 cycles for L1, 12 for L2, etc., along with the number of L1 and L2 loads that my application makes. I realized that since Sandy Bridge can issue 2 loads in one cycle, my results were off.
My question is: how do I go about calculating the time spent in cache alone, given that parallel loads to L1 are possible?

1 Reply
Hi Chinnappa,

Intel processors use what is known as out-of-order execution. This means that many instructions are in flight at any given time. When an instruction misses in a cache level, or is delayed for any other reason, and has to wait, another instruction may be able to make progress. So you cannot assume that nothing else is happening while an instruction waits for a cache response, and multiplying load counts by the published latencies will not give you the time spent in cache.
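
To make this concrete, here is a minimal Python sketch that compares the naive serial estimate with a fully overlapped lower bound. It uses the 4-cycle L1 and 12-cycle L2 figures and the 2-loads-per-cycle issue rate quoted in the question. The function names and the example load counts are hypothetical, chosen only for illustration.

```python
def naive_serial_cycles(l1_loads, l2_loads, l1_latency=4, l2_latency=12):
    """Upper-bound style estimate: assumes every load waits for the
    previous one to finish, so latencies simply add up."""
    return l1_loads * l1_latency + l2_loads * l2_latency


def issue_bound_cycles(total_loads, loads_per_cycle=2):
    """Lower-bound style estimate: assumes all latency is hidden by
    overlap, so only the load issue rate limits progress."""
    # Ceiling division without importing math.
    return -(-total_loads // loads_per_cycle)


# Hypothetical counts: 1000 loads served by L1, 100 served by L2.
l1_loads, l2_loads = 1000, 100

serial = naive_serial_cycles(l1_loads, l2_loads)
overlapped = issue_bound_cycles(l1_loads + l2_loads)

print(f"naive serial estimate : {serial} cycles")      # 5200
print(f"fully overlapped bound: {overlapped} cycles")  # 550
```

The two numbers differ by almost an order of magnitude, and the real cost lies somewhere in between depending on how much work the core can overlap with each load. That is why load counts and latencies alone cannot answer the question, and why the stall time has to be measured.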

I suggest you read up on out-of-order (OOO) execution and follow the links to additional details.

Once you have done that, you can browse through the articles on the Platform Performance Monitoring portal, which explain how to understand the specific latencies and bottlenecks that affect your programs.

I hope this helps,