Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

measuring L2 cache stalls (in CPU cycles or msec)

amitm02
Beginner
358 Views
Hi,

I know how to measure the L2 cache misses in my functions. Is there a way to measure the CPU stalls those misses cause (if any), in msec or CPU cycles?
If this cannot be measured directly, can I estimate it from other events or ratios, such as CPI?

Thanks
Amit
3 Replies
Vladimir_T_Intel
Moderator
Hi Amit,

L2 cache misses introduce stalls in the CPU pipeline. The number of stall cycles is roughly proportional to the number of L2 cache miss events, scaled by the miss penalty. On a Core 2 system the penalty is about 130 CPU clockticks (worst case); check the optimization manual for your particular microarchitecture. So a rough estimate (in CPU cycles) can be made by multiplying the number of L2 cache miss events by the penalty. Please make sure you are counting events, not samples. You can see the number of events directly in the VTune Hotspot results, or you can multiply the number of samples by the SAV (Sample After Value) for the event.
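
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name, the example sample count, the SAV and the 2.0 GHz clock frequency are all made-up assumptions for illustration; only the ~130-cycle worst-case penalty comes from the estimate above, and you should replace it with the value for your own microarchitecture.

```python
def estimate_l2_stall(samples, sav, penalty_cycles=130, cpu_hz=2.0e9):
    """Worst-case stall estimate from L2 miss samples.

    samples        -- number of L2-miss samples collected (hypothetical input)
    sav            -- Sample After Value used for the event
    penalty_cycles -- assumed worst-case miss penalty in clockticks
    cpu_hz         -- assumed core clock frequency in Hz
    """
    events = samples * sav                    # samples -> actual event count
    stall_cycles = events * penalty_cycles    # events x penalty = cycles
    stall_ms = stall_cycles * 1e3 / cpu_hz    # cycles -> milliseconds
    return events, stall_cycles, stall_ms


if __name__ == "__main__":
    # Hypothetical numbers: 500 samples collected with SAV = 10000.
    events, cycles, ms = estimate_l2_stall(samples=500, sav=10_000)
    print(f"events={events}, worst-case stall cycles={cycles}, ~{ms:.1f} ms")
```

Keep in mind that this is an upper bound: it assumes every miss pays the full penalty and that no latency is overlapped with other work.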

Thomas_W_Intel
Employee
Amit,

Vladimir has pointed out how to estimate the worst-case impact. The out-of-order engine might hide some of the latency. An event that can give you more insight is RS_UOPS_DISPATCHED.CYCLES_NONE. It measures the cycles in which no micro-op is dispatched for execution, i.e. the execution units are waiting for work. Obviously, there can be reasons for this other than cache misses, but this event can show you whether you have an issue.
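
One way to put this event to use is sketched below in Python. It turns the RS_UOPS_DISPATCHED.CYCLES_NONE count into a fraction of all cycles and uses it to cap a worst-case miss estimate. The function names and example counts are hypothetical, the choice of an unhalted core cycle count (e.g. CPU_CLK_UNHALTED.CORE) as the denominator is an assumption, and the capping step is only a rough heuristic, not something the event definition guarantees.

```python
def dispatch_stall_fraction(cycles_none, total_cycles):
    """Fraction of cycles in which no micro-op was dispatched.

    cycles_none  -- event count for RS_UOPS_DISPATCHED.CYCLES_NONE
    total_cycles -- unhalted core cycles over the same region (assumed
                    to come from an event such as CPU_CLK_UNHALTED.CORE)
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return cycles_none / total_cycles


def capped_miss_stall(worst_case_cycles, cycles_none):
    """Heuristic: full stalls caused by misses cannot exceed the cycles in
    which nothing was dispatched, so cap the worst-case estimate there."""
    return min(worst_case_cycles, cycles_none)


if __name__ == "__main__":
    # Hypothetical event counts for one function.
    frac = dispatch_stall_fraction(cycles_none=300_000_000,
                                   total_cycles=1_000_000_000)
    capped = capped_miss_stall(worst_case_cycles=650_000_000,
                               cycles_none=300_000_000)
    print(f"no-dispatch fraction: {frac:.0%}, capped miss stall: {capped} cycles")
```

If the worst-case estimate is far above the no-dispatch cycle count, that suggests the out-of-order engine is hiding a good part of the miss latency.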

Kind regards
Thomas
Dny
Beginner
Hello Amit,

You can also look at the cycle accounting write-up by David Levinthal.

It will give you a better idea of where your CPU cycles are being spent:
assets.devx.com/goparallel/17775.pdf


Thanks,

Regards,
Dny