measuring L2 cache stalls (in CPU cycles or msec)

amitm02 · ‎06-14-2009

Hi,

I know how to measure the L2 cache misses of my functions. Is there a way to measure the CPU stalls those misses cause (if any), in msec or CPU cycles?
If it cannot be measured directly, can I estimate it from other events/ratios such as CPI?

Thanks
Amit

Vladimir_T_Intel · ‎06-14-2009

Hi Amit,

L2 cache misses introduce stalls in the CPU pipeline. The stall time is roughly proportional to the number of L2 cache miss events, scaled by the miss penalty. On a Core 2 system the penalty is about 130 CPU clockticks (worst case); check the optimization manual for your particular microarchitecture. A rough estimate (in CPU cycles) is therefore the number of L2 cache miss events multiplied by the penalty. Please make sure you are counting events, not samples. You can see the number of events directly in the VTune Hotspot results, or multiply the number of samples by the SAV (sample after value) for the event.
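
As a rough illustration of the arithmetic above, here is a minimal Python sketch. The function name, the sample count, the SAV and the 2.0 GHz clock are made-up values for illustration only; the 130-cycle penalty is the worst-case Core 2 figure mentioned above and should be replaced with the value for your microarchitecture.

```python
def estimate_l2_stall(samples, sav, penalty_cycles=130, cpu_hz=2.0e9):
    """Worst-case estimate of stall time caused by L2 cache misses.

    samples        -- number of samples collected for the L2 miss event
    sav            -- sample after value used for that event
    penalty_cycles -- assumed worst-case miss penalty in CPU cycles
    cpu_hz         -- assumed core clock frequency in Hz
    """
    events = samples * sav                     # samples -> actual event count
    stall_cycles = events * penalty_cycles     # worst case: every miss stalls fully
    stall_msec = stall_cycles / cpu_hz * 1e3   # cycles -> milliseconds
    return events, stall_cycles, stall_msec


# Hypothetical numbers: 500 samples at SAV = 10000 on a 2.0 GHz core.
events, cycles, msec = estimate_l2_stall(samples=500, sav=10000)
print(f"{events} misses, up to {cycles} stall cycles (about {msec:.0f} ms)")
```

This is an upper bound, since it assumes that no miss latency is overlapped with other work.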

Thomas_W_Intel · ‎06-15-2009

Amit,

Vladimir has pointed out how to estimate the worst-case impact. The out-of-order engine might hide some of the latency. An event that can give you more insight is RS_UOPS_DISPATCHED.CYCLES_NONE. It measures the cycles in which no micro-op is dispatched for execution, i.e. the execution units are waiting for work. Obviously, there can be reasons for this other than cache misses, but this event can show you whether you have an issue.
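
One way to read that counter is as a fraction of all cycles. The sketch below is only an illustration: the function name and the counts are made up, and using an unhalted core-cycle count (for example CPU_CLK_UNHALTED.CORE) as the denominator is an assumption, not something VTune does for you.

```python
def idle_dispatch_fraction(cycles_none, total_cycles):
    """Fraction of cycles in which no micro-op was dispatched.

    cycles_none  -- event count for RS_UOPS_DISPATCHED.CYCLES_NONE
    total_cycles -- event count for unhalted core cycles (assumed denominator)
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return cycles_none / total_cycles


# Hypothetical event counts (events, not samples) for one function.
fraction = idle_dispatch_fraction(cycles_none=2.0e8, total_cycles=1.0e9)
print(f"no micro-op dispatched in {fraction:.0%} of cycles")
```

A high fraction only says the execution units were starved; it does not by itself attribute the idle cycles to L2 misses.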

Kind regards
Thomas

Dny · ‎11-13-2009

Hello Amit,

You can also look at the cycle accounting from David Levinthal.

It will give you a better idea of where your CPU cycles are spent:
assets.devx.com/goparallel/17775.pdf

Thanks,

Regards,
Dny