I learned the following formulas for calculating the L1, L2, and L3 miss rates from another post by @Kirill Rogozhin (Intel):

L3 cache miss

(180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS) / CPU_CLK_UNHALTED.THREAD

L2 cache miss

((26 * MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS) + (43 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS)) / CPU_CLK_UNHALTED.THREAD

L1 cache miss

((12 * MEM_LOAD_UOPS_RETIRED.L2_HIT) + (26 * MEM_LOAD_RETIRED.LLC_HIT_PS) + (43 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS) + (180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)) / CPU_CLK_UNHALTED.THREAD

However, I ran into two problems along the way.

1. When I calculate the L3 miss rate, I get 90%. But my test application code is very simple, so the miss rate can't be that high. And when I calculate the L2 miss rate, the result is greater than 1, which is obviously not correct.

2. When I use the hardware event MEM_LOAD_RETIRED.LLC_HIT_PS, it is reported as an invalid event. But on the Sandy Bridge platform this event should be valid, so I have no idea what is happening.

Any help would be appreciated.

Sun.

>> L1 cache miss

>> ((12 * MEM_LOAD_UOPS_RETIRED.L2_HIT) + (26 * MEM_LOAD_RETIRED.LLC_HIT_PS) + (43 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS) + (60 * MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS) + (180 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)) / CPU_CLK_UNHALTED.THREAD

This formula can be misleading, because it suggests that an L1 data cache miss is more expensive than it is. Traditionally, an L1 data miss means an L2 hit. The formula above is the combined penalty for all L1, L2, and LLC misses.

The penalty for an L1 miss alone is MEM_LOAD_UOPS_RETIRED.L2_HIT * 12.
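To make the difference concrete, here is a minimal Python sketch that splits the combined formula into per-event shares of CPU_CLK_UNHALTED.THREAD. It is not part of VTune: the function name `penalty_breakdown` and the sample counter values are made up for illustration, and only the cycle constants come from the formulas quoted in this thread. It uses the MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS spelling of the LLC hit event.

```python
# Cycle cost per event, taken from the formulas quoted in this thread.
PENALTY_CYCLES = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 12,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS": 26,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS": 43,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS": 60,
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS": 180,
}


def penalty_breakdown(counts, clockticks):
    """Return each event's estimated share of CPU_CLK_UNHALTED.THREAD."""
    return {
        event: cycles * counts.get(event, 0) / clockticks
        for event, cycles in PENALTY_CYCLES.items()
    }


# Hypothetical counter values, for illustration only.
sample_counts = {
    "MEM_LOAD_UOPS_RETIRED.L2_HIT": 5000,
    "MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS": 1000,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS": 100,
    "MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS": 50,
    "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS": 500,
}
sample_clockticks = 1_000_000  # hypothetical CPU_CLK_UNHALTED.THREAD

breakdown = penalty_breakdown(sample_counts, sample_clockticks)
l1_only = breakdown["MEM_LOAD_UOPS_RETIRED.L2_HIT"]  # L1 miss that hits L2
combined = sum(breakdown.values())                   # the quoted "L1 cache miss" formula

print(f"L1 miss (L2 hit) share of cycles: {l1_only:.4f}")
print(f"Combined L1/L2/LLC miss share:    {combined:.4f}")
```

With these made-up numbers the L2-hit term is only a small part of the combined value, which is the point made above: the long formula measures the cost of all misses, not of L1 misses alone.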

>> 1. When I calculate the L3 miss rate, I get 90%. But my test application code is very simple, so the miss rate can't be that high. And when I calculate the L2 miss rate, the result is greater than 1, which is obviously not correct.

Please provide a test case and tell us which processor you are working on. If the sample is confidential, please consult Intel Premier Support.

>> 2. When I use the hardware event MEM_LOAD_RETIRED.LLC_HIT_PS, it is reported as an invalid event. But on the Sandy Bridge platform this event should be valid

The event MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS is supported on my Sandy Bridge processor. Note that different processors may use different event names for LLC hits; use "amplxe-runss -event-list | grep LLC_HIT_PS" to check.

Thanks for your help.

Now I see that the constants in the formula are the number of cycles needed to service each event.

But what I actually want is the miss ratio of each cache level.

Can I simply use the following formulas given by @Shannon Cepeda (Intel) to get the miss ratio?

*Demand Data* L2 Miss Rate =>

(sum of all types of L2 demand data misses) / (sum of L2 demanded data requests) =>

(MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS) / (L2_RQSTS.ALL_DEMAND_DATA_RD)

*Demand Data* L3 Miss Rate =>

L3 demand data misses / (sum of all types of demand data L3 requests) =>

MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS / (MEM_LOAD_UOPS_RETIRED.LLC_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HIT_PS + MEM_LOAD_UOPS_LLC_HIT_RETIRED.XSNP_HITM_PS + MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS)

*Demand Data* L1 Miss Rate => cannot calculate.

Also, I don't understand why we can't calculate the L1 data cache miss ratio.
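For reference, the two ratios quoted above can be written as a short Python sketch. The function names, parameter names, and sample counts are hypothetical; each argument stands for the raw count of the event with the matching name, and the formulas are transcribed as quoted, not independently verified.

```python
def l2_demand_data_miss_ratio(llc_hit, xsnp_hit, xsnp_hitm, llc_miss,
                              l2_all_demand_data_rd):
    """(sum of L2 demand data misses) / L2_RQSTS.ALL_DEMAND_DATA_RD."""
    if l2_all_demand_data_rd <= 0:
        raise ValueError("L2_RQSTS.ALL_DEMAND_DATA_RD must be positive")
    l2_misses = llc_hit + xsnp_hit + xsnp_hitm + llc_miss
    return l2_misses / l2_all_demand_data_rd


def l3_demand_data_miss_ratio(llc_hit, xsnp_hit, xsnp_hitm, llc_miss):
    """LLC misses / (sum of all demand data requests that reached L3)."""
    l3_requests = llc_hit + xsnp_hit + xsnp_hitm + llc_miss
    if l3_requests <= 0:
        raise ValueError("no L3 demand data requests were counted")
    return llc_miss / l3_requests


# Hypothetical counter values, for illustration only.
l2_ratio = l2_demand_data_miss_ratio(
    llc_hit=1000, xsnp_hit=100, xsnp_hitm=50, llc_miss=500,
    l2_all_demand_data_rd=10_000,
)
l3_ratio = l3_demand_data_miss_ratio(
    llc_hit=1000, xsnp_hit=100, xsnp_hitm=50, llc_miss=500,
)

print(f"L2 demand data miss ratio: {l2_ratio:.3f}")
print(f"L3 demand data miss ratio: {l3_ratio:.3f}")
```

Unlike the cycle-weighted formulas, these are plain count ratios with no latency constants, so the L3 ratio cannot exceed 1 by construction.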

Thanks.

Sun

I would suggest using the formulas from your original post. However, an L1 cache miss that hits in L2 is not worth including in the guideline document, because its penalty is tiny.

If you want to evaluate the impact of L1 misses on performance, use:

L1 cache miss ratio:

MEM_LOAD_UOPS_RETIRED.L2_HIT * 12 / CPU_CLK_UNHALTED.THREAD

Investigate if the ratio is greater than 0.2. In practice it never reaches this threshold, which is why we can ignore it.
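As a rough illustration of that rule of thumb, the check can be sketched in Python. The names `l1_miss_cost`, `needs_investigation`, and `break_even_l2_hits` are hypothetical, as are the sample counts; only the 12-cycle constant and the 0.2 threshold come from this reply.

```python
L2_HIT_PENALTY_CYCLES = 12   # cycles per L1 miss that hits L2 (from the formula above)
INVESTIGATE_THRESHOLD = 0.2  # ratio above which the reply suggests investigating


def l1_miss_cost(l2_hit_count, clockticks):
    """MEM_LOAD_UOPS_RETIRED.L2_HIT * 12 / CPU_CLK_UNHALTED.THREAD."""
    return L2_HIT_PENALTY_CYCLES * l2_hit_count / clockticks


def needs_investigation(l2_hit_count, clockticks):
    """True when the L1 miss cost exceeds the 0.2 threshold."""
    return l1_miss_cost(l2_hit_count, clockticks) > INVESTIGATE_THRESHOLD


def break_even_l2_hits(clockticks):
    """L2-hit count at which the ratio equals the threshold exactly."""
    return INVESTIGATE_THRESHOLD * clockticks / L2_HIT_PENALTY_CYCLES


# Hypothetical values, for illustration only.
clockticks = 1_000_000
print(needs_investigation(5_000, clockticks))
print(needs_investigation(20_000, clockticks))
print(round(break_even_l2_hits(clockticks)))
```

The break-even helper shows how many L2-hit loads a run of a given length would need before the threshold is crossed, which gives a feel for how rarely the check would fire.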
