sun_s_

Beginner


11-26-2013
06:39 AM

I'm still confused by how to calculate the L1, L2 cache miss ratio after reading many related posts.

I'm trying to use VTune to get the L1I, L1D, and L2 cache miss ratios on an Intel Xeon platform based on the Core microarchitecture.

First of all, the miss ratio I want is the traditional one, i.e. number of L2 misses / total L2 requests, not the one defined in the Intel manual, which divides the number of L2 misses by the number of instructions retired, e.g. L2_LINE_MISS.SELF.ANY/INST_RETIRED.ANY.

Therefore, my question is:

1) For the L1 cache miss ratios, I'm using the following formulas, based on a literal reading of the hardware event names:

L1I miss ratio = L1I_MISSES / L1I_READ

L1D miss ratio = MEM_LOAD_RETIRED.L1D_LINE_MISS / L1D_ALL_REF

I'm using these formulas, but I'm not sure whether they are correct or whether other hardware events need to be included.
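To make the arithmetic concrete, here is a minimal sketch of the two L1 formulas above. The counter values are made-up placeholders, not measurements, and the `miss_ratio` helper is a hypothetical name used only for illustration; it says nothing about whether these events are the right ones to use.

```python
def miss_ratio(misses: int, requests: int) -> float:
    """Traditional miss ratio: misses divided by total requests."""
    if requests <= 0:
        raise ValueError("request count must be positive")
    return misses / requests


# Placeholder counter values (assumptions for illustration, NOT real data).
counters = {
    "L1I_MISSES": 1_200,
    "L1I_READ": 480_000,
    "MEM_LOAD_RETIRED.L1D_LINE_MISS": 9_000,
    "L1D_ALL_REF": 600_000,
}

l1i_ratio = miss_ratio(counters["L1I_MISSES"], counters["L1I_READ"])
l1d_ratio = miss_ratio(
    counters["MEM_LOAD_RETIRED.L1D_LINE_MISS"], counters["L1D_ALL_REF"]
)

print(f"L1I miss ratio: {l1i_ratio:.4f}")
print(f"L1D miss ratio: {l1d_ratio:.4f}")
```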

2) As for the L2 miss ratio, I know that the difference between MEM_LOAD_RETIRED.L2_LINE_MISS and L2_LINE_MISS.SELF.ANY is that the latter includes instruction fetch misses. I want the overall L2 miss ratio, including instruction prefetching, so I would like to use L2_LINE_MISS.SELF.ANY as the numerator and the sum of the L1D misses and the L1I misses as the denominator.

So the formula would be:

L2 cache miss ratio = number of L2 misses / total L2 requests = L2_LINE_MISS.SELF.ANY / (MEM_LOAD_RETIRED.L1D_LINE_MISS + L1I_MISSES)

The problem is that when I use this formula to calculate the L2 miss ratio of a GraphLab program, the numerator is bigger than the denominator, which means the miss ratio is greater than 1. Obviously that is incorrect.
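The L2 formula above, plus a sanity check that flags the impossible result, can be sketched as follows. The numbers are placeholders chosen only to reproduce the "ratio above 1" symptom, and the function names `l2_miss_ratio` and `is_plausible` are hypothetical.

```python
def l2_miss_ratio(l2_line_miss: int, l1d_line_miss: int, l1i_misses: int) -> float:
    """L2 misses divided by the assumed L2 request count (L1D + L1I misses)."""
    requests = l1d_line_miss + l1i_misses
    if requests <= 0:
        raise ValueError("request count must be positive")
    return l2_line_miss / requests


def is_plausible(ratio: float) -> bool:
    """A true miss ratio must lie between 0 and 1 inclusive."""
    return 0.0 <= ratio <= 1.0


# Placeholder counts (assumptions, NOT real data).
ratio = l2_miss_ratio(
    l2_line_miss=15_000,   # L2_LINE_MISS.SELF.ANY
    l1d_line_miss=9_000,   # MEM_LOAD_RETIRED.L1D_LINE_MISS
    l1i_misses=1_200,      # L1I_MISSES
)

if not is_plausible(ratio):
    print(f"ratio {ratio:.3f} is outside [0, 1]: numerator and denominator "
          "are probably not counting the same set of requests")
```

A check like this does not say which event is wrong; it only makes the mismatch visible before the number is reported.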

So I realize that something is wrong with the hardware events I used in the formula, and I suspect it is the denominator.

I'm looking for a hardware event that represents the total number of L2 requests, but I only found events such as L2_RQST.SELF.ANY.S_STATE, L2_RQST.SELF.ANY.M_STATE, L2_RQST.SELF.ANY.I_STATE, L2_RQST.SELF.ANY.E_STATE, L2_RQST.SELF.ANY.MESI, L2_RQST.SELF.DEMAND.M_STATE, L2_RQST.SELF.DEMAND.S_STATE, L2_RQST.SELF.DEMAND.E_STATE, and L2_RQST.SELF.DEMAND.I_STATE.

These events count L2 requests from different units, or accesses to cache lines in different states.

I have no idea whether I should use any of them in this formula, and if so, how.

Any help would be appreciated.

Sun.


2 Replies

TimP

Black Belt


11-26-2013
08:33 AM

I suppose it's not coincidental that VTune General analysis quotes L1 hit rates but gives only raw numbers in the various categories of L2 miss. Even those L1 hit rates don't always make sense when I filter down to a single thread, while cache hit rates on idle threads don't have any meaning for me.

I suppose you're ending up counting repeated misses more heavily than repeated access requests.

sun_s_

Beginner


11-29-2013
04:40 AM

TimP (Intel) wrote:

I suppose it's not coincidental that VTune General analysis quotes L1 hit rates but gives only raw numbers in the various categories of L2 miss. Even those L1 hit rates don't always make sense when I filter down to a single thread, while cache hit rates on idle threads don't have any meaning for me.

I suppose you're ending up counting repeated misses more heavily than repeated access requests.

Right now I think the denominator may be far smaller than the actual number, because the L1D misses may represent the misses from only one or a few cores rather than all of the cores.
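To illustrate that hypothesis (and it is only a hypothesis), the sketch below shows how a ratio can exceed 1 when the numerator is summed over all cores while the denominator comes from a single core, and how it falls back into range once both are aggregated over the same cores. The per-core numbers and the dictionary keys are made-up placeholders, not measurements or VTune output.

```python
# Placeholder per-core counts (assumptions, NOT real data).
per_core = [
    {"l2_misses": 400, "l2_requests": 1_000},
    {"l2_misses": 350, "l2_requests": 900},
    {"l2_misses": 500, "l2_requests": 1_100},
]

# Aggregate both counters over the same set of cores.
total_misses = sum(core["l2_misses"] for core in per_core)
total_requests = sum(core["l2_requests"] for core in per_core)

# Mismatched scope: misses from all cores, requests from core 0 only.
mismatched = total_misses / per_core[0]["l2_requests"]

# Consistent scope: both counters cover all cores.
consistent = total_misses / total_requests

print(f"mismatched scope: {mismatched:.3f}")
print(f"consistent scope: {consistent:.3f}")
```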

I also see two hardware events named L2_IFETCH.BOTH_CORES and L2_IFETCH.SELF, whose descriptions are "counts events initiated by either core" and "counts events initiated by this core only", respectively. I'm quite confused by these two descriptions. What do "either core" and "this core only" mean, given that my processor has 8 cores?

I've been struggling for a month to get cache miss ratios with VTune, and I still don't know the right method.

So I would really appreciate any hints.

Best regards, Sun.
