Beginner
1,063 Views

## How to calculate L1 and L2 cache miss rate?

Hello, everyone:
I am a new user of Intel VTune. I want to measure the L1 and L2 cache miss rates on an Intel Core 2 Quad Q6600 processor. Are the following formulas correct for computing the L1 and L2 miss rates?

L1: L1D_CACHE_LD.I_STATE / L1D_CACHE_LD.MESI
L2: L2D_CACHE_LD.I_STATE / L2D_CACHE_LD.MESI

By the way, I have another question about measuring a multithreaded application. I run two threads on core 0 and core 1 of the Q6600, which share an L2 cache. One is the main thread and the other is a prefetch thread. How can I measure the impact of the prefetch thread on the main thread? That is, how do I evaluate the benefit of the prefetch thread?
1 Solution
Employee
1,063 Views
Quoting - explore_zjx

Hi, Peter
The following definitions are cited from a lecture at people.cs.vt.edu/~cameron/cs5504/lecture8.pdf
Definitions:

- Local miss rate: misses in this cache divided by the total number of memory accesses to this cache (Miss rate L2)
- Global miss rate: misses in this cache divided by the total number of memory accesses generated by the CPU
(for L2: Miss rate L1 x Miss rate L2)

For a particular application on a 2-level cache hierarchy:
- 1000 memory references
- 40 misses in L1
- 20 misses in L2

Calculate the local and global miss rates:

- Miss rate L1 = 40/1000 = 4% (global and local)
- Global miss rate L2 = 20/1000 = 2%
- Local miss rate L2 = 20/40 = 50%

Observations for a 32 KByte first-level cache with increasing second-level cache size:

- An L2 smaller than L1 is impractical.
- The global miss rate is similar to the single-level cache miss rate, provided L2 >> L1.
- The local miss rate is not a good measure for a secondary cache.

Cited from: people.cs.vt.edu/~cameron/cs5504/lecture8.pdf
So I want to measure the global and local L2 miss rates.
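
The arithmetic in the lecture example can be reproduced with a short Python sketch. The function name `miss_rates` and its parameters are illustrative only and come from neither the lecture nor VTune; the counts are the ones from the example above, and the sketch assumes the lecture's simplified model in which every L1 miss becomes exactly one L2 access.

```python
def miss_rates(mem_refs: int, l1_misses: int, l2_misses: int) -> dict:
    """Return L1, local L2, and global L2 miss rates as fractions.

    Assumes every L1 miss becomes exactly one L2 access, as in the
    simplified two-level model of the lecture example.
    """
    if mem_refs <= 0 or l1_misses <= 0:
        raise ValueError("need at least one memory reference and one L1 miss")
    l1_rate = l1_misses / mem_refs      # local and global are the same for L1
    l2_local = l2_misses / l1_misses    # misses / accesses that reach L2
    l2_global = l2_misses / mem_refs    # misses / all CPU memory references
    return {"l1": l1_rate, "l2_local": l2_local, "l2_global": l2_global}


rates = miss_rates(mem_refs=1000, l1_misses=40, l2_misses=20)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

The global L2 rate equals the L1 rate times the local L2 rate (4% x 50% = 2%), which is the product given in the definition.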

Hi,

Finally I understand what you meant :-) Actually, "local miss rate" and "global miss rate" are NOT part of VTune Analyzer's terminology.

Note that a "\$ miss rate" can also be defined by the user: you divided by "memory references", whereas VTune Analyzer divides by "instructions retired".

According to your requirements, I suggest defining the G-miss rate and L-miss rate as:

Again, it is your decision how to define an event ratio. VTune Analyzer provides typical event ratios, but users can define their own.

Regards, Peter
16 Replies
Employee
1,063 Views
Hi,

The Q6600 is an Intel Core 2 processor. Your main thread and prefetch thread can both access data in the shared L2\$. To evaluate the benefit of the prefetch thread, you can use VTune Analyzer to measure L2\$ misses in the main thread and compare two situations: 1) with the prefetch thread; 2) without the prefetch thread.

To measure L2\$ misses, modify the sampling activity: "Configure Sampling" -> Ratios -> add "L2 Cache Miss Rate" to the list of "Selected Ratios:".
You can verify that the event L2_LINES_IN.SELF.ANY was added to the "Selected events:" list. The sampling result will display the "L2\$ Miss Rate" data.
L2 Cache Miss Rate = L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY; you can find more information in the help file.

Hope it helps.

Regards, Peter
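
One way to turn the two-run comparison into a single number is sketched below in Python. It assumes the event counts for the main thread have been copied out of the VTune report by hand; the counts shown are made-up placeholders, not measurements, and `l2_miss_rate` and `relative_reduction` are hypothetical helper names.

```python
def l2_miss_rate(l2_lines_in: int, inst_retired: int) -> float:
    """L2 Cache Miss Rate = L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY."""
    if inst_retired <= 0:
        raise ValueError("INST_RETIRED.ANY must be positive")
    return l2_lines_in / inst_retired


def relative_reduction(baseline: float, with_prefetch: float) -> float:
    """Fraction by which the miss rate dropped relative to the baseline."""
    if baseline <= 0:
        raise ValueError("baseline miss rate must be positive")
    return (baseline - with_prefetch) / baseline


# Placeholder event counts for the main thread (hypothetical values).
baseline = l2_miss_rate(l2_lines_in=2_000_000, inst_retired=1_000_000_000)
prefetch = l2_miss_rate(l2_lines_in=500_000, inst_retired=1_000_000_000)

print(f"without prefetch thread: {baseline:.4%}")
print(f"with prefetch thread:    {prefetch:.4%}")
print(f"relative reduction:      {relative_reduction(baseline, prefetch):.0%}")
```

A lower miss rate is only a proxy for the benefit; comparing the main thread's run time across the same two runs shows whether the prefetch thread pays off overall.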
Beginner
1,063 Views
Hi, Peter
Thanks for your response. I don't understand why the L2 cache miss rate in the VTune manual differs from the definition in the text. The global L2 miss rate is the number of L2 misses divided by the number of memory references; the local L2 miss rate is the number of L2 misses divided by the number of L2 references. What do you think about these definitions? How do I calculate the L2 cache miss rate according to these formulas? That is, which events should I select to measure?
Employee
1,063 Views

I'm not sure I understand you correctly. There is no concept of "global" and "local" L2 misses in VTune Analyzer.

L2_LINES_IN indicates all L2 misses, including instruction prefetching misses.
MEM_LOAD_RETIRED.L2_LINE_MISS indicates all L2 misses, excluding instruction prefetching misses.

The miss rates for both events above will be calculated by VTune Analyzer automatically.

Regards, Peter
Beginner
1,063 Views

Thanks very much.
Beginner
1,063 Views

Do I need to calculate the ratio myself, or can I enter the formula in VTune?
Employee
1,063 Views
MEM_LOAD_RETIRED.L2_LINE_MISS measures L2 data cache misses.

L2_LINES_IN counts L2 misses caused by both data accesses and instruction fetches. If your code has many branches, L2_LINES_IN is helpful!

You can use VTune Analyzer's default definition:
L2 Cache Miss Rate = L2_LINES_IN.SELF.ANY / INST_RETIRED.ANY

This result will be displayed in VTune Analyzer's report! No action is required from the user!

Or you can use your own definition, for example if you don't care about L2 misses caused by instruction prefetching:
L2 Cache Miss Rate = MEM_LOAD_RETIRED.L2_LINE_MISS / INST_RETIRED.ANY

This result will NOT be displayed in VTune Analyzer's report, and the user can't enter this formula in the report!

Regards, Peter
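
Since the custom ratio cannot be entered in the report, it has to be computed outside VTune. A minimal Python sketch follows, assuming the event counts have been copied into a dictionary keyed by event name. The counts are invented for illustration, and `event_ratio` is a hypothetical helper, not a VTune function.

```python
from typing import Mapping


def event_ratio(counts: Mapping[str, int], numerator: str, denominator: str) -> float:
    """Divide one event count by another, as an event ratio does."""
    denom = counts[denominator]
    if denom == 0:
        raise ZeroDivisionError(f"{denominator} is zero; ratio is undefined")
    return counts[numerator] / denom


# Invented event counts, for illustration only.
counts = {
    "L2_LINES_IN.SELF.ANY": 1_200_000,
    "MEM_LOAD_RETIRED.L2_LINE_MISS": 900_000,
    "INST_RETIRED.ANY": 600_000_000,
}

default_rate = event_ratio(counts, "L2_LINES_IN.SELF.ANY", "INST_RETIRED.ANY")
custom_rate = event_ratio(counts, "MEM_LOAD_RETIRED.L2_LINE_MISS", "INST_RETIRED.ANY")
print(f"default definition: {default_rate:.6f} misses per instruction")
print(f"custom definition:  {custom_rate:.6f} misses per instruction")
```

Keeping the event names as dictionary keys makes it easy to swap the denominator, for example to a memory-reference event, if a per-access rate is wanted instead of a per-instruction rate.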
Beginner
1,063 Views

My application's measurement result is:

L1 cache misses: MEM_LOAD_RETIRED.L1 is 700 and INST_RETIRED.ANY = 0. Does that mean the L1 cache miss rate is infinite?

Is there an ideal value for the cache miss rate?
Employee
1,063 Views
I don't know why you have "INST_RETIRED.ANY=0". I guess that number is a sample count, not an event count.

VTune Performance Analyzer has a default SAV (Sample After Value) setting for each selected event. A sample count of zero means your application ran too briefly (event count < SAV). You can increase the workload or change the default SAV (by modifying your VTune activity).

By the way, the penalty of an L1 cache miss is low. Usually you can ignore it.

Regards, Peter
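
The relationship between sample counts and event counts can be sketched in Python as follows. In event-based sampling, one sample is recorded each time the event counter reaches the SAV, so the event count is roughly samples x SAV. The SAV values and sample counts below are placeholders, not VTune defaults, and `estimated_events` and `safe_ratio` are hypothetical helpers.

```python
from typing import Optional


def estimated_events(samples: int, sav: int) -> int:
    """Approximate event count: one sample is taken every `sav` events."""
    return samples * sav


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


# Placeholder values: 700 samples of the L1 miss event, and 0 samples of
# INST_RETIRED.ANY because fewer than one SAV's worth of instructions retired.
l1_miss_events = estimated_events(samples=700, sav=1_000)
inst_events = estimated_events(samples=0, sav=2_000_000)

rate = safe_ratio(l1_miss_events, inst_events)
print("miss rate:", "undefined (too few samples)" if rate is None else f"{rate:.6f}")
```

Zero samples only means the true count lies somewhere below the SAV, so the ratio is undefined rather than infinite; a longer run or a smaller SAV gives a usable denominator.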

Black Belt
1,063 Views

By the way, the penalty of L1 cache miss is low. Usually you can ignore this.

Probably so, in cases where the L2 miss rate is high. However, if you suspect a significant L1 miss rate, you should also consider the L1 TLB miss rate.
Beginner
1,063 Views
Can you elaborate on how I can use the CPU cache in my program? At the OS level, I know that the cache is maintained automatically, based on which memory addresses are accessed frequently. But if I could force a specific part of my program into the CPU cache, it would help optimize my code. Please suggest a proper way to use the cache in my program. Please, please!!
Employee
1,063 Views
Sigehere S. wrote:

Can you elaborate on how I can use the CPU cache in my program?
At the OS level, I know that the cache is maintained automatically, based on which memory addresses are accessed frequently.
But if I could force a specific part of my program into the CPU cache, it would help optimize my code.
Please suggest a proper way to use the cache in my program.