handlight
Beginner
81 Views

How can I get the L2 cache miss rate on XEON E5540 with Linux kernel 2.6.35

I'm a new user of Intel VTune Amplifier XE 2011 and not yet familiar with it.
I'm working on an application that monitors a large number of network data flows. The CPU
is a Xeon E5540 at 2.53 GHz with HT enabled. The application runs on a single core of the CPU, and
its CPU usage stays above 95% at all times. So it should trigger a large number of L2 cache
misses, right?
I set the property to "profile system" and chose L2_LINES_IN.SELF.ANY /
INST_RETIRED.ANY, following the "Intel 64 and IA-32 Architectures Optimization Reference
Manual" (page B-74). Unfortunately, I couldn't find L2_LINES_IN.SELF.ANY in VTune for
Linux, so I used L2_LINES_IN.ANY / INST_RETIRED.ANY to get the L2 cache miss rate. The
result showed that the application caused only a few L2 cache misses (about 3% on average), which
is far lower than I had expected (about 20% to 40%).
So my question is: can I use L2_LINES_IN.ANY instead of L2_LINES_IN.SELF.ANY? What is the
difference between them? Or is there another way to get the L2 cache miss
rate? I don't have RDC installed; all of my work is under Linux.
I hope you can help.
Thanks.
0 Kudos
6 Replies
Peter_W_Intel
Employee
81 Views

CPU usage above 95% does not by itself imply a large number of L2$ misses.
It seems that your application is single-threaded and runs on a 4-core system, so using L2_LINES_IN.ANY is OK.
My suggestion is to use the ratio: L2_LINES_IN.ANY / CPU_CLK_UNHALTED.CORE * 100
If the percentage is greater than 20%, consider modifying the code to reduce misses.
Regards, Peter
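The ratio suggested in this reply can be sketched in a few lines of Python. This is only an illustration: the function name and the sample counts are hypothetical, and the 20% threshold is the one quoted in the reply.

```python
def l2_lines_in_per_100_clocks(l2_lines_in: int, unhalted_clocks: int) -> float:
    """Compute L2_LINES_IN.ANY / CPU_CLK_UNHALTED.CORE * 100."""
    if unhalted_clocks <= 0:
        raise ValueError("unhalted_clocks must be positive")
    return l2_lines_in / unhalted_clocks * 100


# Hypothetical sample counts, not measured data.
ratio = l2_lines_in_per_100_clocks(l2_lines_in=25_000_000,
                                   unhalted_clocks=100_000_000)
needs_attention = ratio > 20  # threshold taken from the reply
print(f"{ratio:.1f} L2 lines in per 100 clocks; above threshold: {needs_attention}")
```

Note that the denominator is clock cycles, not instructions, so this number is not directly comparable to a ratio against INST_RETIRED.ANY.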
handlight
Beginner
81 Views

Thank you for your quick reply ^_^. I will give this method a try.
I'm still confused about the difference between L2_LINES_IN.ANY and L2_LINES_IN.SELF.ANY. Under which setting can I find L2_LINES_IN.SELF.ANY and use it?
And what is the meaning of "L2_LINES_IN.ANY / CPU_CLK_UNHALTED.CORE * 100"?
Thanks:)
Peter_W_Intel
Employee
81 Views

I think that the E5540 is an Intel Core 2 processor. I found L2_LINES_IN in the manual:
L2_LINES_IN.BOTH_CORES.ANY - collects all L2 misses from 4 cores
L2_LINES_IN.SELF.ANY - collects all L2 misses from this core only
Since you said that you ran a single-threaded application, they are equivalent.
The ratio means: if the L2 miss count is greater than 20 per 100 clocks, that is bad!
Regards, Peter
handlight
Beginner
81 Views

Thanks for your help :)
I checked the three groups of preset sampling events in the analyzer for three architectures: Core, Nehalem and Sandy Bridge. The group I can use is Nehalem. Does that mean my processor belongs to the Nehalem family?
But I used none of them; I used a non-preset group.
As with L2_LINES_IN.SELF.ANY, I failed to find the CPU_CLK_UNHALTED.CORE event. So I selected all events that start with CPU_CLK_UNHALTED, and here is one of my test results:
CPU_CLK_UNHALTED.CORE.REF            339,310,000,000
CPU_CLK_UNHALTED.CORE.REF_P           17,770,400,000
CPU_CLK_UNHALTED.CORE.THREAD         356,690,000,000
CPU_CLK_UNHALTED.CORE.THREAD_P       355,938,000,000
CPU_CLK_UNHALTED.CORE.TOTAL_CYCLES   356,826,000,000
INST_RETIRED.ANY                     149,382,000,000
L2_LINES_IN.ANY                        3,950,000,000
So, what is the difference between the CPU_CLK_UNHALTED.* events?
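As a rough check, the two ratios discussed in this thread can be computed from the counts posted above. The sketch below assumes that CPU_CLK_UNHALTED.CORE.THREAD is an acceptable stand-in for the CPU_CLK_UNHALTED.CORE event that could not be found; that choice, and the helper name `percent`, are assumptions rather than anything confirmed in the thread.

```python
# Counts copied from the test result above.
counts = {
    "CPU_CLK_UNHALTED.CORE.THREAD": 356_690_000_000,
    "INST_RETIRED.ANY": 149_382_000_000,
    "L2_LINES_IN.ANY": 3_950_000_000,
}


def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator expressed as a percentage."""
    return numerator / denominator * 100


# Ratio from the original question: L2 lines in per 100 retired instructions.
lines_per_100_inst = percent(counts["L2_LINES_IN.ANY"], counts["INST_RETIRED.ANY"])

# Ratio from the reply: L2 lines in per 100 unhalted clocks
# (assumption: the THREAD counter is used as the clock count).
lines_per_100_clk = percent(counts["L2_LINES_IN.ANY"],
                            counts["CPU_CLK_UNHALTED.CORE.THREAD"])

# Cycles per instruction, for context.
cpi = counts["CPU_CLK_UNHALTED.CORE.THREAD"] / counts["INST_RETIRED.ANY"]

print(f"L2 lines in per 100 instructions: {lines_per_100_inst:.2f}")
print(f"L2 lines in per 100 clocks:       {lines_per_100_clk:.2f}")
print(f"Cycles per instruction:           {cpi:.2f}")
```

With these counts, the per-instruction ratio comes out at roughly 2.6%, consistent with the "about 3%" reported in the question, and the per-clock ratio is well under the 20% threshold mentioned earlier.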
TimP
Black Belt
81 Views

Yes, the Xeon E5540 is a Nehalem CPU. If you really do want the L2 cache miss rate, you could simply add the L2 cache line miss retired event to the default events; there is no obvious reason to explore all the CPU_CLK events other than the one normally used. But maybe you don't mean L2 cache miss rate: the L3 miss rate may be more significant, even if it is less obvious how to collect.
Peter_W_Intel
Employee
81 Views

If you work on a Nehalem family processor, please focus on LLC misses, which carry a bigger penalty than L2 misses.

Event ratio = ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100

If the ratio is > 20%, consider modifying the code to reduce LLC misses.

Regards, Peter
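A minimal Python sketch of the event ratio above follows. The function name is hypothetical, the LLC miss count in the example is made up for illustration, and reading the factor 180 as an estimated per-miss penalty in cycles is an assumption, since the reply does not explain it.

```python
# Factor from the formula above; presumably an estimated cost in cycles
# per LLC miss (assumption, not stated in the reply).
LLC_MISS_PENALTY_CYCLES = 180


def llc_miss_impact_percent(llc_misses: int,
                            unhalted_thread_clocks: int,
                            penalty_cycles: int = LLC_MISS_PENALTY_CYCLES) -> float:
    """((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100."""
    if unhalted_thread_clocks <= 0:
        raise ValueError("unhalted_thread_clocks must be positive")
    return llc_misses * penalty_cycles / unhalted_thread_clocks * 100


# Hypothetical LLC miss count; the clock count is the
# CPU_CLK_UNHALTED.CORE.THREAD value posted earlier in the thread.
impact = llc_miss_impact_percent(llc_misses=500_000_000,
                                 unhalted_thread_clocks=356_690_000_000)
print(f"Estimated share of clocks spent on LLC misses: {impact:.1f}%")
print("Consider reducing LLC misses" if impact > 20 else "Below the 20% threshold")
```

Because each miss is weighted by the penalty, this ratio approximates the fraction of cycles lost to LLC misses rather than a miss count per clock.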