Hi,
I ran a "General Exploration" analysis on my code with Amplifier XE. One of the reported metrics is LLC Miss. For my code as a whole, LLC Miss is 0.242, which I understand to mean that 24.2% of cycles are spent waiting to read data from or store data to memory.
However, one function in my code has an LLC Miss of 1.133, and another has 2.199! I can't understand how the rate can be greater than 1.
Is it because some of the events are not precise events? But then why can it be as high as 2.199? Can anyone explain why?
My CPU is based on the Core microarchitecture. Can anyone tell me how LLC Miss is counted on the Core microarchitecture?
Many thanks.
5 Replies
Hi,
First of all, let me assume that you are using VTune Amplifier XE 2011.
Are you using the predefined analysis type nehalem_memory-access, which provides the LLC miss count? (You can also create a new analysis type for the event MEM_LOAD_RETIRED.LLC_MISS...)
If so, the report shows the MEM_LOAD_RETIRED.LLC_MISS count. What do you mean by "The whole LLC Miss of my code is 0.242"?
To measure how much LLC misses affect performance in your code, use the formula below:
3rd level misses: ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100
If the result (a percentage) is greater than 20%, consider improving the code; otherwise, ignore LLC misses in your module or function.
Regards, Peter
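The formula above can be evaluated directly from exported event counts. The following is only a minimal sketch: the function name, the sample function names, and all counts are hypothetical, and the 180-cycle penalty is simply the constant quoted in the formula, not a measured value for any particular system.

```python
def llc_miss_impact_percent(llc_misses, unhalted_cycles, penalty_cycles=180.0):
    """Estimated share of cycles lost to LLC misses, in percent.

    Implements ((MEM_LOAD_RETIRED.LLC_MISS * penalty) / CPU_CLK_UNHALTED.THREAD) * 100.
    penalty_cycles is an assumed, fixed cost per miss.
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return (llc_misses * penalty_cycles) / unhalted_cycles * 100.0


# Hypothetical per-function counts: (LLC misses, unhalted cycles).
samples = {
    "hot_loop": (2_000_000, 1_200_000_000),
    "setup": (50_000, 900_000_000),
}

for name, (misses, cycles) in samples.items():
    pct = llc_miss_impact_percent(misses, cycles)
    # 20% is the rule-of-thumb threshold mentioned in the post.
    verdict = "consider tuning" if pct > 20.0 else "ignore"
    print(f"{name}: {pct:.1f}% ({verdict})")
```

Because the penalty is a parameter, the same helper can be reused with a different per-miss cost if a better estimate for the platform is available.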
The penalty estimates for performance effects are only estimates; they can easily be off by as much as you have seen. For one thing, they don't take into account many specific details of possible differences between your platform and application and those for which the estimation algorithms were derived.
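One plausible way such an estimate can exceed 1 is overlapping misses: a fixed per-miss penalty charges every miss its full latency, while an out-of-order core can have several misses in flight at once. The toy model below only illustrates that arithmetic; it is not how VTune computes the metric, and the function name, the `overlap` parameter, and all numbers are assumptions made for the example.

```python
def estimated_miss_ratio(misses, penalty_cycles, compute_cycles, overlap):
    """Toy model of a fixed-penalty miss metric.

    overlap is the assumed average number of misses serviced concurrently
    (1 means fully serialized). Returns (reported_ratio, true_stall_fraction).
    """
    if overlap < 1:
        raise ValueError("overlap must be at least 1")
    serialized_stall = misses * penalty_cycles    # what the fixed-penalty formula charges
    actual_stall = serialized_stall / overlap     # concurrent misses share the wait
    total_cycles = compute_cycles + actual_stall  # what a cycle counter would observe
    return serialized_stall / total_cycles, actual_stall / total_cycles


# Hypothetical function: 10,000 misses, 180-cycle penalty, 200,000 compute cycles.
for overlap in (1, 2, 4):
    reported, true_share = estimated_miss_ratio(10_000, 180, 200_000, overlap)
    print(f"overlap={overlap}: reported={reported:.3f}, actual stall share={true_share:.3f}")
```

With no overlap, the reported ratio equals the actual stall share and stays below 1. As overlap grows, the reported ratio rises above 1 even though the actual stall share remains below 1.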
Hi Peter Wang,
Thanks for your reply.
I'm using VTune Amplifier XE 2011.
My CPU is from the Core family, and I used the predefined analysis "Core 2 family - General Exploration", so it doesn't have the event MEM_LOAD_RETIRED.LLC_MISS. General Exploration reports LLC Miss directly, so the LLC Miss value is computed by VTune, and I don't know how the LLC Miss can be greater than 1.
In the Core family, which event is equivalent to Nehalem's MEM_LOAD_RETIRED.LLC_MISS? Is it L2_LINES_IN, or MEM_LOAD_RETIRED.L2_LINE_MISS? I'm confused by these events.
About the formula:
3rd level misses: ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100
I wonder how the '180' was derived. Does '180' mean that the memory access latency is 180 cycles? Why is the latency 180? Is it an estimate? I think the latency depends not only on the CPU but also on the type of memory, such as its frequency and CL value, so it should not be a constant across different systems.
And how do I estimate the latency on the Core family? It should be less than 180, right?
Best regards, Huangzhe
Hi TimP,
Thanks for your reply.
I think you mean that the penalty VTune estimates is not the precise penalty for my platform, so the LLC Miss value is also only an estimate. That is why it can be greater than 1, right?
So if I want to know the precise LLC Miss rate, a better approach is to write a piece of code to measure the precise penalty, right?
Best regards, Huangzhe
For Core 2 Duo processors, consider these penalties:

| Issue | Performance Counter | Approximate Penalty |
| --- | --- | --- |
| L2 Miss | MEM_LOAD_RETIRED.L2_LINE_MISS | ~165 desktop / ~300 server |

See more from this article.
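To see how much the choice of penalty matters, the same kind of ratio can be computed with both Core 2 values. This is a rough sketch under stated assumptions: the penalties are taken to be in cycles, CPU_CLK_UNHALTED.CORE is assumed as the cycle count in the denominator, and the function name and event counts are hypothetical. It is not confirmed to be the exact formula behind the General Exploration metric.

```python
# Approximate L2 miss penalties for Core 2 quoted above (assumed to be cycles).
CORE2_L2_MISS_PENALTY = {"desktop": 165, "server": 300}


def l2_miss_impact(l2_line_misses, core_cycles, platform):
    """Estimated fraction of cycles attributed to L2 (last-level) misses.

    l2_line_misses: MEM_LOAD_RETIRED.L2_LINE_MISS count
    core_cycles:    unhalted core cycles (assumed CPU_CLK_UNHALTED.CORE)
    platform:       "desktop" or "server", selecting the approximate penalty
    """
    if core_cycles <= 0:
        raise ValueError("core_cycles must be positive")
    penalty = CORE2_L2_MISS_PENALTY[platform]
    return l2_line_misses * penalty / core_cycles


# Hypothetical counts for one function.
misses, cycles = 1_000_000, 750_000_000
for platform in CORE2_L2_MISS_PENALTY:
    print(f"{platform}: {l2_miss_impact(misses, cycles, platform):.3f}")
```

The same counts give noticeably different results for the two penalties, which is why the metric is best read as an estimate rather than an exact miss rate.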