I ran "General Exploration" on my code using Amplifier XE. One of the reported metrics is LLC Miss. The LLC Miss for my whole program is 0.242, which I understand to mean that 24.2% of cycles are spent waiting to read or store data from/to memory. But one function in my code has an LLC Miss of 1.133, and another has 2.199! I can't understand why the rate can be greater than 1. Is it because some event is not a precise event? But why can it be as high as 2.199? Can anyone tell me why?
My CPU is Core microarchitecture. Can anyone tell me how LLC Miss is counted on the Core microarchitecture?
The penalty estimates for performance effects are only estimates; they can easily be off by as much as you have seen. For one thing, they don't take into account many specific details of possible differences between your platform and application and those for which the estimate algorithms were derived.
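One plausible way such an estimate overshoots, offered here only as a toy illustration and not as a description of how VTune works internally: a fixed per-miss penalty charges every miss in full, while an out-of-order core can have several misses in flight at once. The sketch below compares the two under that assumption. The function name, the miss start cycles, and the 180-cycle penalty are all hypothetical inputs chosen for the example.

```python
def estimated_vs_actual_stall(miss_start_cycles, penalty):
    """Toy model: compare a fixed-penalty estimate with overlapped stall time.

    estimated: every miss is charged the full penalty (count * penalty).
    actual:    cycles during which at least one miss is outstanding,
               assuming overlapping misses are serviced in parallel.
    """
    estimated = len(miss_start_cycles) * penalty
    actual = 0
    covered_until = None  # end of the stall window counted so far
    for start in sorted(miss_start_cycles):
        end = start + penalty
        if covered_until is None or start >= covered_until:
            actual += penalty              # no overlap with earlier misses
            covered_until = end
        elif end > covered_until:
            actual += end - covered_until  # count only the uncovered tail
            covered_until = end
    return estimated, actual


# Four hypothetical misses issued 10 cycles apart, 180-cycle penalty each.
est, act = estimated_vs_actual_stall([0, 10, 20, 30], 180)
print(f"estimated stall cycles: {est}, overlapped stall cycles: {act}")
```

In this toy case the estimate is several times the overlapped stall time, so dividing it by the elapsed cycles could give a ratio above 1 even though the program cannot really stall for more than 100% of its cycles.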
My CPU is in the Core family, and I used the pre-defined analysis "Core 2 family - General Exploration", so it doesn't have the event MEM_LOAD_RETIRED.LLC_MISS. General Exploration reports LLC Miss directly, so the LLC Miss is computed by VTune. And I do not know how it can compute an LLC Miss greater than 1...
In the Core family, which event is equivalent to Nehalem's MEM_LOAD_RETIRED.LLC_MISS? Is it L2_LINES_IN? Or MEM_LOAD_RETIRED.L2_LINE_MISS? I'm confused by these events...
About the formula: 3rd level misses: ((MEM_LOAD_RETIRED.LLC_MISS * 180) / CPU_CLK_UNHALTED.THREAD) * 100. I wonder how the '180' was derived. Does '180' mean that the latency of a memory access is 180 cycles? Why is the latency 180? Is it an estimated number? I think the latency depends not only on the CPU but also on the type of memory, such as its frequency and CL number, so it should not be a constant across different systems. And how do I estimate the latency on the Core family? It should be less than 180, right?
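To make the arithmetic of that formula concrete, here is a minimal sketch that evaluates it for made-up counter values. The function name and the sample counts are hypothetical, and the 180-cycle constant is simply the one quoted in the formula; nothing here is taken from VTune itself.

```python
def llc_miss_metric(llc_misses, unhalted_cycles, penalty_cycles=180):
    """Evaluate ((misses * penalty) / cycles) * 100 as a percentage.

    llc_misses      -- count of MEM_LOAD_RETIRED.LLC_MISS (or an equivalent event)
    unhalted_cycles -- count of CPU_CLK_UNHALTED.THREAD
    penalty_cycles  -- assumed fixed cost per miss (180 in the quoted formula)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return 100.0 * llc_misses * penalty_cycles / unhalted_cycles


# Hypothetical counts for one hot function: 12,000 misses in 1,000,000 cycles.
print(f"{llc_miss_metric(12_000, 1_000_000):.1f}%")

# Same counts with a smaller, equally hypothetical penalty of 100 cycles.
print(f"{llc_miss_metric(12_000, 1_000_000, penalty_cycles=100):.1f}%")
```

Nothing in the formula caps the result at 100%: as soon as the miss count times the assumed penalty exceeds the measured cycles, the metric goes above 100% (or above 1 when shown as a fraction). The result also scales directly with the assumed penalty, so a constant that is too high for a given platform inflates the metric by the same factor.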
Thanks for your reply. I think you mean that the penalty VTune estimated is not the precise penalty for my platform, so the LLC Miss is also only an estimated number, and that is why it can be greater than 1, right? So if I want to know the precise LLC Miss rate, the better way is to write a piece of code to measure the precise penalty, right?