Ratio Limits for the Intel Core 2 and Intel Core i7

jjoker2k
Beginner
Hello,

I'm studying performance event ratios of various programs on Intel Core 2 and Intel Core i7 computers using VTune. I was wondering why the VTune help pages (under VTune Performance Analyzer Reference -> Processor Events and Advice) give "good" and "bad" limits for ratios only for Intel Core, and not for Intel Core 2 or Intel Core i7. Here's an example for Intel Core:

DTLB Miss Rate

DTLB misses / Instructions Retired

High value for this ratio indicates that the data your code accesses, spreads over many pages within a short time. TLB misses may yield L2 misses. Overall penalty is incurred over the original load or store operation.

Limits: good < 0.02, bad > 0.05
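
For illustration, here is a minimal Python sketch of how the ratio and limits above could be applied to raw counter values. The function names and the sample counts are hypothetical (they are not part of VTune); only the formula and the 0.02 / 0.05 limits come from the help text quoted above.

```python
# Limits quoted from the VTune help page for Intel Core (not Core 2 / Core i7).
GOOD_LIMIT = 0.02
BAD_LIMIT = 0.05


def dtlb_miss_rate(dtlb_misses: int, instructions_retired: int) -> float:
    """Return DTLB misses / Instructions Retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return dtlb_misses / instructions_retired


def classify_intel_core(ratio: float) -> str:
    """Classify a ratio against the Intel Core limits quoted above."""
    if ratio < GOOD_LIMIT:
        return "good"
    if ratio > BAD_LIMIT:
        return "bad"
    return "borderline"  # between the two limits; the help text gives no verdict


# Hypothetical counts, as if read from a sampling run.
ratio = dtlb_miss_rate(dtlb_misses=60_000, instructions_retired=1_000_000)
print(ratio, classify_intel_core(ratio))
```

The "borderline" label is my own choice for values between the two limits.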


And here's an example for Intel Core 2:

DTLB Miss Rate due to Stores

Equation: DTLB_MISSES.MISS_ST / INST_RETIRED.ANY

Category: L1 Data Cache and DTLB Ratios; Address Translation Ratios; Ratios for Tuning Assistant Advice;

Definition: A high value for this ratio indicates that the code accesses too many data pages within a short time, and causes many Data TLB misses due to store operations. These misses can impact performance if they do not occur in parallel to other instructions. In addition, if there are many stores in a row, some of them missing the DTLB, it may cause stalls due to full store buffer.


Do you know where I can find more information about the limits of these ratios on Intel Core 2 and Intel Core i7? I have already searched the Intel manuals (Intel 64 and IA-32 volumes 1, 2, and 3, and the optimization manual) with no luck. Also, in the text quoted above, what does "A high value for this ratio..." mean? What is the reference value above which a measured value should be considered "high"?

Thank you very much in advance for your kind help,

Daniele
TimP
Honored Contributor III
A "high value" is one which would be likely to impact performance. That depends a great deal on your application. For Core i7, a DTLB miss causes L1 and L2 data to be invalidated and refreshed from L3. On Core 2, you would want to distinguish between L1 and L2 DTLB misses. There, the L1 DTLB miss would not be as serious as on Core i7. You can look for these events associated with DTLB miss and judge whether they impact performance of your application.
jjoker2k
Beginner
Thank you for your reply, tim18. Still, I am wondering why the VTune help pages give reference values for the Intel Core ratios but none for the Intel Core 2 or Core i7 ratios.
TimP
Honored Contributor III
I've seen the following hints for Core 2 Clovertown:
DTLB misses/100 instructions retired
good: < 0.01
bad: > 0.10

Those are overall numbers, say for an application which has only one type of hot spot. I guess they were mostly L1 DTLB misses.
I doubt enough applications have been surveyed to mention similar numbers for Nehalem.
Given that L1 DTLB misses have more than double the impact on Core i7, I would think the limits would be lower. Evidently, no one felt able to quote such things as recommendations for VTune; the .pdf advice on VTune for Core i7 just came out this week.
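
As a rough sketch of how the Clovertown hints above might be checked, assuming the "/100 instructions retired" scaling is read literally: the Python below converts raw counts to misses per 100 instructions and compares them with the 0.01 / 0.10 hints. The function names and sample counts are hypothetical, and no limits for Core i7 are implied.

```python
# Hints quoted above for Core 2 Clovertown, in DTLB misses per 100 instructions retired.
CLOVERTOWN_GOOD_PER_100 = 0.01
CLOVERTOWN_BAD_PER_100 = 0.10


def misses_per_100_instructions(dtlb_misses: int, instructions_retired: int) -> float:
    """Scale the raw miss ratio to misses per 100 instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return 100.0 * dtlb_misses / instructions_retired


def classify_clovertown(dtlb_misses: int, instructions_retired: int) -> str:
    """Compare a measurement with the Clovertown hints (overall numbers only)."""
    rate = misses_per_100_instructions(dtlb_misses, instructions_retired)
    if rate < CLOVERTOWN_GOOD_PER_100:
        return "good"
    if rate > CLOVERTOWN_BAD_PER_100:
        return "bad"
    return "in between"  # the hints give no verdict for this range


# Hypothetical counts, e.g. a DTLB miss event count and INST_RETIRED.ANY.
print(classify_clovertown(dtlb_misses=2_000, instructions_retired=1_000_000))
```

Since these are overall numbers, they are best applied to a whole run or a single dominant hot spot rather than to every function.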