
DTLB Misses VS Cache(LLC) Misses count

Dny
Beginner
1,683 Views
Hello,

I have a query about the DTLB miss count and the cache (LLC) miss count.
As I understand it, the LLC miss count should always be greater than or equal to the DTLB miss count.
But for one of my test cases, a binary search tree, I observed that on a Nehalem server the LLC miss count (MEM_LOAD_RETIRED.LLC_MISS) is lower than the DTLB miss count (MEM_LOAD_RETIRED.DTLB_MISS).
Please find the graph in the attachment. (I have plotted the number of events, not the number of samples.)

Can you please give any suggestion on this behavior?

Thanking you,

Regards,
Dny
TimP
Honored Contributor III
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Dny
Beginner
Quoting - tim18
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Hello Sir,

Thanks for your reply,

I didn't understand your answer completely. Can you please explain in more detail?
As per my understanding, for each DTLB miss (whether 1st level or 2nd level) there will always be an LLC miss, because accessing the DTLB means the data needs to be fetched from main memory as it is not available in any cache. So the LLC miss count should be greater than or equal to the DTLB miss count.

Is my understanding correct?

Thanking you,

Regards,
Dny

Peter_W_Intel
Employee

Hi Dny,

I suggest that you read this article: http://assets.devx.com/goparallel/18027.pdf

L2 Miss       MEM_LOAD_RETIRED.L2_LINE_MISS   ~165 desktop / ~300 server
L1 DTLB Miss  MEM_LOAD_RETIRED.DTLB_MISS      ~10

For Intel® Core™ i7 processors, on measuring DTLB misses, read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/

Estimate the impact of "TLB misses" ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
>If impact is significant (> 5-10%), optimize functions with high DTLB misses
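The estimate above is simple arithmetic on two counter totals. A minimal sketch in Python, assuming you already have the event totals from your collection; the function name and the sample numbers are made up for illustration, and the 30-cycle walk cost is just the constant from the formula:

```python
def tlb_miss_impact_percent(walks_completed, unhalted_cycles, walk_cost_cycles=30):
    """Estimated share of cycles spent on completed DTLB page walks, in percent.

    walks_completed  -- total for DTLB_LOAD_MISSES.WALK_COMPLETED
    unhalted_cycles  -- total for CPU_CLK_UNHALTED.THREAD
    walk_cost_cycles -- assumed cost per walk (30 in the formula above)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return (walks_completed * walk_cost_cycles) / unhalted_cycles * 100.0


# Hypothetical totals, not measurements from any real run.
impact = tlb_miss_impact_percent(walks_completed=2_000_000,
                                 unhalted_cycles=1_000_000_000)
print(f"Estimated TLB miss impact: {impact:.1f}%")
```

With these made-up totals the estimate lands inside the 5-10% band, so by the rule of thumb above the functions with high DTLB misses would be worth a look.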

Hope it helps!

Regards, Peter
Dny
Beginner
Hello Sir,

I have already referred to these two documents and they helped me a lot. They help us calculate the impact of LLC and DTLB misses and how the CPU clock cycles are being used.

My query is regarding the total number of LLC misses and DTLB misses. I'm wondering why the total number of DTLB misses is higher than the number of LLC misses.

Thanking you,

Regards,
Dny.

Vladimir_T_Intel
Moderator
Probably this is due to two situations: an L1 cache hit with an L1 DTLB miss, and an L2 cache hit with an L1 DTLB miss. They are rare, but possible if the data is circulating around.
TimP
Honored Contributor III
On earlier Intel CPUs, the in-cache DTLB miss was common enough, and handled poorly enough, to constitute a significant reason for performance loss. The capacity of DTLB covers only a small fraction of the last level cache capacity, soon to be reduced further on new models. Situations where attention to data locality may improve performance already become more frequent on 6 core CPUs.
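To put rough numbers on "a small fraction", here is a minimal sketch that compares TLB reach (entries times page size) with the last level cache size. The entry counts, page size, and cache size below are assumed values chosen for illustration; check the documentation for your exact CPU model:

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size):
    """Amount of memory the TLB can map at once."""
    return entries * page_size


def reach_fraction(entries, page_size, llc_size):
    """TLB reach as a fraction of the last level cache capacity."""
    return tlb_reach_bytes(entries, page_size) / llc_size


# Assumed configuration: 4 KiB pages, 8 MiB last level cache.
for name, entries in [("first-level DTLB (64 entries, assumed)", 64),
                      ("second-level TLB (512 entries, assumed)", 512)]:
    reach = tlb_reach_bytes(entries, 4 * KIB)
    fraction = reach_fraction(entries, 4 * KIB, 8 * MIB)
    print(f"{name}: reach {reach // KIB} KiB, {fraction:.1%} of LLC")
```

Under these assumptions a working set can sit entirely in the last level cache and still span far more pages than the DTLB can map, which is where in-cache DTLB misses come from.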
Dny
Beginner
Hello Sir,

How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?

I'm still trying to find the exact cause of this behavior.

Thanking you,

Regards,
Dny

Vladimir_T_Intel
Moderator
Quoting - Dny
How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?

One example I could imagine is circulating data that fits into the L1 cache but is accessed with a high stride, which causes page walking. It might be a corner case at the beginning of each cycle, so DTLB misses could be counted one event more than cache misses. With enough cycles, the difference might become visible in the results. I didn't try to reproduce it, though.
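The high-stride case can be illustrated with a toy model. This is only a sketch under loud assumptions: both the TLB and the cache are modeled as fully associative LRU structures, the sizes (64 TLB entries, 512 cache lines of 64 bytes, 4 KiB pages) are made up for illustration, and there is no second-level TLB or prefetcher. It is not a model of Nehalem and not a reproduction of the measurement in this thread:

```python
from collections import OrderedDict

PAGE_SIZE = 4096     # bytes per page (assumed 4 KiB pages)
LINE_SIZE = 64       # bytes per cache line
TLB_ENTRIES = 64     # assumed DTLB capacity, in pages
CACHE_LINES = 512    # assumed 32 KiB data cache with 64-byte lines


class LruSet:
    """Fully associative LRU structure that counts its misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return
        self.misses += 1
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used


def simulate(num_elements, stride, passes):
    """Walk num_elements addresses at the given stride, repeated `passes` times."""
    tlb = LruSet(TLB_ENTRIES)
    cache = LruSet(CACHE_LINES)
    for _ in range(passes):
        for i in range(num_elements):
            addr = i * stride
            tlb.access(addr // PAGE_SIZE)
            cache.access(addr // LINE_SIZE)
    return tlb.misses, cache.misses


# 128 elements, each on its own page: 128 lines fit in the cache,
# but 128 pages do not fit in a 64-entry TLB.
tlb_misses, cache_misses = simulate(128, PAGE_SIZE + LINE_SIZE, passes=10)
print("high stride:", tlb_misses, "TLB misses,", cache_misses, "cache misses")

# Same element count, densely packed: only two pages are touched.
tlb_dense, cache_dense = simulate(128, LINE_SIZE, passes=10)
print("dense:", tlb_dense, "TLB misses,", cache_dense, "cache misses")
```

In this model the cache only misses on the first pass, while the TLB keeps missing on every pass because the page set is larger than its capacity, so the TLB miss count grows with the number of passes and the cache miss count does not.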
bishgada
Beginner
Quoting - Dny
Hello Sir,

How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?

I'm still trying to find the exact cause of this behavior.

Thanking you,

Regards,
Dny


Could it be due to the fact that sampling is not accurate? It might be that the different counters are sampled at different times and that's the cause of the difference.
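A rough way to check this: in event-based sampling the event count is estimated as the number of samples times the sample-after value, so each estimate is only as fine as one sample of that event. The sketch below uses made-up sample counts and sample-after values, and the "differ by more than one sample" test is my own rule of thumb, not a statistical test and not something the tool reports:

```python
def estimated_events(samples, sample_after_value):
    """Event count estimate: each sample stands for sample_after_value events."""
    return samples * sample_after_value


def counts_distinguishable(samples_a, sav_a, samples_b, sav_b):
    """True if the two estimates differ by more than one sample of either event.

    Heuristic only (an assumption of this sketch): a gap no larger than
    one sample could come from sampling granularity alone.
    """
    diff = abs(estimated_events(samples_a, sav_a)
               - estimated_events(samples_b, sav_b))
    return diff > max(sav_a, sav_b)


# Hypothetical numbers: DTLB misses sampled every 10,000 events,
# LLC misses sampled every 100,000 events.
dtlb = estimated_events(150, 10_000)
llc = estimated_events(14, 100_000)
print(dtlb, llc, counts_distinguishable(150, 10_000, 14, 100_000))
```

If the gap between the two counts is many samples wide, sampling granularity alone is unlikely to explain it and the other explanations in this thread are more plausible.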

Guy.