Hello,
I have a question about DTLB miss counts and last-level cache (LLC) miss counts.
As I understand it, the number of LLC misses should always be greater than or equal to the number of DTLB misses.
But for one of my test cases, a binary search tree, I observed that on a Nehalem server the LLC miss count (MEM_LOAD_RETIRED.LLC_MISS) is lower than the DTLB miss count (MEM_LOAD_RETIRED.DTLB_MISS).
Please find the graph in the attachment. (I have plotted the number of events, not the number of samples.)
Can you suggest what might explain this behavior?
Thanking you,
Regards,
Dny
9 Replies
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Quoting - tim18
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Thanks for your reply.
I didn't understand your answer completely. Can you please explain in more detail?
As I understand it, every DTLB miss (whether first level or second level) is always accompanied by an LLC miss, because going to the DTLB means the data has to be fetched from main memory since it is not available in any cache. So the LLC miss count should be greater than or equal to the DTLB miss count.
Is my understanding correct?
Thanking you,
Regards,
Dny
Hi Dny,
I suggest that you read this article: http://assets.devx.com/goparallel/18027.pdf
L2 Miss: MEM_LOAD_RETIRED.L2_LINE_MISS, ~165 desktop / ~300 server
L1 DTLB Miss: MEM_LOAD_RETIRED.DTLB_MISS, ~10
For Intel® Core™ i7 processors, on measuring DTLB misses, read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/
Estimate the impact of TLB misses: ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
If the impact is significant (> 5-10%), optimize the functions with high DTLB misses.
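As a rough illustration, the estimate above can be written as a small Python helper. This is only a sketch: the function name and the counter totals are made up for the example, and the factor 30 is taken straight from the formula, where it presumably stands for an approximate cost per completed page walk in cycles.

```python
def tlb_miss_impact_percent(walks_completed, unhalted_cycles, walk_cost=30):
    """Estimated share of cycles spent on completed DTLB page walks, in percent.

    walks_completed  -- total of DTLB_LOAD_MISSES.WALK_COMPLETED
    unhalted_cycles  -- total of CPU_CLK_UNHALTED.THREAD
    walk_cost        -- factor from the formula (assumed to be cycles per walk)
    """
    if unhalted_cycles <= 0:
        raise ValueError("unhalted_cycles must be positive")
    return walks_completed * walk_cost / unhalted_cycles * 100.0


# Hypothetical event totals, not measurements from this thread.
walks = 2_000_000
cycles = 1_000_000_000

impact = tlb_miss_impact_percent(walks, cycles)
print(f"Estimated TLB miss impact: {impact:.1f}% of cycles")
```

With these made-up totals the estimate falls inside the 5-10% range mentioned above, which is where the advice says it becomes worth looking at the functions with the most DTLB misses.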
Hope it helps!
Regards, Peter
Quoting - Peter Wang (Intel)
Hi Dny,
I suggest that you read this article: http://assets.devx.com/goparallel/18027.pdf
L2 Miss: MEM_LOAD_RETIRED.L2_LINE_MISS, ~165 desktop / ~300 server
L1 DTLB Miss: MEM_LOAD_RETIRED.DTLB_MISS, ~10
For Intel® Core™ i7 processors, on measuring DTLB misses, read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/
Estimate the impact of TLB misses: ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
If the impact is significant (> 5-10%), optimize the functions with high DTLB misses.
Hope it helps!
Regards, Peter
Hello Sir,
I have already read these two documents, and they helped me a lot. They help us calculate the impact of LLC and DTLB misses and see how the CPU clock cycles are being used.
My query is about the total number of LLC misses and DTLB misses. I'm wondering why the total number of DTLB misses is higher than the number of LLC misses.
Thanking you,
Regards,
Dny.
This is probably due to two situations: an L1 cache hit with an L1 DTLB miss, and an L2 cache hit with an L1 DTLB miss. They are rare, but possible if the data is circulating around.
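To make the "cache hit, DTLB miss" case concrete, here is a toy Python model. It is a sketch under loud assumptions, not a description of Nehalem: both the TLB and the cache are modelled as fully associative LRU structures, pages are 4 KiB, lines are 64 bytes, the sizes (64 TLB entries, 512 cache lines) are illustrative, and the memory traffic of the page walk itself is ignored. The access pattern touches one small node on each of 128 different pages, which is roughly what a pointer-chasing structure with scattered nodes can look like.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (assumed 4 KiB pages)
LINE_SIZE = 64    # bytes per cache line (assumed)


class LRUSet:
    """Fully associative LRU structure holding at most `capacity` tags."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tags = OrderedDict()
        self.misses = 0

    def access(self, tag):
        if tag in self.tags:
            self.tags.move_to_end(tag)      # hit: mark as most recently used
            return True
        self.misses += 1                    # miss: insert, evict the oldest if full
        self.tags[tag] = None
        if len(self.tags) > self.capacity:
            self.tags.popitem(last=False)
        return False


def simulate(addresses, tlb_entries=64, cache_lines=512):
    """Return (tlb_misses, cache_misses) for a trace of byte addresses."""
    tlb = LRUSet(tlb_entries)
    cache = LRUSet(cache_lines)
    for addr in addresses:
        tlb.access(addr // PAGE_SIZE)    # translation is looked up per page
        cache.access(addr // LINE_SIZE)  # data is looked up per cache line
    return tlb.misses, cache.misses


# One node per page on 128 pages, each at a different line offset within its page.
nodes = [page * PAGE_SIZE + (page % 64) * LINE_SIZE for page in range(128)]
trace = nodes * 100  # visit all nodes round-robin, 100 times

tlb_misses, cache_misses = simulate(trace)
print("TLB misses:  ", tlb_misses)
print("cache misses:", cache_misses)
```

In this model the 128 cache lines fit in the cache easily, so only the first touch of each line misses, while the 128 pages do not fit in 64 TLB entries, so the translation is evicted before it is reused. The TLB miss count therefore ends up far above the cache miss count even though the data never leaves the cache.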
On earlier Intel CPUs, the in-cache DTLB miss was common enough, and handled poorly enough, to be a significant cause of performance loss. The memory covered by the DTLB is only a small fraction of the last-level cache capacity, and that fraction will soon shrink further on new models. Situations where attention to data locality may improve performance are already becoming more frequent on 6-core CPUs.
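The "small fraction" can be put into numbers with a short calculation. The figures below are assumptions for illustration (64 first-level DTLB entries, 512 second-level TLB entries, 4 KiB pages, an 8 MiB last-level cache); please check them against the documentation for the exact processor model.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_size):
    """Amount of memory that can be mapped by the TLB at one time."""
    return entries * page_size


# Assumed figures, not taken from this thread.
page_size = 4 * KIB
llc_size = 8 * MIB
tlbs = {"L1 DTLB": 64, "L2 TLB": 512}

reach = {name: tlb_reach_bytes(entries, page_size) for name, entries in tlbs.items()}
for name, size in reach.items():
    share = 100 * size / llc_size
    print(f"{name}: reach {size // KIB} KiB, {share:.3f}% of an {llc_size // MIB} MiB LLC")
```

Under these assumptions the first-level DTLB maps 256 KiB at a time, so a working set that sits comfortably inside the last-level cache can still be many times larger than what the DTLB covers.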
Quoting - Vladimir Tsymbal (Intel)
This is probably due to two situations: an L1 cache hit with an L1 DTLB miss, and an L2 cache hit with an L1 DTLB miss. They are rare, but possible if the data is circulating around.
How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?
I'm still trying to find the exact cause of this behavior.
Thanking you,
Regards,
Dny
Quoting - Dny
Hello Sir,
How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?
I'm still trying to find the exact cause of this behavior.
Thanking you,
Regards,
Dny
Could it be due to the fact that sampling is not accurate? It might be that the different counters are sampled at different times and that's the cause of the difference.
Guy.