Hello,
I have a query about the DTLB miss count and the last-level cache (LLC) miss count.
As I understand it, the LLC miss count should always be greater than or equal to the DTLB miss count.
But for one of my test cases, a binary search tree, I observed that on a Nehalem server the LLC miss count (MEM_LOAD_RETIRED.LLC_MISS) is lower than the DTLB miss count (MEM_LOAD_RETIRED.DTLB_MISS).
Please find the graph in the attachment. (I have plotted the number of events, not the number of samples.)
Can you please suggest what might explain this behavior?
Thanking you,
Regards,
Dny
9 Replies
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Quoting - tim18
It seems entirely possible to incur more DTLB misses in (1st level) data cache than misses in last level cache.
Thanks for your reply.
I didn't understand your answer completely. Can you please explain in more detail?
As per my understanding, for each DTLB miss (whether first level or second level) there will always be an LLC miss, because a DTLB miss means the data needs to be fetched from main memory, as it is not available in any cache. So the LLC miss count should be greater than or equal to the DTLB miss count.
Is my understanding correct?
Thanking you,
Regards,
Dny
Hi Dny,
I suggest that you read this article: http://assets.devx.com/goparallel/18027.pdf
L2 Miss: MEM_LOAD_RETIRED.L2_LINE_MISS ~165 desktop/~300 server
L1 DTLB Miss: MEM_LOAD_RETIRED.DTLB_MISS ~10
For Intel® Core™ i7 processors, on measuring DTLB misses, read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/
Estimate the impact of "TLB misses": ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
If the impact is significant (> 5-10%), optimize the functions with high DTLB misses.
Hope it helps!
Regards, Peter
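The impact estimate quoted in the post above can be written as a small helper. This is only a sketch: the function name and the sample counts are hypothetical, and the cost of 30 cycles per completed page walk is simply the constant from the formula in the post, not a measured value.

```python
def tlb_miss_impact_percent(walks_completed, unhalted_thread_cycles,
                            walk_cost_cycles=30):
    """Estimated share of cycles spent on DTLB page walks, in percent.

    walks_completed        -> DTLB_LOAD_MISSES.WALK_COMPLETED event count
    unhalted_thread_cycles -> CPU_CLK_UNHALTED.THREAD event count
    walk_cost_cycles       -> assumed cost per walk (30 in the quoted formula)
    """
    if unhalted_thread_cycles <= 0:
        raise ValueError("unhalted_thread_cycles must be positive")
    return (walks_completed * walk_cost_cycles) / unhalted_thread_cycles * 100.0


# Hypothetical event counts, chosen only to illustrate the arithmetic.
walks = 2_000_000
cycles = 1_000_000_000
impact = tlb_miss_impact_percent(walks, cycles)
print(f"Estimated TLB miss impact: {impact:.1f}%")
```

With these made-up counts the estimate lands at about 6%, which falls inside the 5-10% range the post treats as significant.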
Quoting - Peter Wang (Intel)
Hi Dny,
I suggest that you read article -http://assets.devx.com/goparallel/18027.pdf
L2 Miss MEM_LOAD_RETIRED.L2_LINE_MISS ~165 desktop/~300 server
L1 DTLB Miss MEM_LOAD_RETIRED.DTLB_MISS ~10
For Intel® Core™ i7 processors, measuring DTLB misses - read http://software.intel.com/en-us/articles/using-intel-vtune-performance-analyzer-to-optimize-software-for-the-intelr-coretm-i7-processor-family/
Estimate the impact of "TLB misses" ((DTLB_LOAD_MISSES.WALK_COMPLETED * 30) / CPU_CLK_UNHALTED.THREAD) * 100
>If impact is significant (> 5-10%), optimize functions with high DTLB misses
Hope it helps!
Regards, Peter
Hello Sir,
I have already referred to these two documents and they helped me a lot. They help us calculate the impact of LLC and DTLB misses and see how the CPU clock cycles are being used.
My query is regarding the total number of LLC misses and DTLB misses. I'm wondering why the total number of DTLB misses is higher than the number of LLC misses.
Thanking you,
Regards,
Dny.
This is probably due to these situations: an L1 cache hit with an L1 DTLB miss, and an L2 cache hit with an L1 DTLB miss. They are rare, but possible if the data is circulating around.
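A toy simulation can show how a cache hit and a DTLB miss occur on the same access. The sketch below is a simplified model, not a description of Nehalem hardware: both structures are modeled as fully associative LRU sets, the sizes (64 TLB entries, 512 cache lines, 4 KB pages, 64-byte lines) are assumptions, and page-walk traffic is ignored. The access pattern touches one cache line on each of 128 pages, over and over.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # bytes per page (4 KB pages assumed)
LINE_SIZE = 64    # bytes per cache line (assumed)


class LRUSet:
    """Fully associative LRU structure; a simplification of real TLBs and caches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as most recently used
            return True
        self.misses += 1
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False


def simulate(addresses, tlb_entries=64, cache_lines=512):
    """Return (tlb_misses, cache_misses) for a sequence of byte addresses."""
    tlb = LRUSet(tlb_entries)
    cache = LRUSet(cache_lines)
    for addr in addresses:
        tlb.access(addr // PAGE_SIZE)    # the translation is looked up per page
        cache.access(addr // LINE_SIZE)  # the data is looked up per cache line
    return tlb.misses, cache.misses


# One cache line on each of 128 distinct pages, swept 100 times.
sweep = [page * PAGE_SIZE for page in range(128)]
tlb_misses, cache_misses = simulate(sweep * 100)
print("TLB misses:", tlb_misses)      # 12800
print("cache misses:", cache_misses)  # 128
```

The 128 lines fit easily in the modeled cache, so only the first sweep misses there. The 128 pages exceed the 64 modeled TLB entries, so under LRU every access misses the TLB. A pointer-chasing structure such as a binary search tree whose nodes are spread over many pages can behave similarly.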
On earlier Intel CPUs, the in-cache DTLB miss (a DTLB miss on data that is still resident in cache) was common enough, and handled poorly enough, to be a significant cause of performance loss. The capacity of the DTLB covers only a small fraction of the last-level cache capacity, and that fraction will soon shrink further on new models. Situations where attention to data locality may improve performance are already becoming more frequent on 6-core CPUs.
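The gap between DTLB coverage and last-level cache capacity can be worked out with simple arithmetic. The figures below are assumptions for illustration (64 first-level DTLB entries and 512 second-level TLB entries for 4 KB pages, and an 8 MB last-level cache, as commonly cited for Nehalem-class parts); check the documentation for the specific processor before relying on them.

```python
KB = 1024
MB = 1024 * KB


def tlb_reach_bytes(entries, page_size):
    """Memory covered by a TLB: number of entries times the page size."""
    return entries * page_size


# Assumed sizes, for illustration only.
l1_dtlb_reach = tlb_reach_bytes(64, 4 * KB)   # first-level DTLB, 4 KB pages
l2_tlb_reach = tlb_reach_bytes(512, 4 * KB)   # second-level TLB, 4 KB pages
llc_size = 8 * MB                             # last-level cache capacity

print("L1 DTLB reach:", l1_dtlb_reach // KB, "KB")                  # 256 KB
print("L2 TLB reach:", l2_tlb_reach // MB, "MB")                    # 2 MB
print("LLC / L1 DTLB reach:", llc_size // l1_dtlb_reach, "x")       # 32 x
print("LLC / L2 TLB reach:", llc_size // l2_tlb_reach, "x")         # 4 x
```

Under these assumptions, data spread across more than 256 KB of 4 KB pages can miss the first-level DTLB while still hitting in the last-level cache, so DTLB misses can outnumber LLC misses.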
Quoting - Vladimir Tsymbal (Intel)
Probably this is due to situations: L1 Cache Hit, L1 DTLB Miss and L2 Cache Hit, L1 DTLB Miss. They are rare, but possible if the data is circulating around.
How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?
I'm still trying to find the exact cause of this behavior.
Thanking you,
Regards,
Dny
Quoting - Dny
Hello Sir,
How is it possible to have an L1 cache hit and an L1 DTLB miss?
Can you explain in more detail?
I'm still trying to find the exact cause of this behavior.
Thanking you,
Regards,
Dny
Could it be due to the fact that sampling is not accurate? It might be that the different counters are sampled at different times and that's the cause of the difference.
Guy.
