I have a query about the DTLB miss count and the cache (LLC) miss count. As I understand it, the LLC miss count should always be greater than or equal to the DTLB miss count. But for one of my test cases, a binary search tree, I observed that on a Nehalem server the LLC miss count (MEM_LOAD_RETIRED.LLC_MISS) is lower than the DTLB miss count (MEM_LOAD_RETIRED.DTLB_MISS). Please find the graph in the attachment. (I have plotted the number of events, not the number of samples.)
Could you please offer any suggestions about this behavior?
It seems entirely possible to incur more DTLB misses on accesses that hit in the (1st level) data cache than misses in the last level cache.
Thanks for your reply,
I didn't understand your answer completely. Can you please explain in more detail? As per my understanding, for each DTLB miss (whether 1st level or 2nd level) there will always be an LLC miss, because a DTLB miss means the data needs to be fetched from main memory, as it is not available in any cache. So the LLC miss count should be greater than or equal to the DTLB miss count.
On earlier Intel CPUs, DTLB misses on data that was still in cache were common enough, and handled poorly enough, to be a significant cause of performance loss. The capacity of the DTLB covers only a small fraction of the last level cache capacity, and that fraction is soon to be reduced further on new models. Situations where attention to data locality may improve performance are already becoming more frequent on 6 core CPUs.
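To put rough numbers on the coverage point, here is a small back-of-the-envelope sketch. The page size, TLB entry counts and LLC size are assumptions chosen for illustration (typical figures for a 4 KiB page configuration), not values taken from this thread, so please check them against the documentation for your own CPU.

```python
# Illustrative numbers only: every constant below is an assumption.
PAGE_SIZE = 4 * 1024           # assumed 4 KiB pages
L1_DTLB_ENTRIES = 64           # assumed first-level DTLB entries
L2_TLB_ENTRIES = 512           # assumed second-level TLB entries
LLC_BYTES = 8 * 1024 * 1024    # assumed 8 MiB last level cache


def tlb_reach(entries, page_size=PAGE_SIZE):
    """Bytes of memory that can be addressed without a TLB miss."""
    return entries * page_size


def coverage(entries, cache_bytes=LLC_BYTES):
    """Fraction of the cache that the TLB can map at one time."""
    return tlb_reach(entries) / cache_bytes


if __name__ == "__main__":
    print("L1 DTLB reach:", tlb_reach(L1_DTLB_ENTRIES), "bytes")
    print("L1 DTLB coverage of LLC:", coverage(L1_DTLB_ENTRIES))
    print("L2 TLB coverage of LLC:", coverage(L2_TLB_ENTRIES))
```

Under these assumed figures the first-level DTLB maps 256 KiB, about 3% of the LLC, so data can be resident in cache while its page translation is not in the DTLB.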
How is it possible to have an L1 cache hit and an L1 DTLB miss? Can you explain in more detail?
One example I can imagine is cycling through data that fits into the L1 cache but is accessed with a large stride, which causes page walks. It might be a corner case at the beginning of the cycle, so the DTLB miss count could come out one event higher than the cache miss count. With enough cycles it might become visible in the results. I didn't try to reproduce it, though.
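The large-stride case can be sketched with a toy simulation. This is only a model under stated assumptions, not a reproduction of the hardware counters: both the DTLB and the cache are treated as fully associative LRU structures, the sizes (64 DTLB entries, 512 cache lines of 64 bytes, 4 KiB pages) are assumed, and the names `LRU` and `simulate` are made up for this example. The loop touches one cache line in each of 128 pages, repeatedly.

```python
from collections import OrderedDict

PAGE = 4096   # assumed page size in bytes
LINE = 64     # assumed cache line size in bytes


class LRU:
    """Fully associative LRU structure that counts misses (toy model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.misses = 0

    def access(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)    # hit: mark as most recent
            return
        self.misses += 1                     # miss: insert the key
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def simulate(num_pages=128, passes=100, tlb_entries=64, cache_lines=512):
    """Cycle over one line per page; return (tlb_misses, cache_misses)."""
    tlb = LRU(tlb_entries)
    cache = LRU(cache_lines)
    for _ in range(passes):
        for p in range(num_pages):
            # Shift the line offset per page so that, on a real
            # set-associative cache, the lines would not all share one set.
            addr = p * PAGE + (p % (PAGE // LINE)) * LINE
            tlb.access(addr // PAGE)    # page number indexes the TLB
            cache.access(addr // LINE)  # line number indexes the cache
    return tlb.misses, cache.misses


if __name__ == "__main__":
    tlb_misses, cache_misses = simulate()
    print("DTLB misses:", tlb_misses, "cache misses:", cache_misses)
```

In this model the 128 lines (8 KiB) stay in the cache after the first pass, while the 128 pages keep evicting each other from the 64-entry DTLB, so the DTLB miss count grows with every pass and the cache miss count does not. Real hardware uses different replacement policies and a second-level TLB, so the actual counts would differ.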