TLB misses

cagribal
Beginner
Hello,
I'm trying to measure TLB misses with the following counters:
DTLB_LOAD_MISSES.ANY
MEM_LOAD_RETIRED.DTLB_MISS
The second one reports more misses than the first. In addition, the first one reports approximately twice as many misses as I expect. What could the reasons be? Does the first event count twice per miss, once for the first-level miss and once for the second-level miss? The machine I'm using is a Xeon L5520. Any help is appreciated.
Cheers,
3 Replies
Patrick_F_Intel1
Employee
Hello cagribal,
I ran a test to check the counters.
The test is a 'read memory bandwidth' test.
I start 1 thread/cpu and each thread reads a 40MB array using a 64 byte stride for 10 seconds.
I would expect 1 DTLB miss per page. Each page is 4096 bytes.
It takes 64 loads to cover a page (64 loads = 4096 page size/64 stride).
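The expected counts can be worked out with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical, "40MB" is assumed to mean 40 * 2^20 bytes, and it assumes exactly one DTLB miss per 4096-byte page with no reuse between pages.

```c
#include <stddef.h>

/* Loads needed to touch every stride-sized chunk of one page.
   With 4096-byte pages and a 64-byte stride this is 64. */
static size_t loads_per_page(size_t page_bytes, size_t stride_bytes)
{
    return page_bytes / stride_bytes;
}

/* Pages touched in one pass over the array. Under the assumption of
   one DTLB miss per page, this is also the expected miss count per pass. */
static size_t pages_per_pass(size_t array_bytes, size_t page_bytes)
{
    return (array_bytes + page_bytes - 1) / page_bytes; /* round up */
}
```

Under these assumptions, one pass over the array touches 10,240 pages, so a measured loads-per-miss ratio near 64 is what the test should produce.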

Here is what I counted in 1 of the 10 seconds.

[cpp]
                                               core0       core1       core2       core3
1. DTLB_LOAD_MISSES.ANY                      556,938     556,991     560,425     556,418
2. MEM_LOAD_RETIRED.DTLB_MISS                532,471     513,618     524,524     526,461
3. MEM_INST_RETIRED.LOADS                 37,694,658  38,354,887  36,165,843  34,506,563
4. UNC_LLC_LINES_IN.ANY                  133,367,850
5. DTLB_MISSES.WALK_COMPLETED                558,674     558,890     565,344     559,734
a. loads/DTLB_miss, row3/row1                  67.68       68.86       64.53       62.02
b. loads/DTLB_miss, row3/row2                  70.79       74.68       68.95       65.54
c. loads/DTLB_miss, row3/row5                  67.47       68.63       63.97       61.65
d. LLC_misses/DTLB_miss, row4/sum(row2)        63.60
e. loads/LLC_miss, sum(row3)/row4               1.10
[/cpp]
The raw data is in rows 1-5.
I compute loads/DTLB_miss in rows a-c.
The loads/DTLB_miss ratio is close to the expected 64. Some noise is to be expected, since I ran the test on my work laptop, which has tons of stuff running on it.
Row d shows LLC (last-level cache) misses per DTLB miss. This is very close to 64 and is probably the best measure (since most of the LLC misses are due to my read memory bandwidth test case).
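For anyone who wants to check the derived rows, the following sketch recomputes rows a-e from the raw counts quoted in the post. The variable and function names are made up for illustration; only the numbers come from the table.

```c
#include <stddef.h>
#include <stdio.h>

#define NCORES 4

/* Raw per-core counts copied from the post (one second of the run). */
static const double dtlb_load_misses_any[NCORES]       = {556938, 556991, 560425, 556418};
static const double mem_load_retired_dtlb_miss[NCORES] = {532471, 513618, 524524, 526461};
static const double mem_inst_retired_loads[NCORES]     = {37694658, 38354887, 36165843, 34506563};
static const double dtlb_misses_walk_completed[NCORES] = {558674, 558890, 565344, 559734};
/* Single value reported in the post for UNC_LLC_LINES_IN.ANY. */
static const double unc_llc_lines_in_any = 133367850;

/* Sum a counter over all cores. */
static double sum_counts(const double *v, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Events per miss (hypothetical helper name). */
static double per_miss(double events, double misses)
{
    return events / misses;
}

/* Print rows a-e in the order used in the post. */
static void print_derived_rows(void)
{
    const double *miss_rows[3] = {dtlb_load_misses_any,
                                  mem_load_retired_dtlb_miss,
                                  dtlb_misses_walk_completed};
    const char *labels[3] = {"a", "b", "c"};

    for (int r = 0; r < 3; r++) {
        printf("%s. loads/DTLB_miss:", labels[r]);
        for (int c = 0; c < NCORES; c++)
            printf(" %.2f", per_miss(mem_inst_retired_loads[c], miss_rows[r][c]));
        printf("\n");
    }
    printf("d. LLC_misses/DTLB_miss: %.2f\n",
           per_miss(unc_llc_lines_in_any,
                    sum_counts(mem_load_retired_dtlb_miss, NCORES)));
    printf("e. loads/LLC_miss: %.2f\n",
           per_miss(sum_counts(mem_inst_retired_loads, NCORES),
                    unc_llc_lines_in_any));
}
```

Calling `print_derived_rows()` from a `main` prints the five derived rows, which can then be compared against the values in the table.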

So... in conclusion... I don't see overcounting. Certainly not 2x too many DTLB misses.
Can you tell us more about your expected count and methodology?

Pat
GHui
Novice
Hello Pat,

The test confused me. There is much I can't understand.

[cpp]
a. loads/DTLB_miss, row3/row1    67.68   68.86   64.53   62.02
b. loads/DTLB_miss, row3/row2    70.79   74.68   68.95   65.54
c. loads/DTLB_miss, row3/row5    67.47   68.63   63.97   61.65
[/cpp]
The first column is the same in every row, but the second column is different. I don't know why that is.

And what is the test program? Could I get the code? I want to run it myself.

GHui
Patrick_F_Intel1
Employee
Hello GHui,
Yeah, the table is not so clear.
Here is what it should look like:
[cpp]
                                  core0   core1   core2   core3
a. loads/DTLB_miss(row3/row1)     67.68   68.86   64.53   62.02
b. loads/DTLB_miss(row3/row2)     70.79   74.68   68.95   65.54
c. loads/DTLB_miss(row3/row5)     67.47   68.63   63.97   61.65
[/cpp]
So all 3 rows are "loads/DTLB_misses" but computed from different quantities.

The test program is my 'id_cpu' utility. I don't have approval to release it.
But it should be relatively easy to reproduce the results with any test that reads a 40 MB array with a 64-byte stride (just touch each cache line).
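A rough sketch of such a read loop follows. It is not the 'id_cpu' utility; the function names are hypothetical, it runs a single thread rather than one thread per CPU, and it leaves thread pinning and counter collection to an external tool. It also assumes the array ends up on 4096-byte pages; if the OS backs the allocation with large pages, the DTLB miss counts would likely be much lower.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Read one byte from each stride-sized chunk of buf and return the number
   of loads issued. The volatile qualifier keeps the compiler from
   optimizing the loads away. */
static size_t stride_read(const volatile unsigned char *buf, size_t bytes,
                          size_t stride, unsigned *checksum)
{
    size_t loads = 0;
    unsigned sum = 0;
    for (size_t off = 0; off < bytes; off += stride) {
        sum += buf[off];
        loads++;
    }
    *checksum = sum;
    return loads;
}

/* Allocate the array and write to all of it so that every page is
   backed by real memory before the measurement starts. */
static unsigned char *alloc_touched(size_t bytes)
{
    unsigned char *p = malloc(bytes);
    if (p != NULL)
        memset(p, 1, bytes);
    return p;
}

/* Repeat full passes over the array for roughly `seconds` of CPU time
   (clock() measures processor time, not wall time) and return the
   total number of loads issued. */
static unsigned long long run_passes(const unsigned char *buf, size_t bytes,
                                     size_t stride, double seconds)
{
    unsigned long long total = 0;
    unsigned checksum = 0;
    clock_t end = clock() + (clock_t)(seconds * CLOCKS_PER_SEC);
    while (clock() < end)
        total += stride_read(buf, bytes, stride, &checksum);
    return total;
}
```

To approximate the test described above, a `main` could allocate `40 * 1024 * 1024` bytes with `alloc_touched`, call `run_passes` with a stride of 64 for 10 seconds, and collect the DTLB events with a PMU tool while it runs.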
Pat