I'm trying to measure TLB misses with the following counters:

DTLB_LOAD_MISSES.ANY

MEM_LOAD_RETIRED.DTLB_MISS

The second counter reports more misses than the first. The first also reports roughly twice as many misses as I expect. What could explain this? Does the first counter count each miss twice, once for the first-level TLB miss and once for the second-level TLB miss? The machine I'm using is a Xeon L5520. Any help is appreciated.

Cheers,

3 Replies

I ran a test to check the counters.

The test is a 'read memory bandwidth' test.

I start 1 thread per CPU, and each thread reads a 40 MB array with a 64-byte stride for 10 seconds.

I would expect 1 DTLB miss per page. Each page is 4096 bytes.

It takes 64 loads to cover a page (64 loads = 4096 page size/64 stride).
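The expected-count arithmetic can be written out as a small sketch. It assumes "40MB" means 40 MiB and that every page is a 4 KiB page (no large pages); the function names are made up for illustration.

```python
PAGE_BYTES = 4096                 # 4 KiB page, as stated in the post
STRIDE_BYTES = 64                 # one load per 64-byte cache line
ARRAY_BYTES = 40 * 1024 * 1024    # assumption: "40MB" means 40 MiB


def loads_per_page(page_bytes: int, stride_bytes: int) -> int:
    """Number of strided loads needed to cover one page."""
    return page_bytes // stride_bytes


def pages_per_pass(array_bytes: int, page_bytes: int) -> int:
    """Pages touched in one pass over the array (rounded up)."""
    return -(-array_bytes // page_bytes)


def loads_per_pass(array_bytes: int, stride_bytes: int) -> int:
    """Strided loads issued in one pass over the array (rounded up)."""
    return -(-array_bytes // stride_bytes)


print("loads per page:", loads_per_page(PAGE_BYTES, STRIDE_BYTES))
print("pages per pass (expected DTLB misses):", pages_per_pass(ARRAY_BYTES, PAGE_BYTES))
print("loads per pass:", loads_per_pass(ARRAY_BYTES, STRIDE_BYTES))
```

If the one-miss-per-page expectation holds, the ratio of loads to DTLB misses should come out near 64 no matter how many passes fit into the measurement interval.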

Here is what I counted in 1 of the 10 seconds.

[cpp]
1. DTLB_LOAD_MISSES.ANY                        556,938      556,991      560,425      556,418
2. MEM_LOAD_RETIRED.DTLB_MISS                  532,471      513,618      524,524      526,461
3. MEM_INST_RETIRED.LOADS                   37,694,658   38,354,887   36,165,843   34,506,563
4. UNC_LLC_LINES_IN.ANY                    133,367,850
5. DTLB_MISSES.WALK_COMPLETED                  558,674      558,890      565,344      559,734

a. loads/DTLB_miss, row3/row1                    67.68        68.86        64.53        62.02
b. loads/DTLB_miss, row3/row2                    70.79        74.68        68.95        65.54
c. loads/DTLB_miss, row3/row5                    67.47        68.63        63.97        61.65
d. LLC_misses/DTLB_miss, row4/sum(row2)          63.60
e. loads/LLC_miss, sum(row3)/row4                 1.10
[/cpp]

The raw data is in rows 1-5.

Rows a-c give the number of loads per DTLB miss, each computed from a different miss counter.

The loads/DTLB_miss ratio is close to the expected 64. I ran the test on my work laptop, which has lots of other things running on it, so some noise is to be expected.

Row d. shows the LLC (Last level cache) misses / dtlb_miss. This is very close to 64 and is probably the best measure (since most of the LLC misses are due to my read memory bw test case).
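As a cross-check, the derived rows can be recomputed from the raw counts. This is only a sketch: the numbers are copied from the table above, and the variable and function names are made up for illustration.

```python
# Raw counter values from the table (one second of the run, one value per column).
dtlb_load_misses_any = [556_938, 556_991, 560_425, 556_418]              # row 1
mem_load_retired_dtlb_miss = [532_471, 513_618, 524_524, 526_461]        # row 2
loads_retired = [37_694_658, 38_354_887, 36_165_843, 34_506_563]         # row 3
llc_lines_in = 133_367_850                                               # row 4 (single uncore value)
walk_completed = [558_674, 558_890, 565_344, 559_734]                    # row 5


def per_column_ratio(numerators, denominators):
    """Element-wise ratio of two equally long lists of counts."""
    return [n / d for n, d in zip(numerators, denominators)]


row_a = per_column_ratio(loads_retired, dtlb_load_misses_any)        # row3/row1
row_b = per_column_ratio(loads_retired, mem_load_retired_dtlb_miss)  # row3/row2
row_c = per_column_ratio(loads_retired, walk_completed)              # row3/row5
row_d = llc_lines_in / sum(mem_load_retired_dtlb_miss)               # row4/sum(row2)
row_e = sum(loads_retired) / llc_lines_in                            # sum(row3)/row4

for label, row in (("a", row_a), ("b", row_b), ("c", row_c)):
    print(label, " ".join(f"{value:.2f}" for value in row))
print(f"d {row_d:.2f}")
print(f"e {row_e:.2f}")
```

Every ratio lands in the low 60s to mid 70s, which is the basis for saying that none of the three miss counters is off by a factor of two here.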

So... in conclusion... I don't see overcounting. Certainly not 2x times too many DTLB misses.

Can you tell us more about your expected count and methodology?

Pat

The test confused me. There is much I can't understand.

[cpp]
a. loads/DTLB_miss, row3/row1                    67.68        68.86        64.53        62.02
b. loads/DTLB_miss, row3/row2                    70.79        74.68        68.95        65.54
c. loads/DTLB_miss, row3/row5                    67.47        68.63        63.97        61.65
[/cpp]

The first column is the same in all three rows, but the second column is different. I don't understand why.

Also, what is the test program? Could I get the code? I would like to run it myself.

GHui

Yeah, the table is not so clear.

Here is what it should look like:

[cpp]
                                   core0    core1    core2    core3
a. loads/DTLB_miss(row3/row1)      67.68    68.86    64.53    62.02
b. loads/DTLB_miss(row3/row2)      70.79    74.68    68.95    65.54
c. loads/DTLB_miss(row3/row5)      67.47    68.63    63.97    61.65
[/cpp]

So all 3 rows are "loads/DTLB_misses" but computed from different quantities.

The test program is my 'id_cpu' utility. I don't have approval to release it.

But it should be relatively easy to reproduce the results with any read-memory test that uses a 64-byte stride (just touch each cache line) over a 40 MB array.
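For anyone who wants a starting point, here is a rough single-threaded approximation of such a test. It is not the 'id_cpu' utility, and all names in it are hypothetical. It relies on the assumption that CPython's extended slicing reads one byte per stride in a C loop; the interpreter and the slice copy add loads and stores of their own, so a compiled loop with one thread pinned per CPU would be closer to the test described above.

```python
import time

ARRAY_BYTES = 40 * 1024 * 1024   # assumption: "40 MB" means 40 MiB
STRIDE_BYTES = 64                # touch one byte in every 64-byte cache line


def touch_each_line(buf: bytearray, stride: int = STRIDE_BYTES) -> int:
    """Read one byte per stride across buf; returns how many bytes were read."""
    # The extended slice copies every stride-th byte into a new object.
    return len(buf[::stride])


def run_for(seconds: float, buf: bytearray) -> int:
    """Repeat full passes over buf until the time budget is used up."""
    passes = 0
    deadline = time.monotonic() + seconds
    while True:
        touch_each_line(buf)
        passes += 1
        if time.monotonic() >= deadline:
            return passes


if __name__ == "__main__":
    data = bytearray(ARRAY_BYTES)
    completed = run_for(10.0, data)
    print("passes completed:", completed)
    print("strided reads issued:", completed * touch_each_line(data))
```

The counters themselves still have to be collected with an external profiling tool while the loop runs; the script only generates the access pattern.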

Pat
