Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Nehalem Prefetcher

How do I turn off the prefetcher? It is messing with my results. I need to show that one paging scheme results in fewer cache misses than another.
I have an 8MB cache. My for loop repeatedly reads 16MB from RAM. The second 8MB should evict the first 8MB from the cache, so that when I repeat the loop all 16MB show up as cache misses. I measure the number of L3 cache misses before and after each iteration of the loop. For the first iteration I measure 16MB worth of cache misses. However, every subsequent iteration shows only 2 L3 cache misses. Since I know I am reading 16MB of data on each iteration, I expect far more than 2 L3 cache misses. I assume the prefetcher is doing its job and fetching the data before I need it, but this skews the results and I cannot show that one technique is better than the other. How do I turn off the prefetcher on the Nehalem i7-860 processor?
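The post does not say how the misses are counted. As one possible way to read a miss count around a single pass, here is a minimal sketch assuming Linux and the perf_event_open(2) system call; `count_llc_misses`, `sweep_lines` and `struct sweep_arg` are hypothetical names. It uses the generic `PERF_COUNT_HW_CACHE_MISSES` event, which the man page describes as usually indicating last-level cache misses; the exact PMU event it maps to depends on the kernel and CPU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical example workload: read one byte from every 64-byte line. */
struct sweep_arg {
    const volatile unsigned char *buf;
    size_t len;
};

void sweep_lines(void *p)
{
    const struct sweep_arg *a = p;
    unsigned sum = 0;
    for (size_t i = 0; i < a->len; i += 64)
        sum += a->buf[i];          /* volatile read, kept by the compiler */
    (void)sum;
}

/* Hypothetical helper: counts cache misses (usually LLC misses) caused by
 * one call of fn(arg) in this thread. Returns -1 if the counter cannot be
 * opened, e.g. no PMU access or a strict perf_event_paranoid setting. */
int64_t count_llc_misses(void (*fn)(void *), void *arg)
{
    assert(fn != NULL);
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        /* created stopped; enabled just before fn */
    attr.exclude_kernel = 1;  /* count user-space activity only */
    attr.exclude_hv = 1;

    /* pid 0, cpu -1: the calling thread, on any CPU; no group, no flags. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    ssize_t got = read(fd, &count, sizeof(count));
    close(fd);
    return got == (ssize_t)sizeof(count) ? (int64_t)count : -1;
}
```

Calling `count_llc_misses` once per pass gives a per-iteration count comparable to what the post describes; the `perf stat -e cache-misses` tool reports the same generic event for a whole program.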

Often you can disable the L2 prefetchers in the BIOS (this depends on the BIOS). However, you will still have to live with the L1 prefetchers.

Instead of disabling the prefetchers, you might try accessing the data in a random order that the prefetcher cannot detect. For example, you could follow a linked list that jumps randomly around memory.
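A minimal sketch of that idea in C, assuming a heap buffer and pointer-sized, pointer-aligned nodes (`build_random_chain` and `chase` are hypothetical names). It links `n` nodes spaced `stride` bytes apart into one random cycle using Sattolo's algorithm, a Fisher-Yates variant whose result is always a single cycle, so a traversal visits every node before repeating. Because each load's address comes from the previous load, the accesses form a dependent chain with no constant stride for a stride-based prefetcher to follow; whether that is enough on a given CPU is something to confirm with a miss counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: links n nodes, spaced `stride` bytes apart in buf,
 * into a single random cycle. The first word of node k holds the address
 * of the node that follows it. buf must hold n * stride bytes and be
 * pointer-aligned (memory from malloc is). Returns buf (node 0), or NULL
 * if the scratch array cannot be allocated. */
void *build_random_chain(unsigned char *buf, size_t n, size_t stride)
{
    assert(n >= 2 && stride >= sizeof(void *) && stride % sizeof(void *) == 0);
    size_t *next = malloc(n * sizeof *next);
    if (next == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    /* Sattolo's algorithm: like Fisher-Yates, but j is drawn from [0, i)
     * rather than [0, i], which makes the permutation one n-cycle.
     * rand() is a placeholder; substitute a better generator if needed. */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
    for (size_t k = 0; k < n; k++)
        *(void **)(buf + k * stride) = buf + next[k] * stride;
    free(next);
    return buf;
}

/* Follows the chain `steps` times. Each load depends on the previous one,
 * and returning the final pointer keeps the compiler from removing them. */
void *chase(void *p, size_t steps)
{
    while (steps--)
        p = *(void **)p;
    return p;
}
```

With `stride = 64` and `n = 262144` the chain covers 16MB and loads 262144 distinct cache lines; with `stride = 4096` each node sits at the start of its own page, as in the setup described below.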

Unfortunately, I have tried the linked-list method, where the data at one address points to another address whose data points to yet another. This method does not work at all; the prefetcher can completely predict all memory accesses.

I have each address point to a random address, which in turn points to another random address. The addresses are page aligned because the prefetcher loads a full page into the cache. I then run the following assembly loop:

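__asm__ volatile(
    "mov $4096, %%rcx\n\t"        /* loop 4096 times: 4096 nodes x 4KB pages span 16MB */
    "mov $1073741824, %%rax\n\t"  /* starting address is 0x40000000 */
    "ap1_loop:\n\t"
    "mov (%%rax), %%rax\n\t"      /* linked-list traversal: load the next pointer */
    "loop ap1_loop\n\t"           /* decrement %rcx, repeat while non-zero */
    ::: "rax", "rcx", "memory");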

When I execute the code above, I get only two cache misses even though I am reading 16MB of data.
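One arithmetic check may help when interpreting that count: a pointer chase that makes one load per node pulls one cache line per node into the cache, so the amount of data cached depends on the number of nodes and the line size, not on how far apart the nodes are. The sketch below (hypothetical names, assuming 64-byte lines as on Nehalem and ignoring prefetching) computes both quantities.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper for a pointer chase that makes one load per node. */
struct chase_extent {
    size_t span;      /* bytes of address range covered: nodes * stride */
    size_t footprint; /* bytes actually loaded: one line per node */
};

/* Assumes stride >= line_size (each node on its own line) and ignores any
 * extra lines a hardware prefetcher might bring in. */
struct chase_extent chase_extent_of(size_t nodes, size_t stride, size_t line_size)
{
    assert(line_size > 0 && stride >= line_size);
    struct chase_extent e = { nodes * stride, nodes * line_size };
    return e;
}
```

For the loop above, 4096 nodes at a 4096-byte stride span 16MB of addresses but load only 4096 × 64 B = 256KB, well under the 8MB L3, so a second pass may find those lines still resident regardless of the prefetcher. Loading 16MB of distinct lines this way would take 262144 nodes at a 64-byte stride.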

It seems that when I read sequentially through 16MB of addresses, I get 16MB worth of cache misses the first time but never again. To test my design, I also need to get 16MB worth of cache misses the second time the loop runs. Since my cache is 8MB and I am reading 16MB of data, I expect to see those cache misses.

I am using Intel's BIOS, and it does not have an option to disable the prefetcher. According to Intel personnel on the Software Virtualization Development Forum, there is no "publicly available" method to disable the Nehalem prefetcher. If you know of one, I would gladly try it.