Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

What is the trigger condition of the L1 stride prefetcher?


According to the "Intel hardware Prefetcher" page on Intel's website, there are four kinds of hardware prefetchers. The prefetcher controlled by bit 3 is the L1 stride prefetcher. I wrote some test code to find out what the trigger condition of the stride prefetcher is. The code repeats the following steps 10000 times:

  1. flush
  2. training phase: access lines 0, 3, 6 and 9 once each
  3. sleep for roughly 500 cycles
  4. measure phase: time a single access to one line in the OS page

However, I can see that only lines 0, 3, 6 and 9 hit in the cache. I observe no stride prefetching activity, even after changing the stride or the length of the access pattern. So I wonder: is there no stride prefetcher in the Intel processor, or does it have some special trigger condition?
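For anyone who wants to reproduce this, the four steps above could be sketched roughly as follows. This is not the attached test code, only a minimal sketch assuming GCC or Clang on x86-64, 64-byte cache lines and a 4 KiB page. The names `time_load`, `run_trial` and `count_fast_probes` are made up for illustration, and the 500-tick wait counts TSC ticks, which are not necessarily core cycles.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <x86intrin.h>   /* GCC/Clang on x86-64: clflush, fences, rdtsc */

#define LINE_BYTES 64    /* assumed cache line size */
#define PAGE_BYTES 4096  /* assumed OS page size */

/* Time one load in TSC ticks; the result includes fence and rdtscp overhead. */
static uint64_t time_load(const uint8_t *p) {
    unsigned aux;
    _mm_mfence();
    _mm_lfence();
    uint64_t t0 = __rdtsc();
    _mm_lfence();
    (void)*(volatile const uint8_t *)p;   /* the measured load */
    uint64_t t1 = __rdtscp(&aux);
    _mm_lfence();
    return t1 - t0;
}

/* One trial: flush, train, wait, then time a single probe line.
   The caller must keep every accessed line inside the page. */
__attribute__((noinline))
static uint64_t run_trial(uint8_t *page, size_t stride, size_t count,
                          size_t probe_line) {
    assert(probe_line < PAGE_BYTES / LINE_BYTES);
    assert(count == 0 || (count - 1) * stride < PAGE_BYTES / LINE_BYTES);

    /* 1. flush every line of the page */
    for (size_t i = 0; i < PAGE_BYTES / LINE_BYTES; i++)
        _mm_clflush(page + i * LINE_BYTES);
    _mm_mfence();

    /* 2. training phase: touch lines 0, stride, 2*stride, ...
       The compiler may still unroll this loop, so the disassembly
       should be checked if a single load address matters. */
    for (size_t i = 0; i < count; i++)
        (void)*(volatile uint8_t *)(page + i * stride * LINE_BYTES);

    /* 3. busy-wait for about 500 TSC ticks */
    uint64_t start = __rdtsc();
    while (__rdtsc() - start < 500)
        _mm_pause();

    /* 4. measure phase: one timed access to the probe line */
    return time_load(page + probe_line * LINE_BYTES);
}

/* Repeat the trial and count how many probes were faster than threshold. */
static size_t count_fast_probes(uint8_t *page, size_t stride, size_t count,
                                size_t probe_line, size_t trials,
                                uint64_t threshold) {
    size_t fast = 0;
    for (size_t t = 0; t < trials; t++)
        if (run_trial(page, stride, count, probe_line) < threshold)
            fast++;
    return fast;
}
```

With a page from `aligned_alloc(PAGE_BYTES, PAGE_BYTES)` that has been written once so it is mapped, a call such as `count_fast_probes(page, 3, 4, 12, 10000, threshold)` would count how often line 12 looks like a cache hit after training on lines 0, 3, 6 and 9. The threshold has to be calibrated per machine.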

3 Replies
Black Belt

This prefetcher is called the "IP prefetcher", which suggests that it operates based on the sequence of addresses accessed by a single load instruction in the executing code.  So the first step is to verify that your implementation is accessing memory in a loop so that all accesses are associated with the same load instruction.  The compiler will often unroll loops, which would spread your loads across multiple instruction pointers and most likely defeat the prefetcher for short sequences.
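One way to rule out unrolling is to write the whole training loop in inline assembly, so that exactly one load instruction exists no matter what the optimizer does. The following is only a sketch assuming GCC or Clang on x86-64 (AT&T syntax); the function name `strided_loads` is hypothetical, and the byte sum it returns exists only so the loads have a visible result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Perform `count` byte loads starting at `base`, `stride_bytes` apart.
   Every load is issued by the same movzbl instruction, i.e. from a single
   instruction pointer. noinline keeps one copy of the loop in the binary. */
__attribute__((noinline))
static uint64_t strided_loads(const uint8_t *base, size_t stride_bytes,
                              size_t count) {
    uint64_t sum = 0;
    if (count == 0)
        return 0;                          /* the asm loop runs at least once */
    __asm__ volatile(
        "1:\n\t"
        "movzbl (%[p]), %%eax\n\t"         /* the only load in the loop */
        "add %%rax, %[sum]\n\t"            /* accumulate the loaded byte */
        "add %[stride], %[p]\n\t"          /* advance by the stride */
        "dec %[n]\n\t"
        "jnz 1b\n\t"
        : [p] "+r"(base), [n] "+r"(count), [sum] "+r"(sum)
        : [stride] "r"(stride_bytes)
        : "rax", "cc", "memory");
    return sum;
}
```

For the pattern in the question (lines 0, 3, 6, 9 with 64-byte lines) the call would be `strided_loads(page, 3 * 64, 4)`. Whether four accesses are enough to train the prefetcher is exactly the open question, so the count is worth varying.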

I have never tested this prefetcher myself, but there are some good comments on the L1 prefetchers at



Thanks for your reply. I do perform the memory accesses in a loop, so only one instruction accesses memory. I checked the post on Stack Overflow, and it does offer some new insight into the L1 prefetchers. However, even after following what it says, I still can't see any activity from either the L1 stride prefetcher or the L1 DCU prefetcher.
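Before concluding that neither L1 prefetcher fires, it may be worth confirming that they are enabled at all. On the processors covered by Intel's hardware prefetcher control disclosure, the four control bits live in MSR 0x1A4 (bit 0: L2 hardware prefetcher, bit 1: L2 adjacent cache line prefetcher, bit 2: DCU prefetcher, bit 3: DCU IP prefetcher), and a bit set to 1 means the prefetcher is disabled. The sketch below assumes Linux with the `msr` kernel module loaded and root privileges; the helper names are made up for illustration, and other processor generations may use a different MSR layout.

```c
#define _XOPEN_SOURCE 700   /* for pread */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_PREFETCH_CONTROL 0x1A4   /* per Intel's prefetcher control disclosure */

enum {
    L2_HW_PREFETCHER  = 0,
    L2_ADJACENT_LINE  = 1,
    DCU_PREFETCHER    = 2,
    DCU_IP_PREFETCHER = 3   /* the "bit 3" prefetcher from the question */
};

/* A set bit disables the prefetcher, so enabled means the bit is 0. */
static int prefetcher_enabled(uint64_t msr_value, int bit) {
    assert(bit >= 0 && bit <= 3);
    return ((msr_value >> bit) & 1) == 0;
}

/* Read one MSR through the Linux msr driver. Returns 0 on success and
   -1 on failure (no root, msr module not loaded, unsupported MSR, ...). */
static int read_msr(int cpu, uint32_t reg, uint64_t *value) {
    char path[64];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)reg);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```

If `read_msr(0, MSR_PREFETCH_CONTROL, &v)` succeeds, `prefetcher_enabled(v, DCU_IP_PREFETCHER)` reports whether the bit 3 prefetcher is switched on for core 0. The MSR is per core, so the core running the test is the one to check.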


I have also cleaned up my code and attached it, in case anyone interested in prefetching wants to run it on their own machine. Just run

sudo ./

That is all that is needed. On my machine, the access time for line 12 is mostly above 180 cycles. I don't think the problem is in the time measurement code: if I change the measured line from cache line 12 to cache line 6 (just change it at test.c, line 103), the access time is mostly 25 cycles.