Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

disabling hardware prefetching on Xeon

ramacn
Beginner
Hi,
I was trying to do some experiments to understand the effect of the hardware prefetcher on software prefetching.
I have a question related to the h/w prefetcher on the Intel Xeon processor, and was wondering if any of you have some suggestions.

My machine configuration:
$ uname -a
Linux 2.6.34-gentoo-r1 #2 SMP Mon Feb 20 12:07:35 CST 2012 x86_64
Intel Xeon CPU E5520 @ 2.27GHz GenuineIntel GNU/Linux

I am trying to count the number of hardware prefetch events (i.e., the L1D_PREFETCH:REQUESTS perfmon2 event) before and after disabling hardware prefetching in the BIOS. I followed the steps described in the link below to disable h/w prefetching in the BIOS.
http://software.intel.com/en-us/articles/optimizing-application-performance-on-intel-coret-microarchitecture-using-hardware-implemented-prefetchers/

However, even after disabling h/w prefetching, I don't see any effect on the prefetch event L1D_PREFETCH:REQUESTS. I posted this query on the perfmon2 group, and was told that since the E5520 is not a Core microarchitecture processor but a Nehalem-EP, the technique above for disabling the HW prefetcher does not apply.

Do any of you have any suggestions about how we can disable the h/w prefetcher on this architecture?
Regards,
Ram
Patrick_F_Intel1
Employee
Hello Ram,
The perfmon2 group is correct. The article you reference is for the Core microarchitecture, and the E5520 is a Nehalem-based processor.
Usually you can enable/disable the prefetchers using the BIOS setup options.
Look for something in the BIOS labeled "hardware prefetch" (probably in the CPU-related section of the BIOS screens).
Hope this helps,
Pat
ramacn
Beginner
Hi Pat,
Thanks for the quick response.
This is exactly what I tried. While booting, I entered the BIOS and found an option to disable the h/w prefetcher, which I disabled.
However, even after disabling it, the number of L1D_PREFETCH:REQUESTS perfmon2 events stays pretty much the same. I ran a couple of OpenMP benchmarks and verified that disabling it in the BIOS had no effect on this event. It also had no impact on performance.
So I was wondering whether the BIOS h/w prefetching option is actually effective, or whether there is a better way to disable the prefetcher.
Also, is there a better way to verify that it is actually disabled? Kindly let me know.
Regards,
Ram
Patrick_F_Intel1
Employee
Accepted solution
Hello Ram,
When I want to prove to myself that the prefetchers are disabled, here is what I do:
1) Get a program that measures memory latency (using a dependent-chain, load-to-use algorithm).
2) The program needs to let you specify a 64-byte stride and an array size (e.g., 40 MB).
a) The 64-byte stride will trigger the prefetcher if it is enabled.
b) A 40 MB array is larger than the last-level cache (LLC), so you will be measuring memory latency rather than cache latency.
c) Some programs don't let you specify the stride and array size, but they may automatically cover the 64-byte-stride, size > LLC case anyway.

Below is data for the single-threaded memory latency case.
If the prefetchers are off, you will see the full memory latency. For instance, on my Westmere-based laptop, this is about 281 clockticks (or 111 nanoseconds) per miss.
With the prefetchers enabled, the miss latency drops to about 34 clockticks (or 13 nanoseconds) per miss.

So you can see the prefetcher has a dramatic impact on latency.
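To put a number on "dramatic", here is the ratio of the two miss latencies quoted above, along with the core clock they imply (the clock rate is an inference from the figures, not something stated in the post):

```latex
\frac{281\ \text{clockticks}}{34\ \text{clockticks}} \approx 8.3\times
\qquad
\frac{281\ \text{clockticks}}{111\ \text{ns}} \approx 2.5\ \text{clockticks/ns} = 2.5\ \text{GHz}
```

The nanosecond figures give a similar ratio (111 / 13 is about 8.5); the small difference comes from rounding.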

On Linux, the lmbench suite (its lat_mem_rd test) can probably do this latency test.
On Windows, the CPU-Z latency tool runs the test over various array sizes. See http://www.cpuid.com/medias/files/softwares/misc/latency.zip
In my utilities I use code based on Calibrator (but heavily modified). See http://homepages.cwi.nl/~manegold/Calibrator/
Pat