I have disabled the hardware prefetcher, the adjacent cache-line prefetch option, and both together (through the BIOS and through the MSRs), but much to my surprise I don't see any difference at all. I've tried both real applications and synthetic ones, and now I'm starting to suspect something else is wrong.
My question is: what simple executable or benchmark should clearly demonstrate a difference? I have already tried the whole HPCC suite, and tests like STREAM, RandomAccess, and HPL differ by less than 2% between prefetch on and off, where I would expect otherwise.
RandomAccess is presumably designed to defeat cache optimizations, including prefetch. Adjacent-sector prefetch is entirely likely to have little effect on typical HPC applications. I haven't looked into the details of HPL. If you want to start from scratch, rather than consult those who may have analyzed this back when the 5420 was a current product, you would examine cache behavior with hardware prefetch on and off using a tool such as VTune. Of course, you should check that you are using affinity correctly, particularly if you are running OpenMP versions of STREAM, or an MPI implementation that doesn't set affinity by default. In the applications I'm familiar with, hardware prefetch is most effective when the application's bandwidth demand is below the hardware maximum, since prefetch generally increases bandwidth demand. This forum is primarily dedicated to Intel MPI questions, although that may not be clear from the title, so you should be specific if you are looking for help outside that area.