I'm doing a binary search through large ordered data. The loop is very tight, but it stalls heavily on memory on every iteration.
Which events should I watch to tell whether my prefetching is doing any good?
It really depends on which processor architecture you are running on. For Core 2 processors, check out this white paper, which describes the events to use for identifying various microarchitectural issues in your software. For the Core i7 family of processors, see this paper.