Here is the answer. The VTune help for BUS_REQUEST_OUTSTANDING says: "The event counts only full-line cacheable read requests from either the L1 data cache or the L2 prefetchers." So the large number in your case is explained by the fact that latency from the prefetchers' requests was counted as well.
As for the statement about the BUS_REQUEST_OUTSTANDING event for the Core 2 microarchitecture in the Intel 64 and IA-32 Architectures Optimization Reference Manual, it is not accurate. Use the MEM_LOAD_RETIRED.L2_LINE_MISS event instead.