I was googling around for this with no luck. In the "memory bandwidth" analysis of VTune, using these
events, a timeline of the program's memory bandwidth usage is available. However, the aggregate memory bandwidth for the whole program is not shown.
I was trying to figure out how to get this. My guess is
(OFFCORE_RESPONSE.LLC_MISS.LOCAL_DRAM_0 + OFFCORE_RESPONSE.LLC_MISS.LOCAL_DRAM_1) * 64 / elapsed time, which gives ~24 GB/s. That is not possible on my Xeon E3-1225 CPU, whose maximum bandwidth is 21 GB/s, and the STREAM benchmark gives only ~14 GB/s with 4 threads. There must be something wrong with the formula above. My question is: what is the correct formula to get memory bandwidth on Sandy Bridge?
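For reference, here is a minimal sketch of the arithmetic in my guessed formula next to the theoretical DRAM peak. It assumes each LLC miss moves one 64-byte cache line, that the two event values are total counts for the whole run (VTune samples events, so its reported counts may be estimates), and that elapsed time is in seconds. The function names are hypothetical, and this only reproduces my formula; it is not a claim that the formula is correct. The peak uses the E3-1225's dual-channel DDR3-1333 configuration.

```python
CACHE_LINE_BYTES = 64  # assumption: one LLC miss transfers one 64-byte cache line


def dram_bandwidth_gbps(local_dram_0: int, local_dram_1: int, elapsed_s: float) -> float:
    """Bandwidth in GB/s (10**9 bytes/s) computed with the guessed formula.

    local_dram_0 / local_dram_1: counts of OFFCORE_RESPONSE.LLC_MISS.LOCAL_DRAM_0/_1,
    assumed to be totals over the whole run.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    total_bytes = (local_dram_0 + local_dram_1) * CACHE_LINE_BYTES
    return total_bytes / elapsed_s / 1e9


def peak_dram_bandwidth_gbps(channels: int, transfer_rate_mts: float, bus_bytes: int = 8) -> float:
    """Theoretical DRAM peak: channels x 64-bit (8-byte) bus x million transfers per second."""
    return channels * bus_bytes * transfer_rate_mts * 1e6 / 1e9


# Xeon E3-1225: two channels of DDR3-1333
print(round(peak_dram_bandwidth_gbps(2, 1333), 1))  # 21.3
```

A measured value above this peak, like my ~24 GB/s, means the counted events or the way they are combined overcount actual DRAM traffic.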