Here's some info we got the last time someone asked about measuring bus bandwidth. It is offered "as-is". That is, don't ask me to explain it, cuz I can't!
Here it is digested for the read case, which is more relevant to performance tuning. For the load and store case you would replace the following events:
Bus Reads Underway From The Processor -> Bus Accesses Underway From The Processor
Reads From The Processor -> Bus Accesses From The Processor
The peak bandwidth for an Intel Pentium 4 processor with an 800 MHz FSB is:
64 bits/cycle * 1 byte/8 bits * 800M cycles/sec = 6.4G bytes/sec
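For anyone who wants to check the arithmetic, here is a minimal Python sketch of the peak figure. The function name and parameters are made up for illustration; it only assumes a 64-bit-wide bus and 800M transfers per second, as stated above.

```python
def peak_bandwidth_bytes_per_sec(bus_width_bits, transfers_per_sec):
    """Theoretical peak: bytes moved per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec


# 64-bit bus, 800 MHz effective FSB rate (values taken from the text above)
peak = peak_bandwidth_bytes_per_sec(64, 800e6)
print(f"Peak bandwidth: {peak / 1e9:.1f} GB/s")
```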
To measure consumed memory read bandwidth for Write Back cacheable memory on a 3 GHz processor, we would compute the following from events collected with the VTune analyzer:
(Reads From The Processor * 64 bytes * 3G cycles/sec) / (Bus Reads Underway From The Processor with compare bit set)
Breaking this down gives:
Number of bytes the workload read from memory:
Reads From The Processor (read transactions) * 64 bytes/transaction = bytes read across all transactions
Number of seconds during which read transactions are underway on the bus:
Bus Reads Underway From The Processor with compare bit set (cycles with a read underway) / 3G cycles/sec = seconds with a read underway
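The breakdown above can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the event counts in the example are invented numbers rather than real measurements, and the 3G cycles/sec clock is taken from the formula above.

```python
CACHE_LINE_BYTES = 64  # bytes per read transaction, per the formula above


def read_bandwidth_bytes_per_sec(reads, reads_underway_cycles, clock_hz):
    """Consumed read bandwidth from two event counts.

    reads                 -- count of "Reads From The Processor"
    reads_underway_cycles -- count of "Bus Reads Underway From The Processor"
                             collected with the compare bit set
    clock_hz              -- processor clock in cycles per second
    """
    if reads_underway_cycles <= 0:
        raise ValueError("reads_underway_cycles must be positive")
    bytes_read = reads * CACHE_LINE_BYTES
    busy_seconds = reads_underway_cycles / clock_hz
    return bytes_read / busy_seconds


# Invented example counts, for illustration only
bandwidth = read_bandwidth_bytes_per_sec(
    reads=10_000_000,
    reads_underway_cycles=1_500_000_000,
    clock_hz=3e9,
)
print(f"Consumed read bandwidth: {bandwidth / 1e9:.2f} GB/s")
```

Comparing the result against the 6.4G bytes/sec peak gives a rough idea of how close the workload is to saturating the bus while reads are in flight.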
HINT: Use the VTune analyzer to edit the Bus Reads Underway From The Processor event, and set the compare bit in the edit event dialog.