Why is the theoretical memory bandwidth of a Nehalem CPU (1333 MT/s * 3 channels * 64 bits / (8 bits/byte) = 32 GB/s per socket) so much higher than the bandwidth measured with the STREAM benchmark, which is around 17 GB/s per socket, or about 37 GB/s for a two-socket motherboard? I thought STREAM should come pretty close to the maximum possible throughput.
How can I calculate the memory bandwidth so that it is closer to the observed result? In other words, what am I not taking into account?
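
For reference, here is a minimal sketch of the arithmetic behind the theoretical figure and of how far the measured per-socket number falls short of it. The function name `peak_bandwidth_gb_s` and the variable names are made up for illustration, and it assumes decimal units (1 GB = 1e9 bytes) and that 1333 is the transfer rate in MT/s.

```python
def peak_bandwidth_gb_s(transfers_mt_s, channels, bus_width_bits=64):
    """Theoretical peak bandwidth in GB/s, assuming 1 GB = 1e9 bytes."""
    bytes_per_transfer = bus_width_bits / 8  # 64-bit channel -> 8 bytes per transfer
    return transfers_mt_s * 1e6 * channels * bytes_per_transfer / 1e9


# Figures taken from the question: 1333 MT/s, 3 channels per socket.
peak = peak_bandwidth_gb_s(1333, 3)

# Per-socket STREAM result quoted in the question.
measured = 17.0

# Fraction of the theoretical peak that STREAM actually reports.
efficiency = measured / peak

print(f"peak = {peak:.1f} GB/s, measured/peak = {efficiency:.0%}")
# prints "peak = 32.0 GB/s, measured/peak = 53%"
```

So the measured per-socket value is only a little over half of the number the simple formula gives, and that gap is what I am trying to account for.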