Using the STREAM benchmark and Intel's MLC (Memory Latency Checker), I observe a large difference in reported memory bandwidth.
STREAM reports (MB/s) - Copy: 13041, Scale: 12850, Add: 14436, Triad: 14340
Intel's MLC reports (MB/s) - ALL Reads: 75823, 3:1 Reads-Writes: 74216, 2:1 Reads-Writes: 73818, 1:1 Reads-Writes: 69407, and Stream-triad like: 70701.
So the MLC Stream-triad like value is 70.7 GB/s, versus 14.3 GB/s for STREAM Triad.
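To put a number on the gap, here is a quick sketch of the arithmetic. It assumes both tools report decimal MB/s (1 GB = 1000 MB), which is consistent with the GB/s figures quoted above; the variable and function names are just for illustration.

```python
# Values copied from the results above, in MB/s.
stream_triad_mb_s = 14340   # STREAM Triad, single thread
mlc_triad_mb_s = 70701      # MLC "Stream-triad like"

def to_gb_s(mb_s):
    """Convert MB/s to GB/s, assuming decimal units (1 GB = 1000 MB)."""
    return mb_s / 1000.0

# How many times higher the MLC figure is than the STREAM figure.
ratio = mlc_triad_mb_s / stream_triad_mb_s

print(f"STREAM Triad:   {to_gb_s(stream_triad_mb_s):.1f} GB/s")
print(f"MLC triad-like: {to_gb_s(mlc_triad_mb_s):.1f} GB/s")
print(f"MLC / STREAM:   {ratio:.2f}x")
```

The MLC figure comes out at a little under five times the STREAM figure.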
I am curious to understand the difference. Is it because of concurrency? MLC spawns several threads, whereas I did not compile STREAM with OpenMP, so it runs as a single thread.
Thanks and best regards
Yes, the primary difference is concurrency.
The size of the difference will depend on the system under test, both its physical configuration (processor model, number of sockets, number of DIMMs per channel) and its BIOS configuration (snooping mode, memory redundancy mode, etc.), and on how STREAM was compiled and run.
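One way to see why concurrency matters is Little's Law: sustained bandwidth multiplied by memory latency gives the number of bytes that must be in flight at any moment. The sketch below applies it to the two Triad figures from the question. The 80 ns latency is purely an assumed placeholder, not a value measured on this system (MLC can report the actual idle latency), and the 64-byte cache line is the usual x86 size; the function name is hypothetical.

```python
CACHE_LINE_BYTES = 64        # usual x86 cache line size
ASSUMED_LATENCY_NS = 80.0    # ASSUMPTION: placeholder latency, not measured here

def lines_in_flight(bandwidth_mb_s, latency_ns):
    """Little's Law: bytes in flight = bandwidth * latency.

    Returns the result as a number of cache lines.
    """
    bytes_per_s = bandwidth_mb_s * 1e6            # decimal MB/s to bytes/s
    bytes_in_flight = bytes_per_s * latency_ns * 1e-9
    return bytes_in_flight / CACHE_LINE_BYTES

# Triad figures from the question, in MB/s.
results = [
    ("STREAM Triad (1 thread)", 14340),
    ("MLC Stream-triad like", 70701),
]

for label, mb_s in results:
    lines = lines_in_flight(mb_s, ASSUMED_LATENCY_NS)
    print(f"{label}: about {lines:.0f} cache lines in flight")
```

Whatever the real latency is, the MLC result needs roughly five times as many cache lines in flight as the single-threaded STREAM run sustains. A single thread can only keep a limited number of memory requests outstanding, so reaching the higher figure generally takes more threads.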