Intel's MLC utility was used to compute inter-CPU bandwidth, i.e. how many bytes per second an Intel processor can move between different cores' L2/L3 caches. See the paper cited below:
4×16-core Xeon E5-4660, Broadwell, 2.2 GHz: 77 Gb/sec
4×8-core Xeon E5-4620, Sandy Bridge-EP, 2.2 GHz: 57 Gb/sec
4×8-core Xeon E7-4820, Westmere-EX, 2.0 GHz: 47 Gb/sec
Unfortunately, the paper does not say which mlc arguments were used to compute these figures, and the authors are not responding to emails. As far as I can tell, the right way to compute bandwidth is to run "mlc --c2c_latency" and derive bandwidth from the reported latency. Is that correct?
"ffwd: delegation is (much) faster than you think", https://www.seltzer.com/margo/teaching/CS508-generic/papers-a1/roghanchi17.pdf, p. 347, Table 1
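To make the "derive bandwidth from latency" idea concrete, here is a rough sketch of the arithmetic I have in mind. It is only my assumption about how such a figure could be obtained, not something the paper or the MLC documentation confirms. It uses Little's law: bytes in flight divided by latency. The 64-byte cache line, the 50 ns latency, and the number of outstanding transfers are all illustrative placeholders, not measured values, and the function names are made up for this example.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size (typical for x86)


def bandwidth_from_latency(latency_ns, outstanding=1, line_bytes=CACHE_LINE_BYTES):
    """Estimate bytes/second from a cache-to-cache latency in nanoseconds.

    Little's law: throughput = data in flight / latency.
    'outstanding' is the assumed number of line transfers in flight at once.
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    if outstanding < 1:
        raise ValueError("outstanding must be at least 1")
    return outstanding * line_bytes * 1e9 / latency_ns


def to_gbit_per_sec(bytes_per_sec):
    """Convert bytes/second to gigabits/second (10^9 bits)."""
    return bytes_per_sec * 8 / 1e9


if __name__ == "__main__":
    latency_ns = 50.0  # hypothetical latency, NOT a measured value
    for outstanding in (1, 4, 8):
        bw = bandwidth_from_latency(latency_ns, outstanding)
        print(f"{outstanding} outstanding: {to_gbit_per_sec(bw):.2f} Gbit/s")
```

The result scales linearly with the assumed number of outstanding transfers, so a latency figure alone does not pin down a bandwidth figure; that is the part I cannot resolve without knowing which arguments the authors used.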
Correction from author:
>Intel's MLC utility was used to compute inter-CPU bandwidth
This should have read "intra-CPU" bandwidth, i.e. moving data between processor caches in the same socket.
Thank you for posting on the Intel® communities.
In this case, please bear in mind that we don’t provide any formal support for the Intel® Memory Latency Checker. This software is distributed via our Intel Developer Zone and is provided “as is,” without any commitment for support services; however, users like yourself may ask questions on the Software Tuning, Performance Optimization & Platform Monitoring forum to which we have moved your question. You might be able to receive the assistance needed here from the community peers who are familiar with your situation.
Intel Technical Support Technician