
Intel Performance Counter Monitor (PCM): Total QPI incoming data traffic?

Prabhu_T_ — Fri, 16 Nov 2012 07:07:25 GMT

Hi,

I'm running the PCM tool on a machine with an Intel(R) Xeon(R) CPU X5690 @ 3.47GHz (Intel(r) microarchitecture codename Westmere-EP). The PCM output includes the following QPI information. Could you please explain what these values actually mean?

Total QPI incoming data traffic: 65 M
QPI data traffic/Memory controller traffic: 0.84

Also, please let us know how data cache misses affect the QPI values.

Thanks, Prabhu


Michael_Intel — Fri, 16 Nov 2012 13:56:12 GMT

Hi Prabhu,

The QuickPath Interconnect (QPI) is the link between your two Xeon sockets. Your incoming QPI traffic means that 65 MB of data were transferred from the remote socket to the local socket. This data includes remote main memory (DRAM) accesses, as well as synchronization handshakes such as the cache-coherency protocol.

The QPI / Memory controller traffic ratio shows you that about 80% of your memory accesses were local. If you want more control over where memory is accessed, you might have a look at the numactl Linux utility.

Cache misses can affect QPI traffic when the resulting memory access is remote or when the cache-coherency protocol is involved.

Regards,
Michael Steyer
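As an illustration of the numactl suggestion, here is a minimal sketch that only builds a command line binding both the CPUs and the memory of a program to one NUMA node. The helper name `numactl_command`, the application path `./my_app` and its arguments are hypothetical; `--cpunodebind` and `--membind` are standard numactl options, but check `man numactl` on your system before relying on them.

```python
import shlex


def numactl_command(app_argv, node=0):
    """Build (but do not run) a numactl command line for one NUMA node.

    app_argv is the program and its arguments as a list of strings.
    node is the NUMA node used for both CPU and memory binding.
    """
    if node < 0:
        raise ValueError("node must be non-negative")
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}", *app_argv]


# Hypothetical application; replace with your own binary and arguments.
cmd = numactl_command(["./my_app", "--threads", "6"], node=0)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a Linux machine with numactl installed. Keeping threads and their memory on the same node is one way to reduce the remote accesses that show up as QPI data traffic.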


Roman_D_Intel — Fri, 16 Nov 2012 18:31:01 GMT

Michael, thanks. A few points need more detail.

The incoming QPI data traffic includes only the data payload, not the non-data overhead (snoops, coherency requests). The "data + non-data" QPI metric is supported on more recent processors.

According to its definition, the "QPI data traffic/Memory controller traffic" metric should be interpreted as follows:

If all memory accesses are local, QPI data traffic should be negligible (zero), so the metric is close to 0.

If all memory accesses are remote, they must all go through QPI. In this case QPI data traffic can even be >= memory controller traffic, so the metric is >= 1.

The metric is an indicator of the NUMA-awareness of the applications running on the system. NUMA-optimized applications should have a metric value close to 0.

-- Roman
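To make this interpretation concrete, here is a small sketch that computes the ratio and maps it to a rough description. The function names and the `low`/`high` cut-offs are illustrative assumptions, not values defined by PCM. The memory controller traffic is not shown in the question, so it is back-computed here from the reported 65 M and 0.84.

```python
def qpi_to_mc_ratio(qpi_data, mc_traffic):
    """Incoming QPI data traffic divided by memory controller traffic.

    Both arguments must be in the same unit (for example, millions of bytes).
    """
    if mc_traffic <= 0:
        raise ValueError("memory controller traffic must be positive")
    return qpi_data / mc_traffic


def describe(ratio, low=0.1, high=1.0):
    """Rough reading of the metric; low and high are illustrative cut-offs."""
    if ratio < low:
        return "mostly local accesses"
    if ratio >= high:
        return "mostly remote accesses"
    return "mix of local and remote accesses"


# From the question: 65 M incoming QPI data traffic, ratio 0.84.
# The memory controller traffic is implied, not reported in the thread.
implied_mc = 65 / 0.84
ratio = qpi_to_mc_ratio(65, implied_mc)
print(f"{ratio:.2f}: {describe(ratio)}")
```

With these assumed cut-offs, a value of 0.84 falls between the two extremes, which suggests that a substantial share of the memory traffic crosses QPI.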