Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Intel Performance Counter Monitor (PCM): Total QPI incoming data traffic?



I'm running the PCM tool on a machine with an Intel(R) Xeon(R) CPU X5690 @ 3.47GHz ("Intel(r) microarchitecture codename Westmere-EP"). The PCM output includes the following QPI information. Could you please explain what the values below actually mean?

Total QPI incoming data traffic:   65 M     QPI data traffic/Memory controller traffic: 0.84

Also, please let us know how data cache misses affect the QPI output value.

Thanks, Prabhu

Hi Prabhu,

The QuickPath Interconnect (QPI) is the link between your two Xeon sockets. The incoming QPI traffic figure means that about 65 MB of data were transferred from the remote socket to the local socket. This data includes remote main memory (DRAM) accesses as well as synchronization handshakes such as the cache-coherency protocol.

The QPI/memory controller traffic ratio shows you that about 80% of your memory accesses were local. If you want more control over where memory is accessed, have a look at the numactl Linux utility.

Cache misses can affect QPI traffic when the resulting memory access is remote or when the cache-coherency protocol is involved.

Regards,
Michael Steyer
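To make the cache-miss point concrete, here is a toy Python model of when a last-level cache miss on a two-socket system adds data payload to the requesting socket's incoming QPI traffic. It is a sketch under loudly stated assumptions, not how PCM or the hardware counters work: 64-byte cache lines, every miss fetches one whole line, and prefetching, write-backs and snoop/coherency messages (non-data traffic) are ignored. The names CacheMiss, classify_miss and qpi_data_bytes are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    LOCAL_DRAM = "served from local DRAM; no QPI data payload"
    REMOTE_DRAM = "served from remote DRAM; data crosses QPI"
    REMOTE_CACHE = "forwarded from the other socket's cache; data crosses QPI"


@dataclass
class CacheMiss:
    requesting_socket: int          # socket whose core missed in its caches
    home_socket: int                # socket whose memory controller owns the address
    modified_in_other_socket: bool  # line is held Modified in the other socket's cache


def classify_miss(miss: CacheMiss) -> Outcome:
    # A Modified copy in the other socket must be forwarded over QPI,
    # regardless of which socket is the line's home.
    if miss.modified_in_other_socket:
        return Outcome.REMOTE_CACHE
    # Home memory on another socket: the DRAM data crosses QPI.
    if miss.home_socket != miss.requesting_socket:
        return Outcome.REMOTE_DRAM
    # Local home and no remote Modified copy: the data stays on-socket.
    # Coherency messages may still cross QPI, but this model counts only data.
    return Outcome.LOCAL_DRAM


def qpi_data_bytes(misses, line_size: int = 64) -> int:
    """Incoming QPI data payload (bytes) predicted by this toy model."""
    return sum(
        line_size for m in misses if classify_miss(m) is not Outcome.LOCAL_DRAM
    )


if __name__ == "__main__":
    misses = [
        CacheMiss(0, 0, False),  # local DRAM
        CacheMiss(0, 1, False),  # remote DRAM
        CacheMiss(0, 0, True),   # forwarded from the other socket's cache
    ]
    print(qpi_data_bytes(misses))  # 128
```

In this model a miss only shows up in the data counter when the line's home is remote or another socket holds the only up-to-date copy, which matches the two cases described in the reply above.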
Michael, thanks. A few points need more detail.

The incoming QPI data traffic includes only the data payload, not the non-data overhead (snoops, coherency requests). A combined "data + non-data" QPI metric is supported on more recent processors.

By definition, the "QPI data traffic/Memory controller traffic" metric should be interpreted as follows. If all memory accesses are local, QPI data traffic should be negligible (zero), so the metric is close to 0. If all memory accesses are remote, they must all go through QPI; in that case QPI data traffic can be equal to or even greater than memory controller traffic, so the metric is >= 1.

The metric is therefore an indicator of the NUMA-awareness of the applications running on the system: NUMA-optimized applications should have a metric value close to 0.

-- Roman
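The interpretation of the ratio can be sketched in a few lines of Python. This assumes the ratio is total incoming QPI data bytes divided by total memory controller read + write bytes, as the metric's name suggests. The function names and the 0.1 cut-off used for "close to 0" are hypothetical choices for illustration, not values taken from PCM.

```python
def qpi_to_mc_ratio(qpi_data_bytes: float, mc_bytes: float) -> float:
    """Incoming QPI data bytes divided by memory controller (read + write) bytes."""
    if mc_bytes <= 0:
        raise ValueError("memory controller traffic must be positive")
    return qpi_data_bytes / mc_bytes


def implied_mc_bytes(qpi_data_bytes: float, ratio: float) -> float:
    """Back out memory controller traffic from the two values PCM prints."""
    if ratio <= 0:
        raise ValueError("ratio must be positive to back out MC traffic")
    return qpi_data_bytes / ratio


def rough_remote_fraction(ratio: float) -> float:
    """Simplified model: each remote byte crosses QPI once and nothing else does.

    Cache-to-cache forwarding adds QPI data without matching memory
    controller traffic, so the estimate is capped at 1.0.
    """
    return min(ratio, 1.0)


def interpret_ratio(ratio: float, local_threshold: float = 0.1) -> str:
    # local_threshold is a hypothetical cut-off for "close to 0".
    if ratio < local_threshold:
        return "mostly local: likely NUMA-optimized"
    if ratio < 1.0:
        return "mixed local and remote traffic"
    return "mostly remote: likely not NUMA-aware"


if __name__ == "__main__":
    # Values from the question; "65 M" read as 65e6 bytes (assumption).
    qpi, ratio = 65e6, 0.84
    mc = implied_mc_bytes(qpi, ratio)
    print(f"implied MC traffic: {mc / 1e6:.1f} M")  # implied MC traffic: 77.4 M
    print(interpret_ratio(ratio))                   # mixed local and remote traffic
```

Under the simplified model in rough_remote_fraction, a local access adds memory controller traffic but no QPI data, while a remote access adds both, so the ratio tracks the remote share of memory traffic; by that reading, 0.84 would point to a large share of remote accesses. Cache-to-cache forwarding is one plausible reason the ratio can exceed 1.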