On Skylake, why is there M2PCIe instead of R2PCIe?
What is M2PCIe?
R2PCIe -> Ring 2 PCIe
M2PCIe -> Mesh 2 PCIe
But I'm wondering where you got the information about M2PCIe. In the Uncore documentation https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-uncore-performance-monitoring-manual.html there is no M2PCIe unit: it is mentioned in the text and figures, but there is no explanation of this unit.
Page 117 says that the M2PCIe block translates between the Mesh and the various IO agents, so it is clear what the role is. It looks like plenty of data is available from the IIO and IRP blocks, so it is not clear what additional counters in the M2PCIe block would add...
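If it helps, here is a minimal way to check which of those IO-related uncore PMUs and events your kernel and perf build actually expose (a sketch, assuming a Skylake-SP system with a reasonably recent kernel; the exact PMU and event names come from the kernel driver and perf event tables and may differ on your machine):

# list the IO-related uncore PMU boxes exposed via sysfs
ls /sys/bus/event_source/devices/ | grep -E 'iio|irp|m2pcie'
# list the predefined IIO/IRP uncore events that this perf build knows about
perf list 2>/dev/null | grep -i -E 'unc_iio|unc_irp'
# then count a chosen event system-wide while the workload runs, e.g.
# perf stat -a -e <some_unc_iio_event> -- ./your_pcie_workload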
I got M2PCIe from the source you pointed me to: https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/uncore_snbep.c.
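For what it's worth, on a Skylake-SP system you can confirm that the driver in that file registers M2PCIe PMU boxes (a quick sketch; the uncore_m2pcie_<N> naming is my reading of the driver source and may vary with kernel version):

# show the M2PCIe uncore boxes registered by the kernel, e.g. uncore_m2pcie_0, uncore_m2pcie_1, ...
ls /sys/bus/event_source/devices/ | grep -i m2pcie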
What program will generate much more PCIe traffic?
I have tried STREAM and LINPACK; they generate very little PCIe traffic.
PCIe provides the connections to IO devices, such as disk and network.
The standard version of STREAM runs on a single shared-memory node, so it is not expected to generate any PCIe traffic.
LINPACK comes in many different varieties. The "shared-memory" versions run on a single shared-memory node, so they are not expected to generate any PCIe traffic. The multi-node (MPI) versions (usually referred to as "HPL" (High Performance LINPACK)) will generate a modest amount of PCIe traffic when run on multiple nodes. It requires a deep understanding of the specific HPL implementation to predict the amount of data traffic required for a particular problem size.
PCIe traffic to disk requires a program that does disk reads and/or writes. This could be as simple as "cp largefile /dev/null", but you have to be careful to prevent disk caching in memory if you want to try this more than once.
PCIe traffic to the network requires a program that does network accesses. This could be as simple as "rcp largefile othernode:", or you could run one of the OSU MPI benchmarks between two nodes to generate higher data rates.
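As a concrete example of the caching caveat above, here is one way to make a repeated disk read actually hit the device (a sketch, assuming Linux and root access; /path/to/largefile is a placeholder):

# drop the page cache so a re-read goes back to the device (needs root)
sync
echo 3 > /proc/sys/vm/drop_caches
# or bypass the page cache entirely with O_DIRECT reads
dd if=/path/to/largefile of=/dev/null bs=1M iflag=direct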