Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

There is no R2PCIe on Skylake.

GHui
Novice
625 Views

 

On Skylake, is there M2PCIe instead of R2PCIe?

What is M2PCIe?

 

5 Replies
Thomas_G_4
New Contributor II

R2PCIe -> Ring 2 PCIe
M2PCIe -> Mesh 2 PCIe

But I'm wondering where you got the information about M2PCIe. In the Uncore documentation https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-uncore-performance-monitoring-manual.html there is no M2PCIe unit: it is mentioned in the text and figures, but there is no explanation of the unit itself.

McCalpinJohn
Honored Contributor III

Page 117 says that the M2PCIe block translates between the Mesh and the various IO agents, so its role is clear. It looks like plenty of data is available from the IIO and IRP blocks, so it is not clear what additional counters in the M2PCIe block would add...

GHui
Novice

What kind of program will generate much more PCIe traffic?

I have tried STREAM and LINPACK, but they produce very little PCIe traffic.

McCalpinJohn
Honored Contributor III

PCIe provides the connections to IO devices, such as disk and network.

The standard version of STREAM runs on a single shared-memory node, so it is not expected to generate any PCIe traffic.

LINPACK comes in many different varieties.  The "shared-memory" versions run on a single shared-memory node, so they are not expected to generate any PCIe traffic.  The multi-node (MPI) versions (usually referred to as "HPL" (High Performance LINPACK)) will generate a modest amount of PCIe traffic when run on multiple nodes.  It requires a deep understanding of the specific HPL implementation to predict the amount of data traffic required for a particular problem size.

PCIe traffic to disk requires a program that does disk reads and/or writes.   This could be as simple as "cp largefile /dev/null", but you have to be careful to prevent disk caching in memory if you want to try this more than once.
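As a sketch of the caching pitfall above, this Python fragment (the function name and chunk size are just illustrative choices) reads a file and then asks the kernel to drop its pages from the page cache with `posix_fadvise`, so that a repeated run still has to go to the disk, and therefore across PCIe. The heavier-handed alternative on Linux is `echo 3 > /proc/sys/vm/drop_caches` as root.

```python
import os

def read_uncached(path, chunk=1 << 20):
    """Read a file in chunks, then drop its pages from the Linux page
    cache so that repeated runs keep generating real disk (PCIe) reads."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)
        # Linux-only: advise the kernel we will not need these pages again.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

Calling this in a loop on a large file while watching the IIO counters should show sustained PCIe read traffic from the storage controller.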

PCIe traffic to the network requires a program that does network accesses.  This could be as simple as "rcp largefile othernode:", or you could run one of the OSU MPI benchmarks between two nodes to generate higher data rates.
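A minimal traffic-generator sketch along those lines (the function names and port number are made up for this example): a sender streams bytes over a TCP socket to a receiver that simply discards them. Run the two halves on two different nodes to exercise the NIC and its PCIe link; over loopback on a single machine the data never touches PCIe.

```python
import socket
import threading

PORT = 50007              # arbitrary port for this sketch
PAYLOAD = b"\0" * 65536   # 64 KiB per send

def receiver(nbytes, ready):
    """Accept one connection and discard nbytes of incoming data."""
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    ready.set()           # signal that the sender may connect
    conn, _ = srv.accept()
    got = 0
    while got < nbytes:
        buf = conn.recv(65536)
        if not buf:
            break
        got += len(buf)
    conn.close()
    srv.close()
    return got

def sender(host, nbytes):
    """Stream at least nbytes of zeros to the receiver."""
    s = socket.create_connection((host, PORT))
    sent = 0
    while sent < nbytes:
        s.sendall(PAYLOAD)
        sent += len(PAYLOAD)
    s.close()
    return sent
```

For higher and more reproducible data rates, the OSU MPI benchmarks mentioned above remain the better tool; this sketch is only meant to show how little is needed to put bytes on the wire.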
