On Skylake, why is there M2PCIe instead of R2PCIe?
What is M2PCIe?
R2PCIe -> Ring 2 PCIe
M2PCIe -> Mesh 2 PCIe
But I'm wondering where you got the information about M2PCIe. In the Uncore documentation https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-uncore-performance-monitoring-manual.html there is no M2PCIe unit: it is mentioned in the text and figures, but there is no explanation of this unit.
Page 117 says that the M2PCIe block translates between the Mesh and the various IO agents, so it is clear what the role is. It looks like plenty of data is available from the IIO and IRP blocks, so it is not clear what additional counters in the M2PCIe block would add...
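If it helps, here is a minimal way to check which of those IO-related uncore PMUs and events your kernel and perf build actually expose (a sketch, assuming a Skylake-SP system with a reasonably recent kernel; the exact PMU and event names come from the kernel driver and perf event tables and may differ on your machine):

# list the IO-related uncore PMU boxes exposed via sysfs
ls /sys/bus/event_source/devices/ | grep -E 'iio|irp|m2pcie'
# list the predefined IIO/IRP uncore events that this perf build knows about
perf list 2>/dev/null | grep -i -E 'unc_iio|unc_irp'
# then count a chosen event system-wide while the workload runs, e.g.
# perf stat -a -e <some_unc_iio_event> -- ./your_pcie_workload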
I got M2PCIe from the source you pointed me to: https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/uncore_snbep.c.
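For what it's worth, on a Skylake-SP system you can confirm that the driver in that file registers M2PCIe PMU boxes (a quick sketch; the uncore_m2pcie_<N> naming is my reading of the driver source and may vary with kernel version):

# show the M2PCIe uncore boxes registered by the kernel, e.g. uncore_m2pcie_0, uncore_m2pcie_1, ...
ls /sys/bus/event_source/devices/ | grep -i m2pcie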
What program will generate much more PCIe traffic?
I have tried STREAM and LINPACK; they generate very little PCIe traffic.
PCIe provides the connections to IO devices, such as disk and network.
The standard version of STREAM runs on a single shared-memory node, so it is not expected to generate any PCIe traffic.
LINPACK comes in many different varieties. The "shared-memory" versions run on a single shared-memory node, so they are not expected to generate any PCIe traffic. The multi-node (MPI) versions (usually referred to as "HPL" (High Performance LINPACK)) will generate a modest amount of PCIe traffic when run on multiple nodes. It requires a deep understanding of the specific HPL implementation to predict the amount of data traffic required for a particular problem size.
PCIe traffic to disk requires a program that does disk reads and/or writes. This could be as simple as "cp largefile /dev/null", but you have to be careful to prevent disk caching in memory if you want to try this more than once.
PCIe traffic to the network requires a program that does network accesses. This could be as simple as "rcp largefile othernode:", or you could run one of the OSU MPI benchmarks between two nodes to generate higher data rates.
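As a concrete example of the caching caveat above, here is one way to make a repeated disk read actually hit the device (a sketch, assuming Linux and root access; /path/to/largefile is a placeholder):

# drop the page cache so a re-read goes back to the device (needs root)
sync
echo 3 > /proc/sys/vm/drop_caches
# or bypass the page cache entirely with O_DIRECT reads
dd if=/path/to/largefile of=/dev/null bs=1M iflag=direct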