R2PCIe -> Ring to PCIe
M2PCIe -> Mesh to PCIe
But I'm wondering where you got the information about M2PCIe. The Uncore documentation https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-uncore-performance-mo... does not describe an M2PCIe unit. It is mentioned in the text and figures, but there is no explanation of this unit.
Page 117 says that the M2PCIe block translates between the Mesh and the various IO agents, so its role is clear. Plenty of data appears to be available from the IIO and IRP blocks, so it is not clear what additional counters in the M2PCIe block would add.
PCIe provides the connections to IO devices, such as disk and network.
The standard version of STREAM runs on a single shared-memory node, so it is not expected to generate any PCIe traffic.
LINPACK comes in many different varieties. The "shared-memory" versions run on a single shared-memory node, so they are not expected to generate any PCIe traffic. The multi-node (MPI) versions, usually referred to as "HPL" (High Performance LINPACK), will generate a modest amount of PCIe traffic when run on multiple nodes. Predicting the amount of data traffic for a particular problem size requires a deep understanding of the specific HPL implementation.
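PCIe traffic to disk requires a program that does disk reads and/or writes. This could be as simple as "cp largefile /dev/null", but if you want to try this more than once you have to be careful about disk caching: after the first run, the file may be served from the page cache in memory, in which case the read generates little or no PCIe traffic.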
PCIe traffic to the network requires a program that does network accesses. This could be as simple as "rcp largefile othernode:", or you could run one of the OSU MPI benchmarks between two nodes to generate higher data rates.
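If you would rather control the amount of network data yourself, a minimal TCP sender can be sketched in Python as below. The function name `send_bytes` and its parameters are hypothetical. It assumes some process on the other node is already listening on the given port and discarding whatever it receives. The target must be a different node: traffic to the local host goes over the loopback interface and does not reach the network adapter.

```python
import socket
import time


def send_bytes(host, port, total_bytes, chunk_size=1 << 16):
    """Send `total_bytes` of zero-filled data to host:port over TCP.

    Returns (bytes_sent, elapsed_seconds). Hypothetical helper for illustration.
    """
    payload = bytes(chunk_size)  # reusable zero-filled buffer
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as sock:
        while sent < total_bytes:
            n = min(chunk_size, total_bytes - sent)
            sock.sendall(payload[:n])  # blocks until all n bytes are queued
            sent += n
    elapsed = time.perf_counter() - start
    return sent, elapsed
```

A single Python TCP stream will probably not saturate a fast adapter, so the OSU MPI benchmarks remain the better choice when you need high data rates.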