Hi All,
I've been trying to measure PCIe performance on an Intel(R) Xeon(R) Silver 4314 running Linux. My log is posted below, followed by my question. Any help is welcome. I always run PCM as root.
I saw some error messages while measuring, and the MMIO-related counters stayed at or near zero.
Here is my log:
===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 27
CPU family : 6
CPU model number : 106
Number of physical cores: 32
Number of logical cores: 64
Number of online logical cores: 64
Threads (logical cores) per physical core: 2
Num sockets: 2
Physical cores per socket: 16
Last level cache slices per socket: 16
Core PMU (perfmon) version: 5
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 4
Width of fixed counters: 48 bits
Nominal core frequency: 2400000000 Hz
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS : yes
Package thermal spec power: 135 Watt; Package minimum power: 72 Watt; Package maximum power: 557 Watt;
ERROR: UPI LL monitoring device (0:7e:3:1) is missing. The UPI statistics will be incomplete or missing.
Socket 0: 4 memory controllers detected with total number of 8 channels. 2 UPI ports detected. 4 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
ERROR: UPI LL monitoring device (0:fe:3:1) is missing. The UPI statistics will be incomplete or missing.
Socket 1: 4 memory controllers detected with total number of 8 channels. 2 UPI ports detected. 4 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 16 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 16 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs
Update every 1 seconds
=====print counters=====
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,76,30,14,0,16,4864,2816(Total)
0,16,30,6,0,16,1024,2304(Miss)
0,60,0,8,0,0,3840,512(Hit)
1,36440,221510,1463586,0,0,2332160,107846144(Total)
1,34032,219764,1200274,0,0,2178048,90882432(Miss)
1,2408,1746,263312,0,0,154112,16963712(Hit)
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,78,42,0,0,2,4992,2688(Total)
0,20,42,0,0,2,1280,2688(Miss)
0,58,0,0,0,0,3712,0(Hit)
1,288594,2532052,1410020,0,20,18470016,252292608(Total)
1,288588,2532052,1197416,0,20,18469632,238685952(Miss)
1,6,0,212604,0,0,384,13606656(Hit)
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,1104,38,200,0,14,70656,15232(Total)
0,52,38,70,0,14,3328,6912(Miss)
0,1052,0,130,0,0,67328,8320(Hit)
1,290306,2546964,1738480,0,80,18579584,274268416(Total)
1,290306,2546964,1392086,0,80,18579584,252099200(Miss)
1,0,0,346394,0,0,0,22169216(Hit)
As you can see, the PCIe device I am using is on socket 1. The PCIRdCur and ItoM counter values there look correct. However, the MMIO event counter (i.e. WiL) is extremely low (e.g. 0 and 20).
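To make the comparison concrete, here is a minimal sketch in Python that sums one column of the "(Total)" rows per socket across the three samples in the log. The function name `sum_totals` and the idea of pasting the output into a string are my own assumptions for illustration; this is not part of PCM.

```python
import csv
from collections import defaultdict

# The three one-second samples copied verbatim from the log above.
PCM_OUTPUT = """\
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,76,30,14,0,16,4864,2816(Total)
0,16,30,6,0,16,1024,2304(Miss)
0,60,0,8,0,0,3840,512(Hit)
1,36440,221510,1463586,0,0,2332160,107846144(Total)
1,34032,219764,1200274,0,0,2178048,90882432(Miss)
1,2408,1746,263312,0,0,154112,16963712(Hit)
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,78,42,0,0,2,4992,2688(Total)
0,20,42,0,0,2,1280,2688(Miss)
0,58,0,0,0,0,3712,0(Hit)
1,288594,2532052,1410020,0,20,18470016,252292608(Total)
1,288588,2532052,1197416,0,20,18469632,238685952(Miss)
1,6,0,212604,0,0,384,13606656(Hit)
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,1104,38,200,0,14,70656,15232(Total)
0,52,38,70,0,14,3328,6912(Miss)
0,1052,0,130,0,0,67328,8320(Hit)
1,290306,2546964,1738480,0,80,18579584,274268416(Total)
1,290306,2546964,1392086,0,80,18579584,252099200(Miss)
1,0,0,346394,0,0,0,22169216(Hit)
"""


def sum_totals(text, column="WiL"):
    """Sum `column` over the (Total) rows, keyed by socket number.

    Hypothetical helper: assumes the CSV layout shown in the log, where
    the last field of each data row carries a (Total)/(Miss)/(Hit) tag.
    """
    totals = defaultdict(int)
    header = None
    for row in csv.reader(text.splitlines()):
        if not row:
            continue
        if row[0] == "Skt":          # header line repeats before every sample
            header = row
            continue
        if header is None or not row[-1].endswith("(Total)"):
            continue                 # skip Miss/Hit breakdown rows
        value = row[header.index(column)].split("(")[0]  # drop a trailing tag
        totals[int(row[0])] += int(value)
    return dict(totals)


if __name__ == "__main__":
    for name in ("PCIRdCur", "ItoM", "WiL"):
        print(name, sum_totals(PCM_OUTPUT, name))
```

Over these three samples, socket 1 accumulates millions of ItoM events but only 100 WiL events, which is the gap I am asking about.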
How can I use the PCM tool to measure MMIO event counters on this type of machine?
Thank you in advance for your great help!
Sincerely,
Qiangsheng Su
Hi,
Software (e.g. drivers) should minimize the number of MMIO operations (i.e. WiL) because they are expensive. For example, they are used for updating Tx/Rx tail pointers, and network software should use larger buffers to make those expensive updates rare.
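As a rough illustration of why batching makes these updates rare, the sketch below counts tail-pointer writes under the simplifying assumption that a driver issues exactly one MMIO write per batch of packets. The function name, the packet count and the batch sizes are hypothetical, and real drivers and NICs may behave differently.

```python
def doorbell_writes(num_packets, batch_size):
    """Tail-pointer MMIO writes needed if one write covers a whole batch.

    Hypothetical model: one write per full or partial batch.
    """
    if num_packets < 0 or batch_size < 1:
        raise ValueError("num_packets must be >= 0 and batch_size >= 1")
    # Integer ceiling division, so a partial final batch still costs one write.
    return -(-num_packets // batch_size)


if __name__ == "__main__":
    packets = 1_000_000  # hypothetical workload size
    for batch in (1, 32, 1024):
        print(f"batch={batch:5d} -> {doorbell_writes(packets, batch)} MMIO writes")
```

Under this model the number of MMIO writes falls in proportion to the batch size, so a low WiL count does not by itself mean that MMIO is unused.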
Roman
Hi,
I took these measurements while using an RDMA NIC to send a total of 100 million 8-byte data items. Normally MMIO should be used for this, so I think the MMIO-related counter values I measured (e.g. 0) are abnormal.
Or is there any way to tell whether MMIO is being used?
Thank you in advance for your great help!
Sincerely,
Qiangsheng Su
