Questions about measuring MMIO events using PCM

qiangsheng_su

Hi All,

I've been trying to measure PCIe performance on an Intel(R) Xeon(R) Silver 4314 system running Linux. My questions are summarized below, followed by my log. Any help is welcome. I always run PCM as root.

PCM printed some error messages during the measurement, and the MMIO-related counters stayed at or near zero.

Here is my log:

===== Processor information =====
Linux arch_perfmon flag : yes
Hybrid processor : no
IBRS and IBPB supported : yes
STIBP supported : yes
Spec arch caps supported : yes
Max CPUID level : 27
CPU family : 6
CPU model number : 106
Number of physical cores: 32
Number of logical cores: 64
Number of online logical cores: 64
Threads (logical cores) per physical core: 2
Num sockets: 2
Physical cores per socket: 16
Last level cache slices per socket: 16
Core PMU (perfmon) version: 5
Number of core PMU generic (programmable) counters: 8
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 4
Width of fixed counters: 48 bits
Nominal core frequency: 2400000000 Hz
IBRS enabled in the kernel : yes
STIBP enabled in the kernel : no
The processor is not susceptible to Rogue Data Cache Load: yes
The processor supports enhanced IBRS : yes
Package thermal spec power: 135 Watt; Package minimum power: 72 Watt; Package maximum power: 557 Watt;

ERROR: UPI LL monitoring device (0:7e:3:1) is missing. The UPI statistics will be incomplete or missing.
Socket 0: 4 memory controllers detected with total number of 8 channels. 2 UPI ports detected. 4 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
ERROR: UPI LL monitoring device (0:fe:3:1) is missing. The UPI statistics will be incomplete or missing.
Socket 1: 4 memory controllers detected with total number of 8 channels. 2 UPI ports detected. 4 M2M (mesh to memory)/B2CMI blocks detected. 0 HBM M2M blocks detected. 0 EDC/HBM channels detected. 0 Home Agents detected. 3 M3UPI/B2UPI blocks detected.
Socket 0: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 16 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Socket 1: 1 PCU units detected. 6 IIO units detected. 6 IRP units detected. 16 CHA/CBO units detected. 0 MDF units detected. 1 UBOX units detected. 0 CXL units detected. 0 PCIE_GEN5x16 units detected. 0 PCIE_GEN5x8 units detected.
Initializing RMIDs

Update every 1 seconds

=====print counters=====

Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,76,30,14,0,16,4864,2816(Total)
0,16,30,6,0,16,1024,2304(Miss)
0,60,0,8,0,0,3840,512(Hit)
1,36440,221510,1463586,0,0,2332160,107846144(Total)
1,34032,219764,1200274,0,0,2178048,90882432(Miss)
1,2408,1746,263312,0,0,154112,16963712(Hit)

Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,78,42,0,0,2,4992,2688(Total)
0,20,42,0,0,2,1280,2688(Miss)
0,58,0,0,0,0,3712,0(Hit)
1,288594,2532052,1410020,0,20,18470016,252292608(Total)
1,288588,2532052,1197416,0,20,18469632,238685952(Miss)
1,6,0,212604,0,0,384,13606656(Hit)

Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,1104,38,200,0,14,70656,15232(Total)
0,52,38,70,0,14,3328,6912(Miss)
0,1052,0,130,0,0,67328,8320(Hit)
1,290306,2546964,1738480,0,80,18579584,274268416(Total)
1,290306,2546964,1392086,0,80,18579584,252099200(Miss)
1,0,0,346394,0,0,0,22169216(Hit)

As you can see, I am using the PCIe device on socket 1. The PCIRdCur and ItoM counter values look correct. However, the counters for MMIO events (i.e. WiL) are extremely low (e.g. 0 and 20).
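
To make the comparison less error-prone than reading the columns by eye, here is a minimal Python sketch that sums a chosen column over the "(Total)" rows for each socket. It assumes the output has exactly the CSV-like layout shown in the log above; the function name `sum_totals` and the `SAMPLE` string (copied from the first interval of the log) are hypothetical and not part of PCM.

```python
import csv
import io
from collections import defaultdict

# First interval copied from the log above; this is not a general PCM parser.
SAMPLE = """\
Skt,PCIRdCur,ItoM,ItoMCacheNear,UCRdF,WiL,PCIe Rd (B),PCIe Wr (B)
0,76,30,14,0,16,4864,2816(Total)
0,16,30,6,0,16,1024,2304(Miss)
0,60,0,8,0,0,3840,512(Hit)
1,36440,221510,1463586,0,0,2332160,107846144(Total)
1,34032,219764,1200274,0,0,2178048,90882432(Miss)
1,2408,1746,263312,0,0,154112,16963712(Hit)
"""


def sum_totals(text, columns=("WiL",)):
    """Sum the chosen columns over all '(Total)' rows, keyed by socket."""
    totals = defaultdict(lambda: dict.fromkeys(columns, 0))
    header = None
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # blank separator line between intervals
        if row[0] == "Skt":
            header = row  # each interval repeats the header line
            continue
        if header is None or len(row) != len(header):
            continue  # not a counter row (e.g. other log text)
        if not row[-1].endswith("(Total)"):
            continue  # skip the Miss and Hit breakdown rows
        record = dict(zip(header, row))
        for col in columns:
            totals[int(record["Skt"])][col] += int(record[col])
    return {skt: dict(vals) for skt, vals in totals.items()}


print(sum_totals(SAMPLE))
```

Passing the whole captured output instead of `SAMPLE` accumulates the values over all intervals, and other header names (for example `columns=("PCIRdCur", "ItoM")`) can be summed in the same way.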

How can I use PCM to measure the MMIO event counters on this type of machine?

Thank you in advance for your great help!

Sincerely,

Qiangsheng Su

Roman_D_Intel

Hi,

Software (e.g. drivers) should minimize the number of MMIO operations (i.e. WiL) because they are expensive. For example, they are used to update Tx/Rx tail pointers, and network software should use larger buffers so that those expensive updates are rare.

Roman
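
The batching point in this reply can be illustrated with a small back-of-the-envelope sketch in Python. It assumes a deliberately simplified model in which one MMIO tail-pointer write covers a whole batch of messages; the function name `doorbell_writes` and the batch sizes are hypothetical, and a real driver or NIC may behave differently.

```python
def doorbell_writes(num_messages, batch_size):
    """MMIO tail-pointer writes needed if one write covers `batch_size` messages.

    Simplified model (an assumption, not measured behavior): the software
    queues `batch_size` messages and then issues a single MMIO write.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Integer ceiling division: a final partial batch still needs one write.
    return (num_messages + batch_size - 1) // batch_size


for batch in (1, 16, 256):
    print(batch, doorbell_writes(1_000_000, batch))
```

Under this model the number of MMIO writes drops in proportion to the batch size, so a low WiL count does not by itself show that MMIO is unused.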

qiangsheng_su

Hi,

I collected this data while using an RDMA NIC to send data: 100 million 8-byte messages in total. MMIO should normally be used for this, so I think the MMIO-related counter values that I measured (i.e. 0) are abnormal.
Alternatively, is there any way to tell whether MMIO is being used?

Thank you in advance for your great help!

Sincerely,

Qiangsheng Su
