Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

What is PCICFG address in uncore of intel XEON?

Zhu_G_
Beginner
2,076 Views

Hi community

I am reading the XEON-E5-2600-UNCORE-GUIDE, and I am confused about the address column in the tables that is marked as PCICFG address.

What is a PCICFG address, and how can I translate the address given in the UNCORE-GUIDE into an address that I can use with msr-tools?

 

0 Kudos
1 Solution
McCalpinJohn
Honored Contributor III
2,076 Views

The addressing for the uncore performance counters in the Xeon E5-2600 processors can certainly be confusing!

1. The easiest approach is to use the "lspci" command to look for the PCI addresses of the units with performance counters in PCI configuration space.   According to the documentation in the "Intel Xeon Processor E5-1600/2400/2600 (E5-Product Family) Product Families: Datasheet - Volume 2"  (Intel document 326509), the PCI "bus number" of the uncore devices in the Xeon E5 processor in socket zero of the system is reported in the PCI configuration space of bus 0, device 5, function 0, offset 0x109.   You can read this with the following command:

setpci -s 00:5.0 109.b

The output will be one of the following: 1f, 3f, 7f, ff

Although the documentation does not make this clear, the PCI "bus number" of the uncore devices in the Xeon E5 processor in socket 1 of the system is the next number in the sequence above.   Most of my systems use 7f and ff, but some use 3f and 7f.
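The socket-1 rule above can be written down as a small lookup. This is only a sketch of the pattern described here (an observation, not something the datasheet states), and the function name is made up for illustration.

```python
# Values that offset 0x109 of 00:05.0 can report, in order (taken from the text above).
UNCORE_BUS_SEQUENCE = [0x1F, 0x3F, 0x7F, 0xFF]

def socket1_uncore_bus(socket0_bus):
    """Guess the socket-1 uncore bus as the entry that follows the socket-0 bus."""
    idx = UNCORE_BUS_SEQUENCE.index(socket0_bus)  # raises ValueError if not in the list
    if idx + 1 >= len(UNCORE_BUS_SEQUENCE):
        raise ValueError("no later entry in the sequence for bus %#x" % socket0_bus)
    return UNCORE_BUS_SEQUENCE[idx + 1]

print(hex(socket1_uncore_bus(0x7F)))  # 0xff
```

Checking that the guessed bus actually shows up in "lspci" is still the safer test.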

2. The next piece of confusion is that the PCI Device numbers used in the Xeon E5-2600 uncore configuration guide are *decimal* values, while the "lspci" and "setpci" commands use hexadecimal numbering.  

The iMC Performance Monitors provide a good example.  Table 2-59 of the Xeon E5-2600 Uncore Performance Monitoring Guide lists the "MC Channel 0 PMON Registers" as using D16:F0.     This needs to be translated to "device" 10 (hex), function 0 (hex).  
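The decimal-to-hex translation is easy to get wrong by hand, so here is a minimal sketch of it. The helper name and the "D16:F0" string format it accepts are assumptions for illustration; it only does the number formatting and does not touch the hardware.

```python
import re

def lspci_address(bus, guide_df):
    """Turn a bus number and a guide-style 'D16:F0' string (decimal) into 'bb:dd.f' (hex)."""
    m = re.fullmatch(r"D(\d+):F(\d+)", guide_df)
    if m is None:
        raise ValueError("expected something like 'D16:F0', got %r" % guide_df)
    device, function = int(m.group(1)), int(m.group(2))
    # lspci and setpci expect hexadecimal bus, device and function numbers.
    return "%02x:%02x.%x" % (bus, device, function)

print(lspci_address(0x7F, "D16:F0"))  # 7f:10.0
```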

3.  Combining 1 and 2 is best shown by example.   First I look up the "bus" used by socket 0:

# setpci -s 00:5.0 109.b
7f

Next print out the label for Device 10 (hex), Function 0:

# lspci -s 7f:10.0
7f:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)

Note that the text here might be completely different or even completely wrong on your system -- the important thing is that the device exists.

Next print out the contents of the first 256 Bytes --- this will include all of the "offset" addresses used by the performance monitor control and counting "registers"

$ lspci -xxx -s 7f:10.0
7f:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)
00: 86 80 b0 3c 00 00 10 00 07 00 80 08 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 28 10 18 05
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00
40: 10 00 91 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 11 23 10 00 00 00 00 00 02 7c 05 00 00 00 00 00
b0: d9 e5 0b 00 00 00 00 00 fb 81 02 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 92 fa 84 c7 36 2d 00 00 04 03 40 01 04 0c 40 01
e0: 01 00 40 01 02 01 40 01 00 00 00 00 00 00 00 00
f0: 00 00 40 00 00 00 01 00 00 00 00 00 00 00 00 00


The numbering of this "lspci" output is from left (offset 0) to right (offset 15), but remember that all multi-byte accesses are little-endian.   An easy example to see what this means is to look at the first two bytes:

 $ setpci -s 7f:10.0 0.w
8086


The 16-bit value "8086" (reversed from the first two values in the first line above) is a code indicating that this PCI device is an Intel product.

4. Now we can look at some specific memory controller performance counter control registers.  Table 2-59 says that MC_CH0_PCI_PMON_CTL0 (channel 0 performance counter control) is a 32-bit value at offset 0xD8.    This value can be read from the output above (the bytes "04 03 40 01" in the row starting at d0) or by running:

$ setpci -s 7f:10.0 d8.l
01400304

This clearly gives the same bits in the expected order for a little-endian 32-bit word.
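To make the byte order concrete, here is a small sketch that rebuilds the same values from the text of the "lspci -xxx" dump. The function names are made up for illustration, and only two rows of the dump above are pasted in to keep it short.

```python
import re

# A dump row looks like "d0: 92 fa 84 c7 ..." (hex offset, then hex bytes).
ROW = re.compile(r"^([0-9a-f]{2,3}): ((?:[0-9a-f]{2} ?)+)$")

def parse_lspci_dump(text):
    """Map each config-space offset to its byte value; non-dump lines are skipped."""
    config = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            base = int(m.group(1), 16)
            for i, byte in enumerate(bytes.fromhex(m.group(2))):
                config[base + i] = byte
    return config

def read_le(config, offset, size):
    """Assemble `size` bytes from `offset`; the lowest address is the least significant byte."""
    return int.from_bytes(bytes(config[offset + i] for i in range(size)), "little")

dump = """\
00: 86 80 b0 3c 00 00 10 00 07 00 80 08 00 00 80 00
d0: 92 fa 84 c7 36 2d 00 00 04 03 40 01 04 0c 40 01
"""
config = parse_lspci_dump(dump)
print("%04x" % read_le(config, 0x00, 2))  # 8086, same as: setpci -s 7f:10.0 0.w
print("%08x" % read_le(config, 0xD8, 4))  # 01400304, same as: setpci -s 7f:10.0 d8.l
```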

Going to Table 2-61, this 32-bit field can be expanded to see the current programming of the counter:

thresh     31:24   0x01   <-- threshold used in counter comparison
invert     23      0x0    <-- invert flag is not set
en         22      0x1    <-- enable bit is set
(rsv)      21:19   0x0    <-- reserved bits are correctly set to zero
edge_det   18      0x0    <-- edge detect bit is not enabled
(rsv)      17:16   0x0    <-- reserved bits are correctly set to zero
umask      15:8    0x03   <-- umask = 0x03 --> RD sub-event of CAS_COUNT
ev_sel     7:0     0x04   <-- event = 0x04 --> CAS_COUNT

All of the other counters based in PCI configuration space are analogous.
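The field breakdown above can be reproduced with a few shifts and masks. This is a sketch that uses only the bit positions listed in this post; the function names are made up, and the reserved bits are simply left at zero when encoding.

```python
def decode_pmon_ctl(value):
    """Split a 32-bit PMON control value into the fields listed above."""
    return {
        "thresh":   (value >> 24) & 0xFF,  # bits 31:24
        "invert":   (value >> 23) & 0x1,   # bit 23
        "en":       (value >> 22) & 0x1,   # bit 22
        "edge_det": (value >> 18) & 0x1,   # bit 18
        "umask":    (value >> 8) & 0xFF,   # bits 15:8
        "ev_sel":   value & 0xFF,          # bits 7:0
    }

def encode_pmon_ctl(ev_sel, umask, en=1, thresh=0, invert=0, edge_det=0):
    """Build a control value from the same fields (reserved bits stay zero)."""
    return ((thresh & 0xFF) << 24) | ((invert & 1) << 23) | ((en & 1) << 22) \
        | ((edge_det & 1) << 18) | ((umask & 0xFF) << 8) | (ev_sel & 0xFF)

fields = decode_pmon_ctl(0x01400304)
print(fields)
print("%08x" % encode_pmon_ctl(0x04, 0x03, en=1, thresh=0x01))  # 01400304
```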

 

To program the counters, you can use "setpci" (as root) or you can open the PCI device driver files.  Using the same examples as above, the device driver is at "/proc/bus/pci/7f/10.0".    This file can be opened (with root permissions) and the "pread()" and "pwrite()" calls can be used to read and write binary data at the correct offsets.
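A minimal sketch of the pread()/pwrite() approach, using Python's os.pread and os.pwrite as stand-ins for the C calls. The helper names are made up for illustration. The path in the comment is the one from this example system, so the bus and device will differ on other machines.

```python
import os
import struct

def read_config32(path, offset):
    """Read one little-endian 32-bit value at `offset` from a config-space file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 4, offset)
    finally:
        os.close(fd)
    if len(raw) != 4:
        raise OSError("short read at offset %#x" % offset)
    return struct.unpack("<I", raw)[0]

def write_config32(path, offset, value):
    """Write one little-endian 32-bit value at `offset` into a config-space file."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<I", value), offset)
    finally:
        os.close(fd)

# Example (needs root and the matching hardware, so it is left commented out):
# ctl0 = read_config32("/proc/bus/pci/7f/10.0", 0xD8)
# print("%08x" % ctl0)
```

Nothing in this sketch checks that the file really belongs to the intended device, so it is worth reading offset 0 first and comparing against the expected vendor ID before writing anything.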

 


0 Kudos
7 Replies
GioFari
Beginner
1,935 Views

@McCalpinJohn does this solution work only for some uncore performance counters? I am trying to use the uncore M2M unit, and its offset is 0x260, which does not belong to the classical PCI configuration space (the maximum should be 256 bytes). I was thinking of accessing it through the device's memory, but the Base Address Registers (BARs) contain 0. So I think the offset refers to the extended configuration space available with PCI Express (the command to see the extended area is lspci -xxxx).

 

Is this the right way? Otherwise, what do I have to do?

"Intel® Xeon® Processor Scalable Memory Family Uncore Performance Monitoring" 

GioFari_0-1619022403931.png

 

0 Kudos
McCalpinJohn
Honored Contributor III
1,922 Views

The "0x2066" in that table is the PCIe Device ID (DID), which is a 16-bit quantity immediately following the 16-bit Vendor ID (VID) at the beginning of the configuration space for each device.  The VID/DID pair should be used for verifying that you are working with the correct device before you write to any other bits.

From the command line, you can see the VID and DID in several ways, such as:

$ setpci -s 3a:08.0 0x0.w
8086
$ setpci -s 3a:08.0 0x2.w
2066

or

 $ lspci -xx -s 3a:08.0
3a:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
00: 86 80 66 20 00 00 10 00 07 00 80 08 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00

The VID:DID pair is especially useful for finding out which buses the devices are on -- the procedure for mapping from what the documentation calls "bus2" or "bus3" to an actual bus number on a particular system is very confusing.  Instead, you can simply run lspci and ask it to list the devices with the VID:DID pair that you are interested in:

$ lspci -d 8086:2066
3a:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
3a:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
ae:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)
ae:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 07)

The offsets into PCI configuration space for those devices are in the next line of the table -- e.g., 

  • Unit Status is at offset 0x260
  • Counter 0 is at offset 0x200 (low-order 32-bits, with the high-order bits implicitly located at 0x204)
  • PerfEvtSel 0 is at offset 0x228
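Since each counter is split across two 32-bit reads, the two halves have to be joined afterwards. A minimal sketch follows; the function name is made up, and the offsets in the comment are the ones from the list above.

```python
def combine_counter(low32, high32):
    """Join the low half (e.g. offset 0x200) and the high half (e.g. offset 0x204)."""
    return ((high32 & 0xFFFFFFFF) << 32) | (low32 & 0xFFFFFFFF)

print(hex(combine_counter(0xFFFFFFFF, 0x1)))  # 0x1ffffffff
```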

You are correct that these are in "extended PCI configuration space" (4KiB).  The BIOS will block access to extended PCI configuration space for some devices -- this is necessary to prevent the user from accidentally changing things like the mapping of physical addresses to DRAM channel, rank, and bank on a live system.  Unfortunately you will sometimes run across a case in which the BIOS blocks access to extended PCI configuration space for performance counters for some unit(s).  In that case it may still be possible to access the extended space by physical address via /dev/mem.
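For the /dev/mem route, the physical address of a register follows the standard PCIe memory-mapped configuration (ECAM) layout: 4 KiB per function, 8 functions per device, 32 devices per bus. Here is a sketch of that address arithmetic. The base address 0x80000000 is a made-up placeholder because the real base is system-specific (it is reported by the ACPI MCFG table), and the sketch assumes the mapped region starts at bus 0.

```python
def ecam_address(mmcfg_base, bus, device, function, offset):
    """Physical address of a config register in the memory-mapped (ECAM) region."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8 and 0 <= offset < 4096):
        raise ValueError("bus/device/function/offset out of range")
    return mmcfg_base + ((bus << 20) | (device << 15) | (function << 12) | offset)

MMCFG_BASE = 0x80000000  # hypothetical value, look up the real one on your system

# PerfEvtSel 0 (offset 0x228) of device 3a:08.0 from the lspci listing above.
print(hex(ecam_address(MMCFG_BASE, 0x3A, 0x08, 0, 0x228)))  # 0x83a40228
```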

0 Kudos
GioFari
Beginner
1,914 Views

Thanks for your time @McCalpinJohn. However, while I had no problems with the classical PCI configuration space, I am having a lot of trouble modifying the "extended PCI configuration space".

I am working in kernel space with a kernel module, so I am not looking at user-space solutions.

If I use the same functions that I used for the classical config space (int pci_read_config_byte(const struct pci_dev *dev, int where, u8 *val); and int pci_write_config_byte(const struct pci_dev *dev, int where, u8 val);), the read function continues to work in the extended area, but the write does not!

At this point I tried MMIO: I found the physical address and remapped the physical memory into the virtual memory of the process, but that failed with an error.

Let me know if you have any references that could point me in the right direction, or an example with code.

 

 

 

0 Kudos
McCalpinJohn
Honored Contributor III
1,911 Views

The code is ugly, but "perf_counters.c" at periodic-performance-counters uses mmap() on /dev/mem to access performance counters in PCI configuration space.  

  • The mmap() of /dev/mem happens at lines 954-976
  • A test for the processor model is at lines 1012-1027.   If it does not find the expected VID:DID for a Skylake Xeon it will abort.
  • The program reads from a text file and writes the requested values to the IMC PerfEvtSel registers in lines 1307-1321
  • The values from the IMC counters are read at lines 555-573

The code should also be checking the VID:DID before writing to the IMC registers, but the earlier check at lines 1012-1027 is sufficient for all of the machines that I have tested....
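The VID:DID check mentioned here is only a few lines. Below is a sketch, not the code from perf_counters.c: the function names are made up, and the header bytes are copied from the 3a:08.0 dump shown earlier in the thread.

```python
import struct

def vid_did(config_bytes):
    """Return (vendor ID, device ID) from the first four bytes of config space."""
    vid, did = struct.unpack_from("<HH", config_bytes, 0)
    return vid, did

def is_expected_device(config_bytes, vid, did):
    """True only if the config space starts with the expected VID:DID pair."""
    return vid_did(config_bytes) == (vid, did)

# First 8 bytes of the 3a:08.0 dump: 86 80 66 20 00 00 10 00
header = bytes.fromhex("86 80 66 20 00 00 10 00")
print("%04x:%04x" % vid_did(header))               # 8086:2066
print(is_expected_device(header, 0x8086, 0x2066))  # True
```

Running a check like this right before each write costs one extra read and guards against writing to the wrong device when the bus numbering differs between machines.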

0 Kudos
Bernard
Valued Contributor I
2,075 Views
0 Kudos
Zhu_G_
Beginner
2,075 Views

Thanks Dr. B

I am stuck at step 3. When I typed the command lspci -xxx -s 7f:10.0, here is what I got on my machine:

7f:10.0 System peripheral: Intel Corporation Xeon E5/Core i7 Integrated Memory Controller Channel 0-3 Thermal Control 0 (rev 07)
00: 86 80 b0 3c 00 00 10 00 07 00 80 08 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00

Obviously something is missing on my machine. I think it may be my motherboard.

I am considering buying a new server board such as the s2600CWT, or a complete system such as the DELL PowerEdge R730.

But I don't know whether I can do this work on those machines. I contacted sales, but they don't know either. What should I do?

0 Kudos