
Understanding PCICFG space information

Jaeyoung__Choi
Beginner
8,448 Views

Hi,

I am having some difficulty understanding the PCICFG space information.

I am trying to read the number of available CHAs. Since I am using a Xeon Gold 6132 processor, I expect the answer to be 14.

I referred to

https://software.intel.com/en-us/download/intel-xeon-processor-scalable-memory-family-uncore-performance-monitoring-reference-manual

and found that I need to read device 30, function 3, offset 0x9c, but the manual does not give the bus number.

So I wrote this code to scan every bus:

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/io.h>

/* Legacy PCI configuration mechanism: address port 0xCF8, data port 0xCFC. */
static const u32 PCI_ENABLE_BIT = 0x80000000;
static const u32 PCI_CONFIG_ADDRESS = 0xCF8;
static const u32 PCI_CONFIG_DATA = 0xCFC;

/* Read one 32-bit register from PCI configuration space. */
static u32 r_pci_32(u8 bus, u8 device, u8 func, u8 offset)
{
        /* Cast before shifting so the arithmetic is unsigned; the low two offset bits must be zero. */
        outl(PCI_ENABLE_BIT | ((u32)bus << 16) | ((u32)device << 11) |
             ((u32)func << 8) | (offset & 0xfc), PCI_CONFIG_ADDRESS);
        return inl(PCI_CONFIG_DATA);
}

static __init int init_pcilist(void)
{
        unsigned int bus;       /* wider than u8 so the loop also covers bus 255 and still terminates */
        u32 data;

        for (bus = 0; bus <= 0xff; bus++) {
                data = r_pci_32(bus, 30, 3, 0x9c);
                printk(KERN_INFO "bus %u, device %d, func %d : value= 0x%08x\n", bus, 30, 3, data);
        }

        return 0;
}

static __exit void exit_pcilist(void)
{
}

module_init(init_pcilist);
module_exit(exit_pcilist);
MODULE_LICENSE("GPL");

When I ran this code, most buses returned 0xffffffff.

Only bus 23 (0x17) and bus 133 (0x85) returned other values: 0x03da1725 and 0x04ab64f4, respectively.

These values match what I expected, but I wonder whether this is the right approach and whether I am reading the values properly.
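
One way to check the two values against the expected count of 14 is to count their set bits. The following is a minimal sketch; the helper name count_bits is hypothetical and not part of any tool mentioned in this thread.

```shell
# count_bits HEX: print the number of 1 bits in a hexadecimal value
# (with or without a leading 0x). Hypothetical helper for illustration.
count_bits() (
    v=$((0x${1#0x}))
    n=0
    while [ "$v" -ne 0 ]; do
        n=$((n + (v & 1)))   # add the lowest bit
        v=$((v >> 1))        # move on to the next bit
    done
    echo "$n"
)

count_bits 0x03da1725   # value read on bus 23, prints 14
count_bits 0x04ab64f4   # value read on bus 133, prints 14
```

Both values contain 14 set bits, which agrees with the 14 cores of the Xeon Gold 6132.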

Finally, if my approach is right, where can I get the exact bus number?

By contrast, this datasheet (for Xeon E5 v2) does state the exact bus number:

https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/xeon-e5-v2-datasheet-vol-2.pdf

 

1 Solution
McCalpinJohn
Honored Contributor III
8,448 Views

Intel's documentation refers to PCI device numbers in decimal, while (approximately) everyone else in the universe uses hexadecimal values....

Device 30 (decimal) is 1E hex, so I just looked for devices with this number and a defined function 3.
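
The decimal-to-hex step can be sketched with printf. The helper name lspci_suffix is hypothetical; it only formats the device and function numbers the way lspci prints them.

```shell
# lspci_suffix DEV FUNC: turn decimal device/function numbers into the
# "dd.f" form that lspci prints (two hex digits, a dot, one hex digit).
lspci_suffix() {
    printf '%02x.%x\n' "$1" "$2"
}

lspci_suffix 30 3   # prints 1e.3
```

The result can then be used as a fixed-string pattern, for example lspci | grep -F ":$(lspci_suffix 30 3)". As far as I know, lspci -s 1e.3 selects the same device and function on every bus without needing grep.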

$ lspci | grep :1e.3
17:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
85:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)

The CAPID6 register is then available with

$ setpci -s 17:1e.3 0x9c.l
0fffffff

$ setpci -s 85:1e.3 0x9c.l
0fffffff

This is the correct value of the bitmap for this processor (Xeon Platinum 8280), since all 28 L3/CHA slices are enabled.

The other useful approach is to look for the PCI Device ID (DID) for the unit you are looking for.  These are listed in Table 1-13 of the reference above.  For example, the UPI Link Layer devices all have DID 0x2058, so they can easily be found by

$ lspci -d :2058
5d:0e.0 Performance counters: Intel Corporation Device 2058 (rev 07)
5d:0f.0 Performance counters: Intel Corporation Device 2058 (rev 07)
5d:10.0 Performance counters: Intel Corporation Device 2058 (rev 07)
d7:0e.0 Performance counters: Intel Corporation Device 2058 (rev 07)
d7:0f.0 Performance counters: Intel Corporation Device 2058 (rev 07)
d7:10.0 Performance counters: Intel Corporation Device 2058 (rev 07)

 

 

14 Replies
Newman__Chuck
Novice
8,276 Views

How do I find out which parts of each CHA are "alive," i.e., not fused out?  What I really want to know is which tile a particular core is in.
In my HPE ProLiant DL380 Gen10 server with two 12-core Intel Xeon Gold 6256 processors (33 MiB of L3 each), I see the following:

# lscpu | grep -E -e ' per ' -e 'Model name:'
Thread(s) per core: 1
Core(s) per socket: 12
Model name: Intel(R) Xeon(R) Gold 6256 CPU @ 3.60GHz
L3 cache: 33792K
# for Value in $(lspci | awk '/^..:1e\.3/{print $1}'); do echo 2 o 16 i $(setpci -s ${Value} 0x9c.l | tr '[:lower:]' '[:upper:]') p | dc; done
111110111111111111101110111
111111111011111111111110110

Counting the bits in those fields I get 24 CHAs for each processor.  At 1.375 MiB each, that agrees with this processor having 33 MiB of cache.
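
The same arithmetic can be sketched in shell, working in KiB to stay in integers (1.375 MiB = 1408 KiB). The helper name l3_kib_from_bitmap is hypothetical, and the 1.375 MiB per slice figure is the one assumed in this post.

```shell
# l3_kib_from_bitmap BITS: count the '1' characters in a binary CHA bitmap
# and multiply by 1408 KiB (1.375 MiB) per slice.
l3_kib_from_bitmap() (
    slices=$(printf '%s' "$1" | tr -cd '1' | wc -c)
    echo $(( slices * 1408 ))
)

l3_kib_from_bitmap 111110111111111111101110111   # first socket, prints 33792
l3_kib_from_bitmap 111111111011111111111110110   # second socket, prints 33792
```

Both results equal the 33792K that lscpu reports for the L3 cache.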

My question, then, is how can I tell which of these 24 tiles contributes a core?

For bonus points, how does one tell whether it's a 28-core part, an 18-core part, or a 10-core part?  Finding the highest set-bit position in the mask gives a lower bound on the number of physical CHAs in a part (e.g., if the mask is E73 then we know it's not a 10-core part, but that doesn't say whether it's 28 or 18).
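
The highest-set-bit bound can be sketched as follows. The helper name min_physical_chas is hypothetical; it prints the position of the highest set bit plus one, which is the smallest number of physical CHAs the die could have.

```shell
# min_physical_chas HEX: print the bit length of the mask, i.e. the index
# of the highest set bit plus one.
min_physical_chas() (
    v=$((0x${1#0x}))
    n=0
    while [ "$v" -ne 0 ]; do
        n=$((n + 1))     # one more bit position is in use
        v=$((v >> 1))
    done
    echo "$n"
)

min_physical_chas E73   # prints 12
```

A result of 12 rules out the 10-core die, but since 12 is at most 18, it cannot separate the 18-core die from the 28-core die.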

Newman__Chuck
Novice
8,270 Views

I gave a bad example, as it had only 8 bits set; it should have been something like:

(e.g., if the mask is EF73 then we know it's not a 10-core part, but that doesn't say whether it's 28 or 18).

We can probably make these assumptions, though:

if ((coreCount <= 10) AND (cacheMiB <= 13.75)) then siliconSize=10Core
else if ((coreCount <= 18) AND (cacheMiB <= 24.75)) then siliconSize=18Core
else if ((coreCount <= 28) AND (cacheMiB <= 38.5)) then siliconSize=28Core
else IHaveNoIdeaWhatThisIs
winkzy
Beginner
7,784 Views

You can read "CAPID4 (Chop)" to get the silicon size.

Table 1 in the 2nd-gen-xeon-scalable-spec-update file shows the mapping.

"B:1, D:30 F:3, O:94" means "Bus:1 Device:30 Function:3 Offset:94" (the offset is hexadecimal, 0x94).

The chop is encoded in bits [7:6], for example:

Xeon 4210: "00" means "LCC"
Xeon 4214R: "10" means "HCC"
Xeon 6230: "11" means "XCC"

[root@Node0 ~]# lscpu | grep Xeon
Model name:            Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
[root@Node0 ~]# setpci -s 17:1e.3 0x94.l
24000e01

[root@Node1 ~]# lscpu | grep Xeon
Model name:            Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz
[root@Node1 ~]# setpci -s 17:1e.3 0x94.l
24000e81

[root@Node2 ~]# lscpu | grep Xeon
Model name:            Intel(R) Xeon(R) Gold 6230 CPU @ 2.10GHz
[root@Node2 ~]# setpci -s 17:1e.3 0x94.l
24000ec0
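
Extracting bits [7:6] from these values can be sketched as below. The helper name chop_from_capid4 is hypothetical, and the mapping covers only the three encodings listed in this post; any other value is reported as unknown.

```shell
# chop_from_capid4 HEX: decode bits [7:6] of a CAPID4 value using the
# encodings quoted above (00 = LCC, 10 = HCC, 11 = XCC).
chop_from_capid4() (
    field=$(( (0x${1#0x} >> 6) & 3 ))
    case "$field" in
        0) echo LCC ;;
        2) echo HCC ;;
        3) echo XCC ;;
        *) echo unknown ;;   # encoding 01 is not listed in this thread
    esac
)

chop_from_capid4 24000e01   # Silver 4210, prints LCC
chop_from_capid4 24000e81   # Silver 4214R, prints HCC
chop_from_capid4 24000ec0   # Gold 6230, prints XCC
```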

 


McCalpinJohn
Honored Contributor III
8,263 Views

The CAPID6 register only indicates the active CHA/SF/LLC slices.  In Xeon E5 v4 (Broadwell) there was a corresponding CAPID5 register that showed the active cores, but I have not seen this documented for SKX/CLX processors.

The die size can be obtained from some other PCI configuration space registers.  This is documented in the "Component Identification via Programming Interface" section of the "Specification Update" document: Intel document 336065 for Xeon Scalable gen 1, document 338848 for Xeon Scalable gen 2.

I have recently published two technical reports that cover very closely related material.

 

Newman__Chuck
Novice
8,243 Views

So there is no way on Skylake/Cascade Lake to discern which CHAs present cores and which present only LLC; that's unfortunate.

I note that you gave the same URL for both of the two technical reports you cited; it's the one for "Observations on Core Numbering and Core ID's in Intel Processors."  Finding the second document is easy by navigating to the page for the first document and then searching for the title of the second document.

 

McCalpinJohn
Honored Contributor III
8,207 Views

Sorry about that -- I don't see any way to edit my previous post in this new infrastructure.....

As I note in the second report, every processor that I have tested that has the same number of CHAs and Cores enabled always has the enabled/disabled cores and CHA/SF/LLC slices co-located.  So the locations of the disabled CHAs are the same as the locations of the disabled Cores.

This can't be the case for processors with a different number of enabled cores and CHAs, but the minimal testing I have done so far on the 24-Core/26-CHA Cascade Lake processors shows that the core is also disabled at the location of each of the two disabled CHAs, and then two additional cores are disabled.  The CHAs without co-located cores are easy to identify with my measurement methodology -- they are the two CHAs for which no core activates two inbound mesh links when reading from both memory controllers.  I don't have access to very many of these nodes, so I don't want to over-generalize....  (I don't think there are any processor models with more cores than CHAs enabled.)

Corrected link for the second report....

 

McCalpinJohn
Honored Contributor III
8,230 Views

Stupid website won't let me fix the typo in my earlier note.....

Considering only SKX/CLX processor models that have the same number of enabled Cores and CHA/SF/LLC slices, all of my test results show that the disabled Cores are co-located with the disabled LLCs.   So if you know the location of the disabled LLCs, you also know the locations of the disabled cores.

Some processor models have more enabled CHAs than Cores.  I have tested some 24-Core/26-CHA Cascade Lake models and found that the cores at the disabled CHA locations are disabled.  Two additional cores are disabled.  These are relatively easy to find -- they are the CHAs for which no core generates active mesh data traffic links on two sides when reading from both IMCs.   I have not gone looking through PCI configuration space to see if there is a matching (undocumented) bit mask value.....

winkzy
Beginner
7,789 Views

A read can also return 0xffffffff when the user does not have permission to read the register; in that case, run the command with administrator (root) privileges.

I ran the following commands on a server (two Xeon 4210 10-core processors, 12 x 8 GB DDR4).

As a normal user:

$ setpci -s 17:1e.3 0x9c.l
ffffffff

Then as root:

# setpci -s 17:1e.3 0x9c.l
000003ff

0x3ff has 10 bits set, matching the 10 cores.

 

 

aozcan
New Contributor I
7,228 Views

Hi @McCalpinJohn, do the same commands apply to Cascade Lake as well?

This is what I tried, running with sudo:


> lspci | grep :1e.3
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)

> setpci -s 16:1e.3 0x9c.l
0003ffff

 

However, this output does not seem correct: Cascade Lake appears to have at most 28 tiles, and my system has 18 cores, which implies that 10 tiles should be disabled. The result I got suggests that no tiles are disabled (the first 18 bits are 1).

I am not 100% sure that the values 1e, 3, and 9c apply to Cascade Lake.

Where am I going wrong? What are the differences between Skylake and Cascade Lake? Can I mostly rely on the Skylake documentation when working with Cascade Lake?

 

Thanks

McCalpinJohn
Honored Contributor III
7,225 Views

Cascade Lake Xeon is identical to Skylake Xeon in almost all respects, including the CAPID6 register.

 

Intel's gen1/gen2 Xeon Scalable Processors are based on one of three different die sizes: 28-core (XCC), 18-core (HCC), and 10-core (LCC).  Block diagrams for each of these dies are included in the Xeon Processor Scalable Memory Family Uncore Performance Monitoring Reference Manual (document 336274).

So how do you figure out which die a processor is made from?  Obviously, the active core count can rule out dies that are too small, but if that is not unambiguous you need to go to the "2nd Gen Intel Xeon Scalable Processors Specification Update" (document 338848) and read the section titled "Identification Information".  Table 1 in that section shows the bit patterns from CAPID0 and CAPID4 that indicate the die version.  Be careful with the bit fields referred to in the tables: they are not all contiguous....

Most of the time it is better to look for the bus by VID:DID (VendorID:DeviceID).  The vendor ID for most of Intel's products is 8086 (hex), and the CAPID0, CAPID4, CAPID6 registers are associated with device ID 2083 (hex).  You can find these devices using

$ lspci -d 8086:2083
17:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)
85:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)

These bus numbers match the values from my earlier posts.

Looking at the CAPID registers for this processor, I see:

$ setpci -s 17:1e.3 0x84.l    # CAPID0
001881ff
$ setpci -s 17:1e.3 0x94.l    # CAPID4
24000ec0
$ setpci -s 17:1e.3 0x9c.l    # CAPID6
0fffffff

Table 1 says we need bits 5,4,3,1,0 of CAPID0 and bits 7,6 of CAPID4.  These are all "1" here, matching the Table 1 entry for the "XCC" Physical Chop, the B-1 stepping, and the "Server, 8s" "Segment wayness" -- all of which match expectations for a 28-core Platinum series processor.
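
Because the bit positions are not contiguous, it can help to pull them out one at a time. This is a minimal sketch; the helper name bit_string is hypothetical, and the bit positions are the ones named in the paragraph above.

```shell
# bit_string HEX BIT...: print the value of each listed bit position,
# in the order given, as a string of 0s and 1s.
bit_string() (
    v=$((0x${1#0x}))
    shift
    out=""
    for b in "$@"; do
        out="$out$(( (v >> b) & 1 ))"
    done
    echo "$out"
)

bit_string 001881ff 5 4 3 1 0   # CAPID0 value read above, prints 11111
bit_string 24000ec0 7 6         # CAPID4 value read above, prints 11
```

The printed strings can then be compared by eye against the rows of Table 1.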

The CAPID6 value in this case contains 28 "1" values, indicating all LLC slices are enabled.

If I only look at CAPID6, your results are consistent with three interpretations:

  • an XCC (28-core) die with the 18 lowest-numbered LLC slices enabled, or
  • an HCC (18-core) die with all 18 LLC slices enabled, or
  • an XCC (28-core) die with a BIOS that hides which LLC slices are enabled/disabled by shifting all the "1" values to the low-order bits.  
    • The documentation of CAPID6 in the Uncore Performance Monitoring manual does not suggest that the *positions* of the "1" bits in CAPID6 are meaningful -- only that the *count* of "1" bits matches the number of enabled LLC slices.

With any luck, reviewing the CAPID0 and CAPID4 bits will resolve this ambiguity.
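
A quick way to see whether a CAPID6 value falls into this ambiguous case is to test whether all of its "1" bits are packed into the low-order positions. This is only a sketch with a hypothetical helper name, is_low_contiguous; a packed mask is consistent with all three interpretations above, so the check flags the ambiguity rather than resolving it.

```shell
# is_low_contiguous HEX: succeed if the mask is non-zero and has the form
# 2^n - 1, i.e. every set bit sits in the low-order positions with no gaps.
is_low_contiguous() (
    v=$((0x${1#0x}))
    [ "$v" -ne 0 ] && [ $(( v & (v + 1) )) -eq 0 ]
)

for m in 0003ffff 0fffffff 03da1725; do
    if is_low_contiguous "$m"; then
        echo "$m: packed into the low bits"
    else
        echo "$m: has gaps"
    fi
done
```

Of the values quoted in this thread, 0003ffff and 0fffffff are packed, while 03da1725 has gaps and therefore cannot have been shifted down.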

aozcan
New Contributor I
7,205 Views

> lspci | grep :1e.3
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 07)

> setpci -s 16:1e.3 0x84.l
00248100
> setpci -s 16:1e.3 0x94.l
34680c8d

 

These values do not match any row of Table 1 in the document you mentioned. How should I interpret this?

Apart from that, my original post said "However, this output does not seem correct, since Cascade Lake appears to have 28 tiles maximum, and on my system I have 18 cores. This implies that I should have 10 tiles disabled. However, result I got suggests that there are no disabled tiles (first 28 bits are 1 in the result)", but that contained a calculation mistake: the CAPID6 register has the value 0003ffff, so the first 18 bits are 1. I have edited my original post.

I know from your research that Intel dies have an equal number of enabled tiles in the left and right halves. I believe this rules out a 28-tile die on which only the tiles corresponding to the first 18 CHAs are enabled. I obviously cannot have a 10-tile die, so the only remaining possibility seems to be an 18-tile die with all cores enabled. Is this reasoning correct?

McCalpinJohn
Honored Contributor III
7,202 Views

(1) Try "lspci -d 8086:2083" to make sure you are looking at the correct bus:device:function

(2) Check the CPUID values that are also discussed in Table 1 of the specification update to make sure those values match what is expected.  

Jaeyoung__Choi
Beginner
8,448 Views

Dear John,

Thank you! I didn't know such a simple method existed.

If I had known, I wouldn't have had to learn what a kernel module is and how it works...

I have learned a lot from your comments on other questions.

Thank you again!

 
