I'm reading xeon-e5-v3-uncore-performance-monitoring.pdf, and I want to measure RING_THRU_DN_BYTES and RING_THRU_UP_BYTES.
I don't know how to find the bus, device, and function numbers or the event codes, or which device I should open.
These are listed as derived metrics for three different units in the processor uncore.
For the CBo units and the SBo units, the performance counters are accessed by MSRs, which you can read and write through the /dev/cpu/*/msr interface.
For the R2PCIe units, the performance counter interface is in Device 16 (decimal), Function 1. On a properly configured Linux system with a properly configured BIOS, the output of "lspci" will include lines like:
7f:10.1 Performance counters: Intel Corporation Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
- The first two characters (before the ":") are the PCI bus number. The uncore devices on each socket are mapped to different PCI buses. My two-socket systems typically use "7f" and "ff", though some use "1f" and "7f". If the devices are there, they will be easy to find.
- The one or two characters between the ":" and the "." are the hexadecimal device number, so the "10" above is Device 16 (decimal).
- The character after the "." is the Function, so this is Function 1.
The Linux OS will create a device file for this Bus:Device:Function at
/proc/bus/pci/7f/10.1
This file is used in the same way as the /dev/cpu/*/msr device files -- typically using pread() and pwrite() calls.
Because these interfaces are in PCI configuration space, they can also be accessed from the command line, using the "setpci" command to read or write byte/word/doubleword quantities, or using the "lspci" command with the "-xxx" option to dump the entire PCI configuration space for a particular device.
Everything here requires root access.
Writing the wrong things to the wrong places in PCI configuration space or in MSR space could hose your system.
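A minimal read-only sketch of the pread() approach described above. The helper names pci_cfg_path and pci_cfg_read32 are hypothetical (they are not part of any Intel or Linux API), and the bus, device, and function values are whatever "lspci" reports on your system. The file is opened read-only, so this sketch cannot change any register.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Build "/proc/bus/pci/BB/DD.F" from bus, device, function (hypothetical helper).
 * Returns the number of characters snprintf() would have written. */
static int pci_cfg_path(char *buf, size_t len, unsigned bus, unsigned dev, unsigned fn)
{
    assert(bus < 256 && dev < 32 && fn < 8);
    return snprintf(buf, len, "/proc/bus/pci/%02x/%02x.%x", bus, dev, fn);
}

/* Read one 32-bit register at byte offset `off` in PCI configuration space.
 * Returns 0 on success, -1 if the file cannot be opened or the read is short. */
static int pci_cfg_read32(const char *path, off_t off, uint32_t *val)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, val, sizeof(*val), off);
    close(fd);
    return n == (ssize_t)sizeof(*val) ? 0 : -1;
}
```

For the device shown above, pci_cfg_path(buf, sizeof buf, 0x7f, 0x10, 1) produces /proc/bus/pci/7f/10.1, and pci_cfg_read32(buf, 0, &id) then reads the vendor/device ID dword at offset 0.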
Thanks for your help.
On my system I see:
7f:10.1 Performance counters: Intel Corporation Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
ff:12.1 Performance counters: Intel Corporation Xeon E5 v3/Core i7 Home Agent 0 (rev 02)
So I open the devices with the following code.
    int bus[2]={0x7f,0xff};
    for(i=0;i<2;i++)
    {
        sprintf(filename,"/proc/bus/pci/%x/10.1",bus);
        fd=open(filename,O_RDONLY);
    }
Then I set R2_PCI_PMON_BOX_CTL, R2_PCI_PMON_CTL2, and R2_PCI_PMON_CTL3.
    for(i=0;i<2;i++)
    {
        value=1|1<<1ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xF4);
        value=0x09|0xC<<8ULL|0x1<<22ULL; // RING_THRU_DN_BYTES CCW
        pwrite(fd,(void*)&value,sizeof(uint32),0xE0);
        value=0x09|0x3<<8ULL|0x1<<22ULL; // RING_THRU_UP_BYTES CW
        pwrite(fd,(void*)&value,sizeof(uint32),0xE4);
    }
Then I read R2_PCI_PMON_CTR2 and R2_PCI_PMON_CTR3.
    for(i=0;i<2;i++)
    {
        pread(fd,(void*)&val1,sizeof(uint32),0xB4);
        pread(fd,(void*)&val2,sizeof(uint32),0xB0);
        pread(fd,(void*)&val3,sizeof(uint32),0xBC);
        pread(fd,(void*)&val4,sizeof(uint32),0xB8);
        printf("[%d] B4=%d B0=%d BC=%d B8=%d\n",i,val1,val2,val3,val4);
    }
I get the following output.
[0] B4=0 B0=0 BC=0 B8=0
[1] B4=0 B0=0 BC=0 B8=0
[0] B4=0 B0=0 BC=0 B8=0
[1] B4=0 B0=0 BC=0 B8=0
I don't know how to test it, but I think my result is wrong. Maybe I set the wrong *CTL registers.
This looks OK, but there are a couple of other items that need to be checked...
- Make sure that the U_MSR_PMON_GLOBAL_CTL register (MSR 0x0700) is not set to "freeze" all of the uncore counters (as described in section 2.1 of the Xeon E5 v3 uncore performance monitoring guide).
- It is always a good idea to check the PCI device ID in the code to make sure you are in the right place. The 16 bit field at offset 0 should be 0x8086 (indicating an Intel device), and the next 16-bit field should be 0x2f34 (the "DID" for this uncore device).
You can also use "lspci" after running your program as an independent check that you have written to the desired addresses.
To verify the vendor and device IDs you can run this as a normal user, but to read beyond the first 64 Bytes you need to run as root. On my Xeon E5-2667 v3 I see:
$ /sbin/lspci -xxx -s ff:10.1
ff:10.1 Performance counters: Intel Corporation Xeon E5 v3/Core i7 PCIe Ring Interface (rev 02)
00: 86 80 34 2f 00 00 00 00 02 00 01 11 00 00 80 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 34 2f
30: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Note that "lspci" prints by byte with the lowest address to the left, so the first 16 bits is 0x8086 and the next 16 bits is 0x2f34, as desired. When I run the same command as root I get the rest of the first 256 Bytes of PCI configuration space, but nothing is in there since I have not attempted to program those counters.
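The byte-order point can be checked with a small sketch. The function names below are hypothetical; the only values taken from this thread are the vendor ID 0x8086, the device ID 0x2f34, and the byte sequence "86 80 34 2f" printed by "lspci".

```c
#include <assert.h>
#include <stdint.h>

#define INTEL_VENDOR_ID      0x8086u
#define HSW_R2PCIE_PMON_DID  0x2f34u   /* device ID quoted in this thread */

/* Assemble a little-endian 32-bit value from four bytes in lspci print order
 * (lowest address first). */
static uint32_t le32_from_bytes(const uint8_t b[4])
{
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Returns 1 when the dword at offset 0 carries the expected vendor ID (low 16
 * bits) and device ID (high 16 bits), 0 otherwise. */
static int pci_id_matches(uint32_t id_dword, uint16_t vendor, uint16_t device)
{
    /* 0xFFFF reads back when no device is present, so it is never a valid
     * vendor to expect. */
    assert(vendor != 0xffffu);
    return (id_dword & 0xffffu) == vendor && (id_dword >> 16) == device;
}
```

Calling pci_id_matches() on the dword read from offset 0 before writing anything is a cheap guard against programming the wrong device.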
I read the R2PCIe events by opening /proc/bus/pci/7f/10.1 and /proc/bus/pci/ff/10.1. But U_MSR_PMON_GLOBAL_CTL is accessed through /dev/cpu/*/msr, and there are more than two of those files on my machine. I don't understand how they relate. Which msr file controls the PCI counters?
The document says:
OR (if box level freeze control preferred)
a) Freeze the box’s counters while setting up the monitoring session.
e.g., set Cn_MSR_PMON_BOX_CTL.frz to 1
So I set R2_PCI_PMON_BOX_CTL and changed my setup code.
    for(i=0;i<2;i++)
    {
        value=0x1<<8ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xF4); // R2_PCI_PMON_BOX_CTL
        value=0x1<<22ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xE0);
        value=0x09|0xC|0x1<<22ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xE0); // RING_THRU_DN_BYTES CCW
        value=0x1<<22ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xE4);
        value=0x09|0x3|0x1<<22ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xE4); // RING_THRU_UP_BYTES CW
        value=0x1|0x1<<1ULL|0x1<<8ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xF4);
        value=0x1|0x1<<1ULL;
        pwrite(fd,(void*)&value,sizeof(uint32),0xF4);
    }
This did not help either. I still get the same result.
[0] B4=0 B0=0 BC=0 B8=0
[1] B4=0 B0=0 BC=0 B8=0
[0] B4=0 B0=0 BC=0 B8=0
[1] B4=0 B0=0 BC=0 B8=0
[0] B4=0 B0=0 BC=0 B8=0
[1] B4=0 B0=0 BC=0 B8=0
I am also confused about how to find out whether the CPU bus is 3f, 7f, bf, or ff. There is no bus 3f, 7f, bf, or ff on my E5-4620.
(1) The U_MSR_PMON_GLOBAL_CTL MSR is accessible from every core, but it has "package scope", so all cores in a package access the same single register in the UBox of the uncore. As described in Section 2.2 of the Xeon E5 v3 Uncore Performance Monitoring Guide, the UBox serves as the "system configuration controller" for the processor and is the "master for reading and writing the physically distributed registers across [the processor]". This means that the UBox manages access to both the MSRs (at least the ones outside the local core) and PCI configuration space, so it is no surprise that the global control register for the uncore performance counters is located here.
(2) As described in Chapter 1 of Volume 2 of the processor datasheet, the bus number used for the processor uncore device configuration space can be located using the CPUBUSNO register. The CPUBUSNO register (described in section 6.6.33) is located on bus 0, device 5, function 0, offset 0x108, bits 15:8. On one of my Xeon E5 v3 boxes I see the value of 0x7f in this bit field:
$ setpci -s 0:5.0 0x108.l
00017f00
All of my 2-socket systems use either [3f,7f] or [7f,ff], while the 4-socket boxes all use [3f,7f,bf,ff]. Although the procedure seems overly complex, Section 1.6.1 of the Xeon E5 v3 Uncore Performance Monitoring Guide provides explicit documentation for finding the bus numbers.
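The bit-field extraction in (2) can be sketched in C. The helper names are hypothetical; the only facts taken from the post are that the uncore bus number sits in bits 15:8 of CPUBUSNO and that the example register value is 0x00017f00.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bits [hi:lo] of a 32-bit register value. */
static uint32_t bits32(uint32_t v, unsigned hi, unsigned lo)
{
    assert(hi < 32 && lo <= hi);
    uint32_t width = hi - lo + 1;
    uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);
    return (v >> lo) & mask;
}

/* Uncore bus number: bits 15:8 of CPUBUSNO, as described in the post above. */
static unsigned cpubusno_uncore_bus(uint32_t cpubusno)
{
    return bits32(cpubusno, 15, 8);
}
```

With the value shown by setpci above, cpubusno_uncore_bus(0x00017f00) gives 0x7f.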
It is important to realize that the BIOS must understand and properly configure these buses or the system would not work. If the OS cannot see them, this appears to be because the BIOS refuses to grant the OS permission to control the buses. They are still present and functional, but if the BIOS refuses to allow control during the PCI discovery process, the OS will not enumerate the buses and build the internal databases used by lspci, and the OS will not build the corresponding pseudo-files under the /proc, /sys, and /dev directories.
You can still access these functions, but you have to do it using the memory-mapped interface. PCI configuration space is accessed as a contiguous 256 MiB region of memory-mapped IO space. You can usually find it immediately by looking for "PCI MMCONFIG" in the output of "cat /proc/iomem". Each of the possible buses, devices, and functions maps to a 4KiB block in this range. If I recall correctly, the offset into this range is computed by concatenating the bits:
Bus number: bits 27:20
Device number: bits 19:15
Function: bits 14:12
Offset: bits 11:0
Then this is added to the PCI MMCONFIG base address and used as the physical address using the /dev/mem interface. (Actually it makes more sense to "mmap()" a 256 MiB range starting at the PCI MMCONFIG base address and then using the computed address directly.)
Obviously you can get in a lot of trouble if you write the wrong things to the wrong addresses through the /dev/mem interface, so it definitely pays to do a lot of read-only testing first. All of our codes that use this interface check the first two 16-bit fields of each 4KiB region -- the first 16-bit field should be 0x8086 (indicating an Intel device), while the second 16-bit field should be the Device ID listed in Volume 2 of the processor datasheet or in the corresponding section of the Uncore Performance Monitoring Guide. If these don't match, you are either running on the wrong system or you have an addressing error.
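A sketch of the offset computation, assuming the standard layout in which the bus number occupies bits 27:20, the device bits 19:15, the function bits 14:12, and the register offset bits 11:0. The function name mmcfg_offset is hypothetical, and the result still has to be added to the PCI MMCONFIG base address reported in /proc/iomem on your system.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of a register inside the 256 MiB memory-mapped PCI
 * configuration region:
 *   bus      -> bits 27:20
 *   device   -> bits 19:15
 *   function -> bits 14:12
 *   register -> bits 11:0
 */
static uint32_t mmcfg_offset(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    assert(bus < 256 && dev < 32 && fn < 8 && reg < 4096);
    return (uint32_t)bus << 20 | (uint32_t)dev << 15 |
           (uint32_t)fn << 12 | (uint32_t)reg;
}
```

For example, bus 0x7f, device 0x10, function 1 starts at offset 0x07f81000, and the CPUBUSNO register (bus 0, device 5, function 0, offset 0x108) is at 0x28108. Doing read-only checks of the vendor and device IDs at the start of each 4 KiB block, as described above, is the safest first test.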
My write to U_MSR_PMON_GLOBAL_CTL failed.
Here is my code
    uint32 value;
    sprintf(filename,"/dev/cpu/0/msr");
    msrfds=open(filename,O_RDWR);
    value=0x1<<31ULL;
    printf("%x\n",value);
    rs=pwrite(msrfds,(void*)&value,sizeof(uint32),0x0700); // U_MSR_PMON_GLOBAL_CTL
    perror("msr pwrite");
    printf("RETURN 0x0700 write: %ld %d\n",rs,msrfds);
Here is the output
80000000
msr pwrite: Invalid argument
RETURN 0x0700 write: -1 5
I use the following code to detect the CPU uncore bus.
    for(bus_no=0;bus_no<256;bus_no++)
    {
        device_no=5;
        function_no=108;
        sprintf(filename,"/proc/bus/pci/%02x/05.0",bus_no);
        fd=open(filename,O_RDWR);
        if(fd>0)
        {
            pread(fd,(void*)&value,sizeof(value),0x0);
            printf("BUS: %x DID: %lx\n",bus_no,value);
            if(0x2F288086==value)
            {
                pread(fd,(void*)&value,sizeof(value),0x108);
                printf("CPU UNCORE BUS: %x\n",(value&0x0FF00)>>8);
            }
            close(fd);
        }
    }
The last paragraph of section 2.10.2.2 of the Xeon E5 v3 Uncore Performance Monitoring Guide mentions that these registers must be written twice in a row in order to work. I don't know what "twice in a row" means in this context -- when I use the "setpci" command to write these locations the values are updated the first time I write the data, but the performance counters don't start counting. Re-writing the counters from the shell using setpci does not change the behavior.
I don't have any trouble writing to MSR 0x700 using the "wrmsr" command-line tool, but I have not tested to see whether this actually does anything.
I tried it. The write to 0x700 must be 64 bits wide, but the document shows the register as 32 bits.
I have written to 0x700, but it does not seem to take effect.
    value64=0x1<<31ULL&0x0FFFFFFFF;
    rs=pwrite(msrfd[0],(void*)&value64,sizeof(value64),0x0700); // U_MSR_PMON_GLOBAL_CTL
    rs=pwrite(msrfd[0],(void*)&value64,sizeof(value64),0x0700); // U_MSR_PMON_GLOBAL_CTL
    pread(msrfd[0],(void*)&value,sizeof(uint32),0x700);
    printf("0x700 frz_all: %lx\n",value);
I also used wrmsr:
[root@hsw-01 msr-tools-1.1.2]# ./wrmsr 0x700 0x80000000
[root@hsw-01 msr-tools-1.1.2]# ./rdmsr 0x700
0
Bit 31 of MSR 0x700 is clearly documented as a "Write Only" field (Table 2-2), so you should not expect to see it change.
The right way to test it is to see if it actually freezes counting of other uncore performance counters. You mentioned that you were able to program the IMC counters -- start them up and see if writing to bit 31 of MSR 0x700 freezes the counts, and if writing to bit 29 unfreezes the counters.
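A sketch of an 8-byte write to MSR 0x700 for this kind of test. The Linux msr driver only accepts transfers in multiples of 8 bytes, which matches the "Invalid argument" error reported earlier for a 4-byte pwrite(). The helper name msr_write64 is hypothetical; the bit positions (31 to freeze, 29 to unfreeze) are the ones named in this thread. Note that an expression like 0x1<<31ULL still has type int, so the constants below use 1ULL as the left operand.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() under strict -std= modes */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define U_MSR_PMON_GLOBAL_CTL 0x700u
#define UNCORE_FRZ_ALL    (1ULL << 31)   /* freeze all uncore counters (write-only bit) */
#define UNCORE_UNFRZ_ALL  (1ULL << 29)   /* unfreeze all uncore counters */

/* Write a full 64-bit value to `msr` through /dev/cpu/<cpu>/msr.
 * Requires root and the msr kernel module.
 * Returns 0 on success, -1 if the device cannot be opened or the write fails. */
static int msr_write64(int cpu, uint32_t msr, uint64_t val)
{
    char path[64];
    assert(cpu >= 0);
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, &val, sizeof val, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof val ? 0 : -1;
}
```

A test along the lines suggested above would call msr_write64(0, U_MSR_PMON_GLOBAL_CTL, UNCORE_FRZ_ALL), read the IMC counters twice to see whether they stopped, and then write UNCORE_UNFRZ_ALL. Because the MSR has package scope, one core per socket is enough.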
When using the IMC counters, I can't get the "freeze" function to work via MSR 0x700.
I can get the counters to "freeze" by writing a "1" to bit 8 of MC_CHy_PCI_PMON_BOX_CTL, but I cannot get the counters to "unfreeze" by writing a "0" to that spot. I can only get them to "unfreeze" by rewriting the corresponding MC_CHy_PMON_CTL location.
My script to set up the IMC counters on Xeon E5 v3 works fine -- the IMC counts are all reasonable:
    #!/bin/bash
    # IMC Performance Events
    # Most of our nodes have 2 channels on each of 2 IMCs
    # Buses [7f,ff], Devices [0x14,0x17], Functions [0,1]
    # Each of these has four programmable counters
    # Counter  Offset  Value       Description
    #    0     0xD8    0x00400B01  ACT.(READ+WRITE+BYPASS) -- Umasks are new with Haswell
    #    1     0xDC    0x00400304  CAS_COUNT.READS
    #    2     0xE0    0x00400C04  CAS_COUNT.WRITES
    #    3     0xE4    0x00400102  PRE_COUNT.MISS -- page closes due to page conflicts
    echo "Setting up IMC Performance Counters"
    for BUS in 7f ff
    do
        for DEVICE in 14 17
        do
            for FUNCTION in 0 1
            do
                lspci -s ${BUS}:${DEVICE}.$FUNCTION
                setpci -s ${BUS}:${DEVICE}.${FUNCTION} 0xD8.L=0x00400B01
                setpci -s ${BUS}:${DEVICE}.${FUNCTION} 0xDC.L=0x00400304
                setpci -s ${BUS}:${DEVICE}.${FUNCTION} 0xE0.L=0x00400C04
                setpci -s ${BUS}:${DEVICE}.${FUNCTION} 0xE4.L=0x00400102
            done
        done
    done
I guess I should note that this script only works on Xeon E5 v3 processors that have 2 "Home Agents", with one IMC per Home Agent and 2 DRAM channels per IMC. Some of the processors have only one Home Agent, with only one IMC and 3 or 4 DRAM channels attached to that IMC.
The number of Home Agents in each Xeon E5 v3 is listed in Table 1 of the Xeon E5 v3 Specification Update (document 330785).
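The control values in the script decompose into an event code, a unit mask, and an enable bit. The sketch below assumes the layout implied by the values used in this thread: event select in bits 7:0, umask in bits 15:8, and the enable bit at bit 22. The name pmon_ctl is hypothetical, and other control fields (edge detect, threshold, and so on) are left at zero.

```c
#include <assert.h>
#include <stdint.h>

#define PMON_CTL_EN (1u << 22)   /* counter enable bit, as used in this thread */

/* Build a PMON control register value from an event code and a unit mask. */
static uint32_t pmon_ctl(unsigned event, unsigned umask)
{
    assert(event < 256 && umask < 256);
    return (uint32_t)event | (uint32_t)umask << 8 | PMON_CTL_EN;
}
```

For example, pmon_ctl(0x04, 0x03) reproduces the 0x00400304 written for CAS_COUNT.READS, and pmon_ctl(0x09, 0x0C) gives 0x00400C09, the value used earlier in the thread for the R2PCIe event 0x09 with umask 0xC. Note that the umask must be shifted into bits 15:8; OR-ing it directly into the low byte changes the event code instead.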
The document shows that I must perform steps a, b, c, d, e, and f to collect the event, but here you only set the *CTL registers. Is the document wrong?
It seems that the "Address Map" device ID is 2F28h on Xeon E5 v3. On other generations it is different, or there is no such "Address Map". I can't find the "Address Map" for the first-generation Xeon E5 (SNB). Do the generations differ here?
I can find datasheet volume 2 for Xeon E5 v3 and Xeon E5 v2, but not for the original Xeon E5.
Section 2.1.2 of the Xeon E5 v3 Uncore Performance Monitoring Guide (document 331051) does list steps a,b,c,d,e,f, but they are definitely not all needed.
- I never use the "freeze" function, so steps a & f are not needed.
- Steps b & c can be combined into a single write of the control register (as I do in my script).
- Step d is only necessary if you want the counters to start at zero. All I need to know is that the counter cannot wrap around more than once during the measurement interval. If this is true, then it is easy to correct for the case where the counter overflows exactly once.
- Step e is only necessary if you want to use the "interrupt on overflow" feature, which I do not use.
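The single-wrap correction mentioned for step d can be sketched as a masked subtraction. The counter width is passed in as a parameter; 48 bits is an assumption here and should be checked against the Uncore Performance Monitoring Guide for the box in question. The name counter_delta is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Difference between two readings of a free-running counter that is
 * `width_bits` wide. The result is correct as long as the counter wrapped
 * at most once between the two readings. */
static uint64_t counter_delta(uint64_t before, uint64_t after, unsigned width_bits)
{
    assert(width_bits >= 1 && width_bits <= 64);
    uint64_t mask = (width_bits == 64) ? ~0ULL : ((1ULL << width_bits) - 1ULL);
    /* Unsigned subtraction wraps modulo 2^64; masking reduces it to 2^width. */
    return (after - before) & mask;
}
```

With a 48-bit counter, a reading of 0xFFFFFFFFFFF0 followed by 0x10 yields a delta of 0x20, so the counters never need to be reset to zero before a measurement.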
The "Address Map" PCI Configuration Space function has a different device ID for each processor generation, but it is Bus 0, Device 5, Function 0 in all three Xeon E5 generations. The Device IDs are:
- Xeon E5 2600 (gen1, Sandy Bridge): DID 0x3C28, datasheet volume 2 is document 326509
- Xeon E5 2600 (gen2, Ivy Bridge): DID 0x0E28, datasheet volume 2 is document 329188
- Xeon E5 2600 (gen 3, Haswell): DID 0x2F28, datasheet volume 2 is document 330784
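As a sketch, the three device IDs above can be folded into a lookup on the dword read from offset 0 of Bus 0, Device 5, Function 0. The function name is hypothetical, and only the IDs listed above are recognized.

```c
#include <assert.h>
#include <stdint.h>

/* Map the vendor/device ID dword of the "Address Map" function to a Xeon E5
 * generation: 1 = Sandy Bridge, 2 = Ivy Bridge, 3 = Haswell, 0 = not recognized. */
static int xeon_e5_generation(uint32_t id_dword)
{
    if ((id_dword & 0xffffu) != 0x8086u)   /* not an Intel device */
        return 0;
    switch (id_dword >> 16) {
    case 0x3c28: return 1;
    case 0x0e28: return 2;
    case 0x2f28: return 3;
    default:     return 0;
    }
}
```

The value 0x2F288086 tested in the bus-detection code earlier in the thread maps to generation 3.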