I am working with an Intel Xeon E5-2650 running CentOS 6.5 with kernel 2.6.32-431.17.1.
I would like to know whether I can access the uncore performance counters and, if so, which performance counters exactly I need to read to get the read and write traffic to and from memory.
I am using PAPI-5.3.2, which reports uncore events among its native events, and I can set it up to read those. The numbers I get seem reasonable. However, I've read quite a lot around the web, and there seem to be the following issues:
To sum up, my question is whether I can read uncore events on Intel Sandy Bridge with CentOS 6.5 and 2.6 kernel using PAPI. And if so, are the events that I am using correct?
Thanks in advance,
The uncore events I recommend for memory bandwidth are mentioned in this other post. See https://software.intel.com/en-us/forums/topic/330063#comment-1717974
Does PAPI support UNC_ARB_TRK_REQUESTS.EVICTIONS and UNC_CBO_CACHE_LOOKUP.ANY_I?
Thanks for your reply.
It reports something that looks like UNC_CBO_CACHE_LOOKUP.ANY_I. I guess the "I" stands for an invalid cache line?
The other one does not, at least not under that name. I'll try to look it up by event number and unit code.
Would you expect PCM to measure those on the platform I described earlier? If so, can I call PCM instrumentation from C code?
The URL I referenced gives the event number, umask, and uncore PMU unit name (such as CBO or ARB). Yes, the _I refers to invalid.
PCM on Linux reports memory bandwidth using MMIO-based counters (not MSR-based counters). MMIO events such as UNC_IMC_DATA_READS and UNC_IMC_DATA_WRITES report memory read and write bandwidth, respectively.
Yes, you can call PCM as a library but I'm not familiar with how to do this.
For whole-program monitoring you should be able to use "perf" directly with that kernel. You will need to be running as root unless the kernel variable "perf_event_paranoid" is set to "0" (or a negative number). Since PAPI accesses the counters via "perf", you also need to be running as root to read the counters through PAPI. (Intel's performance monitoring tools use their own low-level interface that is not controlled by "perf_event_paranoid".)
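The permission rule above (root always works; otherwise "perf_event_paranoid" must be 0 or negative) can be checked before running perf or PAPI. This is a minimal sketch: the helper names are hypothetical, and it only encodes the rule as stated here, not every case the kernel distinguishes.

```shell
#!/bin/bash
# Sketch: apply the rule "root, or perf_event_paranoid <= 0" for system-wide
# (uncore) counting through perf, and therefore through PAPI.
# Function names are hypothetical helpers, not part of perf or PAPI.

# perf_systemwide_allowed UID PARANOID_LEVEL -> prints "yes" or "no"
perf_systemwide_allowed() {
    local uid=$1 level=$2
    if [ "$uid" -eq 0 ]; then
        echo yes          # root is not restricted by perf_event_paranoid
    elif [ "$level" -le 0 ]; then
        echo yes          # 0 or a negative value allows unprivileged use
    else
        echo no
    fi
}

# check_current_user: apply the rule to the running user and this kernel's setting
check_current_user() {
    local level uid
    level=$(cat /proc/sys/kernel/perf_event_paranoid) || return 1
    uid=$(id -u)
    echo "perf_event_paranoid=$level uid=$uid allowed=$(perf_systemwide_allowed "$uid" "$level")"
}
```

Running check_current_user prints the current setting and the verdict. As root, "sysctl -w kernel.perf_event_paranoid=0" lowers the setting until the next reboot.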
Different versions of Linux seem to be aware of different subsets of names of uncore events, but perf allows specification of events by hex code, so understanding the names is not necessary. A sample script to read the uncore read and write counters is:
#!/bin/bash
# Define the Integrated Memory Controller (IMC) performance counter event sets
# to measure two events on each of the four memory controller channels:
#   All read CAS operations:  Event 0x04, Umask 0x03
#   All write CAS operations: Event 0x04, Umask 0x0C
# SET1 (reads) and SET2 (writes) each include one of these events on all four channels.
# Bash arrays keep each "-e <event>" pair as separate arguments without passing
# literal quote characters through to perf.
SET1=(-e uncore_imc_0/event=0x04,umask=0x03/ -e uncore_imc_1/event=0x04,umask=0x03/
      -e uncore_imc_2/event=0x04,umask=0x03/ -e uncore_imc_3/event=0x04,umask=0x03/)
SET2=(-e uncore_imc_0/event=0x04,umask=0x0c/ -e uncore_imc_1/event=0x04,umask=0x0c/
      -e uncore_imc_2/event=0x04,umask=0x0c/ -e uncore_imc_3/event=0x04,umask=0x0c/)

# Since the uncore counters are "per chip", I only need to read them on one core per chip.
# For all of TACC's systems, I know that cores 0 and 9 are on different chips, whichever assignment scheme is used.
#
# "perf stat" flags:
#   "-a"      counts for all processes (not just the process run under "perf stat")
#   "-A"      reports results separately for each core, rather than summed
#   "-x ,"    reports results as a comma-separated list (easier to import into scripts or spreadsheets)
#   "-o file" writes the perf output to a separate log file (rather than stderr)
echo "running STREAM bound to socket 0"
perf stat -o perf.out.imc.test1 -x , -a -A -C 0,9 "${SET1[@]}" "${SET2[@]}" numactl --membind=0 --physcpubind=0 ./stream.snb.10M.100x
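The counts in the log are CAS commands, and each CAS command transfers one 64-byte cache line, so bytes = CAS count x 64. The sketch below sums the read and write counts over all channels and reported cores. It assumes (not verified against every perf version) that the count is the second comma-separated field of each data line and that the event string, which itself contains a comma, appears later on the same line; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: total read/write bytes from a "perf stat -x , -A" log of the IMC
# CAS events (umask 0x03 = reads, umask 0x0c = writes).
# Assumption: data lines look like "CPU0,<count>,,uncore_imc_0/event=0x04,umask=0x03/..."

# imc_bytes FILE -> prints "read_bytes write_bytes"
imc_bytes() {
    awk -F, '
        /^#/ || NF < 2     { next }          # skip comment and blank lines
        $2 !~ /^[0-9]+$/   { next }          # skip lines without a numeric count
        /umask=0x03/       { rd += $2 }      # read CAS commands
        /umask=0x0[cC]/    { wr += $2 }      # write CAS commands
        END { printf "%.0f %.0f\n", rd * 64, wr * 64 }   # 64 bytes per CAS
    ' "$1"
}

# bandwidth_gbs BYTES SECONDS -> decimal GB/s (1 GB = 1e9 bytes), two decimals
bandwidth_gbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1e9 }'
}
```

For example, "imc_bytes perf.out.imc.test1" prints the total read and write bytes; the elapsed time has to come from elsewhere (for instance STREAM's own timing) before calling bandwidth_gbs.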
Pat, for some reason PAPI does not support the ARB monitoring unit. I'll try PCM. Thanks for your help.
John, thank you for your answer, but I'm interested in inserting instrumentation into my code in order to measure only parts of it. I have root access on the server I am using, and I am able to read such events with PAPI. The problem is that I cannot find the events other people suggested as suitable on my setup (as I mentioned in my reply to Pat, I don't have access to the ARB PMU), and I wanted to know whether this is a problem with my setup (kernel, PAPI, etc.) and whether I could use PCM (or any other tool) to count those events. I need to be able to instrument parts of my code, not just the whole application.
Thanks a lot,
The original post asked about the Xeon E5-2650. This is a "Sandy Bridge EP" (DisplayFamily_DisplayModel 06_2DH -- i.e., cpuinfo model 45), while the ARB and CBO counters mentioned above only apply to the Sandy Bridge parts with the "client" uncore (DisplayFamily_DisplayModel 06_2AH -- i.e., cpuinfo model 42).
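Since the applicable uncore events depend on the model number, it can help to check it on the machine itself. A minimal sketch with hypothetical helper names: it reads "cpu family" and "model" from /proc/cpuinfo and recognizes only the two family-6 models mentioned above; anything else is reported as "other".

```shell
#!/bin/bash
# Sketch: map CPU family/model to the Sandy Bridge uncore flavor discussed here.

# uncore_type FAMILY MODEL -> sandy-bridge-ep | sandy-bridge-client | other
uncore_type() {
    local family=$1 model=$2
    if [ "$family" -ne 6 ]; then
        echo other
        return
    fi
    case "$model" in
        45) echo sandy-bridge-ep ;;      # 06_2DH, e.g. Xeon E5-2650 (document 327043)
        42) echo sandy-bridge-client ;;  # 06_2AH: "client" uncore with ARB and CBO units
        *)  echo other ;;
    esac
}

# cpuinfo_field NAME [FILE] -> value of the first "NAME : value" line
cpuinfo_field() {
    awk -F: -v key="$1" '
        { k = $1; gsub(/[[:space:]]+$/, "", k) }       # trim the field name
        k == key { v = $2; gsub(/^[[:space:]]+|[[:space:]]+$/, "", v); print v; exit }
    ' "${2:-/proc/cpuinfo}"
}
```

Usage: uncore_type "$(cpuinfo_field 'cpu family')" "$(cpuinfo_field model)" should print sandy-bridge-ep on an E5-2650.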
Uncore counters for the Sandy Bridge EP are documented in the "Intel Xeon Processor E5-2600 Product Family Uncore Performance Monitoring Guide" (document 327043).
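When the kernel's perf supports these units, each IMC channel appears as its own PMU, which is what names such as uncore_imc_0 in the perf script refer to. Assuming the usual sysfs location /sys/bus/event_source/devices (backported kernels may differ), a quick way to see which uncore PMUs the running kernel exposes is sketched below; the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: list the uncore PMUs the running kernel exports to perf.

# list_uncore_pmus [DIR] -> one PMU name per line (empty if none are exported)
list_uncore_pmus() {
    local dir=${1:-/sys/bus/event_source/devices}
    local p
    for p in "$dir"/uncore_*; do
        [ -e "$p" ] && basename "$p"   # skip the literal pattern when nothing matches
    done
    return 0
}
```

On a kernel with Sandy Bridge EP uncore support, the output should include uncore_imc_0 through uncore_imc_3; an empty list suggests the kernel does not expose these units to perf (and hence to PAPI).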
PAPI 5.3.0 apparently has the ability to access the uncore counters, but I have not yet figured out how to build and test the sample programs in the PAPI src/components/perf_event_uncore directory.
The script that I included above targets Sandy Bridge EP processors. (It will probably also work with Ivy Bridge EP, but I have not tested it there.)
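The "-C 0,9" choice in the script is specific to TACC's systems. On other machines, one CPU per socket can be derived from sysfs topology instead. A minimal sketch with a hypothetical function name, assuming each CPU exposes /sys/devices/system/cpu/cpuN/topology/physical_package_id:

```shell
#!/bin/bash
# Sketch: pick the lowest-numbered CPU on each physical package, comma-separated,
# suitable for the -C option of "perf stat" when reading per-chip uncore counters.

# one_cpu_per_socket [SYSFS_CPU_DIR] -> e.g. "0,9"
one_cpu_per_socket() {
    local dir=${1:-/sys/devices/system/cpu}
    local f cpu pkg
    for f in "$dir"/cpu[0-9]*/topology/physical_package_id; do
        [ -r "$f" ] || continue          # skip CPUs without topology information
        cpu=${f#"$dir"/cpu}              # strip the directory prefix
        cpu=${cpu%%/*}                   # keep only the CPU number
        pkg=$(cat "$f")
        printf '%s %s\n' "$pkg" "$cpu"
    done | sort -k1,1n -k2,2n | awk '!seen[$1]++ { printf "%s%s", sep, $2; sep = "," } END { print "" }'
}
```

The result can be passed to the -C option of perf stat in place of the hard-coded 0,9.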