Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

[PCM] some KPIs not collected from specific CPUs

panko
Beginner

Hi all,

We are using PCM to collect data and visualize it in Grafana. In general this works fine. However, we have more than 10 machines, and they have different CPUs.


We have:
- Intel Xeon CPU X5675
- Intel Xeon CPU E5-2667 v2
- Intel Xeon CPU E5-2640 v3
- Intel Xeon CPU E5-2698 v4

 

Only on the machines with the Intel Xeon CPU E5-2667 v2 and Intel Xeon CPU E5-2640 v3 are we unable to collect some of the values, in particular Instructions Retired Any and Clock Unhalted Thread. What could be the issue?

In the attachments please find the output of the 'pcm' command for both the affected machines and some unaffected ones. Affected: fmighgm006, fmighgm007, fmighgm008. OK: fmighgm001 and fmighgm010. For fmighgm007 there is also output from some other commands, such as 'pcm-core' and 'pcm-memory'.

Indeed, in the log file I can see fmighgm007: Instructions retired: 0,
while on working machines something like this: Instructions retired: 26 G

The same for FREQ (which seems to be related to unhalted clock ticks): 0 for fmighgm007 and values like 0.39 on ok nodes.
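For anyone who wants to spot this symptom across many hosts without reading each log by hand, here is a minimal sketch. It assumes the 'pcm' output contains a line shaped like the excerpts quoted above ("Instructions retired: 26 G" or "Instructions retired: 0"); the function names and the unit suffix table are my own and are not part of PCM.

```python
import re

# Matches e.g. "Instructions retired: 26 G" or "Instructions retired: 0,".
# The layout is assumed from the log excerpts quoted in this post.
_INSTR_RE = re.compile(
    r"Instructions retired:\s*([0-9]+(?:\.[0-9]+)?)(?:\s*([KMGT])\b)?"
)

# Assumed meaning of the unit suffixes printed after the number.
_SCALE = {None: 1.0, "K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}


def instructions_retired(pcm_output):
    """Return the first 'Instructions retired' value found, or None."""
    match = _INSTR_RE.search(pcm_output)
    if match is None:
        return None
    value, unit = match.groups()
    return float(value) * _SCALE[unit]


def counters_look_dead(pcm_output):
    """True when the output explicitly reports zero retired instructions."""
    return instructions_retired(pcm_output) == 0.0
```

Feeding each host's saved 'pcm' output to `counters_look_dead` should flag machines like fmighgm007 and leave the working ones alone. A missing line returns `None` rather than zero, so "no data" and "zero instructions" stay distinguishable.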

 

If anything more is needed please let me know.

 

Thanks!

Roman_D_Intel
Employee

pcm-sensor-server can't be run in parallel with other pcm and non-pcm tools using the performance monitoring units (PMU). Make sure no other PMU tools are running in parallel.
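A rough way to check this on a Linux host is to look at the names of running processes. The sketch below is only an illustration: the list of tool names is an assumption taken from the tools mentioned in this thread, not an exhaustive list of PMU users, and it will not catch kernel-side users of the counters.

```python
import os

# Tool names mentioned in this thread; extend for your own environment.
# This set is an assumption, not a complete catalogue of PMU users.
PMU_TOOL_NAMES = {"perf", "pcm", "pcm-core", "pcm-memory", "pcm-sensor-server"}


def find_pmu_tools(process_names, known=PMU_TOOL_NAMES):
    """Return the sorted known tool names that appear in process_names.

    /proc/<pid>/comm truncates names to 15 characters, so the comparison
    is done against the truncated form of each known name.
    """
    truncated = {name[:15]: name for name in known}
    return sorted({truncated[p] for p in process_names if p in truncated})


def running_process_names(proc_root="/proc"):
    """Yield the command name of each process visible under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                yield fh.read().strip()
        except OSError:
            continue  # process exited or is not readable
```

Calling `find_pmu_tools(running_process_names())` on the affected server lists any of these tools that are currently running. An empty result only means none of the named tools were found.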

dpetrin
Beginner

Hi, one of the devs that set up the environment for PCM here!
My colleagues and I looked into this and didn't find any other process that uses the PMU (aside from the PCM we set up); also, the processes are pretty much the same across every server that we have in production, so we don't think that's the issue.
As a further note, the problem is present only on servers with 'Xeon(R) CPU E5-2667 v2' and 'Xeon(R) CPU E5-2640 v3' CPUs.
Please let us know if there's any additional info you need; we are available for any further clarification about the environment.

Roman_D_Intel
Employee

are you running pcm-sensor-server inside a docker container?

dpetrin
Beginner

no, pcm is running natively on a RHEL7.9 OS; we installed it via a .rpm package

Roman_D_Intel
Employee

Then what might help is to run with the PCM_NO_PERF=1 environment variable:

https://github.com/intel/pcm/blob/master/doc/ENVVAR_README.md
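If PCM is launched from a wrapper script rather than by hand, the variable can be set for the child process only. The following is a minimal sketch, assuming the binary is called 'pcm' and is on the PATH (as with the 'pcm' command used earlier in this thread); the helper names are hypothetical, and the linked README remains the reference for what the variable actually does.

```python
import os
import subprocess


def pcm_environment(base_env=None):
    """Return a copy of the environment with PCM_NO_PERF=1 added.

    The caller's environment is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PCM_NO_PERF"] = "1"
    return env


def run_pcm(binary="pcm", extra_args=()):
    """Run the PCM binary with PCM_NO_PERF=1 set for that process only.

    'binary' is assumed to be on the PATH; PCM normally needs root.
    Returns the CompletedProcess so the caller can check returncode.
    """
    return subprocess.run(
        [binary, *extra_args],
        env=pcm_environment(),
        check=False,  # let the caller decide how to handle failures
    )
```

This is equivalent to prefixing the command with the variable in a shell; the variable does not leak into the parent process or other services on the machine.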

dpetrin
Beginner

Hi, sorry for the late response; we normally don't have root access, so we had to request it to try your suggestion.

Unfortunately, running pcm with the env variable PCM_NO_PERF=1 set did not solve the issue, in fact it was even worse as the daemon wouldn't even start.

In the attached text file you can find the whole logged pty, but I'll try to summarize:

  1. we initially tried to launch (as a command on the terminal) pcm without the PCM_NO_PERF env variable; this was just a check to be sure that everything was as before and values were missing (as expected)
  2. after that, we tried to launch pcm from the shell once again, but this time with PCM_NO_PERF=1 set; this returned an error saying that the PMU was occupied by another application
  3. we stopped the pcm daemon we had running on the server (note that with this daemon running the first step did not return an error)
  4. repeating the 2nd step, the error returned is still the same
  5. at the end of the file you can find BIOS info about the server

We also tried to run the pcm daemon with PCM_NO_PERF=1 set, but it returned the same error (PMU is occupied by other application) once again. Lastly, the error message suggested running pcm with the '-r' option: we did NOT do this, as it would reset the PMU configuration and we didn't want to do that on a production server.

Is there anything else we can try?

dpetrin
Beginner

Hello,

is there some more info we can provide? Are there updates on this issue?

Roman_D_Intel
Employee

Unfortunately there is no other way around it: you need to use the -r option. BTW, the Linux perf tool resets the PMU configuration every time one starts profiling with it, without asking. PCM tries to follow the PMU sharing guidelines and asks the user first: https://cdrdv2-public.intel.com/727001/pmu-sharing-guidelines.pdf
