Intel PCM: which kernel modules are required?

Till_S_
Beginner

I have installed PCM 2.8 on my Gentoo system (Kernel 4.1) and have trouble reading the performance registers. When running pcm I get the error (full output below):

Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.

Which kernel modules are needed for full pcm functionality? PERF_EVENTS_INTEL_UNCORE and PERF_EVENTS are already enabled. The full kernel config is attached to this post.

 

 

Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.
Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients.

 EXEC  : instructions per nominal CPU cycle
 IPC   : instructions per CPU cycle
 FREQ  : relation to nominal CPU frequency='unhalted clock ticks'/'invariant timer ticks' (includes Intel Turbo Boost)
 AFREQ : relation to nominal CPU frequency while in active state (not in power-saving C state)='unhalted clock ticks'/'invariant timer ticks while in C0-state'  (includes Intel Turbo Boost)
 L3MISS: L3 cache misses
 L2MISS: L2 cache misses (including other core's L2 cache *hits*)
 L3HIT : L3 cache hit ratio (0.00-1.00)
 L2HIT : L2 cache hit ratio (0.00-1.00)
 L3CLK : ratio of CPU cycles lost due to L3 cache misses (0.00-1.00), in some cases could be >1.0 due to a higher memory latency
 L2CLK : ratio of CPU cycles lost due to missing L2 cache but still hitting L3 cache (0.00-1.00)
 READ  : bytes read from memory controller (in GBytes)
 WRITE : bytes written to memory controller (in GBytes)
 IO    : bytes read/written due to IO requests to memory controller (in GBytes); this may be an over estimate due to same-cache-line partial requests
 TEMP  : Temperature reading in 1 degree Celsius relative to the TjMax temperature (thermal headroom): 0 corresponds to the max temperature


 Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK |  READ | WRITE |  IO   | TEMP |

   0    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     56
   1    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     57
   2    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     56
   3    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     58
   4    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     56
   5    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     57
   6    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     56
   7    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00     N/A     N/A     N/A     58
-----------------------------------------------------------------------------------------------------------------------------
 SKT    0     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00    1.32    0.32    0.83     50
-----------------------------------------------------------------------------------------------------------------------------
 TOTAL  *     0.00   -1.00   0.00    -1.00       0        0      1.00    1.00    -1.00    -1.00    1.32    0.32    0.83     N/A

 Instructions retired:    0   ; Active cycles:    0   ; Time (TSC): 3394 Mticks ; C0 (active,non-halted) core residency: 0.00 %

 C1 core residency: 12.03 %; C3 core residency: 87.42 %; C6 core residency: 0.55 %; C7 core residency: 0.00 %;
 C2 package residency: 10.74 %; C3 package residency: 10.92 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %;

 PHYSICAL CORE IPC                 : -1.00 => corresponds to -25.00 % utilization for cores in active state
 Instructions per nominal CPU cycle: 0.00 => corresponds to 0.00 % core utilization over time interval
----------------------------------------------------------------------------------------------

----------------------------------------------------------------------------------------------
 SKT    0 package consumed 15.38 Joules
----------------------------------------------------------------------------------------------
 TOTAL:                    15.38 Joules

Roman_D_Intel
Employee

Hi Till,

there is an issue with the Linux perf API on your system. If you are sure that you don't use Linux perf in a different tool, you can enable direct PMU programming in PCM. In the PCM Makefile, delete this line: CXXFLAGS += -DPCM_USE_PERF

Thanks,

Roman

Till_S_
Beginner

Hi Roman,

thank you for the workaround. I will try it in the next few days.

Still, I am trying to figure out what the problem with perf is. Can you tell me whether the kernel options/modules I listed above are sufficient, or point me to a document listing the necessary kernel dependencies? Then I could adjust the Gentoo package to warn about missing options and so help other PCM users.

Regards,

Till

Roman_D_Intel
Employee

Hi Till,

unfortunately I don't know what the issue with perf is.

Best regards,

Roman

Till_S_
Beginner

Hi,

here is the result of perf events. This might help to figure out which event is missing.

 

Regards,

Till

Roman_D_Intel
Employee

Unfortunately I don't see any hint in the list. Could you check whether perf works at all by collecting a few raw events:

perf stat -a -e r123  -e r124 -e r125 -e r126 -- sleep 1

Till_S_
Beginner

# perf stat -a -e r123  -e r124 -e r125 -e r126 -- sleep 1

 Performance counter stats for 'system wide':

                 0      r123                                                          (100.00%)
       168,618,459      r124                                                          (100.00%)
                 5      r125                                                          (100.00%)
       144,019,276      r126                                                        

       1.001587990 seconds time elapsed

Roman_D_Intel
Employee

the output looks good...

Till_S_
Beginner

Just to sort things out:

  • Building PCM without PCM_USE_PERF works.
  • When I run the perf version and afterwards the non-perf version of PCM, a warning appears: "Core 0 IA32_PERFEVTSEL0_ADDR are not zeroed 1261870" (see full output below).

WARNING: Core 0 IA32_PERFEVTSEL0_ADDR are not zeroed 1261870
Access to Intel(r) Performance Counter Monitor has denied (Performance Monitoring Unit is occupied by other application). Try to stop the application that uses PMU.
Alternatively you can try running Intel PCM with option -r to reset PMU configuration at your own risk.

Roman_D_Intel
Employee

Thanks for testing. The "IA32_PERFEVTSEL0" warning is expected because the perf API does not reset the PMU state when the PCM app closes the perf interface (known issue).
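
For anyone wanting to see what was left in the register, here is a minimal sketch that decodes the value from the warning. It assumes the warning prints IA32_PERFEVTSEL0 in decimal (not confirmed in this thread) and uses the bit layout from the Intel SDM. PerfEvtSel, decodePerfEvtSel and checkWarningValue are made-up names, not PCM code.

```cpp
#include <cassert>
#include <cstdint>

// Selected fields of an IA32_PERFEVTSELx register (bit layout per the Intel SDM).
struct PerfEvtSel {
    std::uint8_t event;  // bits 0-7:  event select
    std::uint8_t umask;  // bits 8-15: unit mask
    bool usr;            // bit 16: count while in user mode
    bool os;             // bit 17: count while in kernel mode
    bool interrupt;      // bit 20: raise an interrupt on counter overflow
    bool enabled;        // bit 22: counter enable
};

// Hypothetical helper: split a raw register value into its fields.
constexpr PerfEvtSel decodePerfEvtSel(std::uint64_t v) {
    return PerfEvtSel{
        static_cast<std::uint8_t>(v & 0xFF),
        static_cast<std::uint8_t>((v >> 8) & 0xFF),
        ((v >> 16) & 1) != 0,
        ((v >> 17) & 1) != 0,
        ((v >> 20) & 1) != 0,
        ((v >> 22) & 1) != 0};
}

// Worked example: the value from the warning, ASSUMING it was printed in decimal.
inline void checkWarningValue() {
    const PerfEvtSel s = decodePerfEvtSel(1261870);  // 1261870 == 0x13412E
    assert(s.event == 0x2E && s.umask == 0x41);
    assert(s.usr && s.os && s.interrupt && !s.enabled);
}
```

Under that assumption, 1261870 is 0x13412E: event select 0x2E with umask 0x41 (the architectural LLC-misses event), with USR, OS and INT set and the enable bit clear. That would fit a counter that was stopped but never cleared.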

Till_S_
Beginner

I had a quick look at the code, and the place where it fails is cpucounter.cpp:2630 (I have downloaded version 2.10 in the meantime):

    uint64 data[1 + PERF_MAX_COUNTERS];
    const int32 bytes2read =  sizeof(uint64)*(1 + core_fixed_counter_num_used + core_gen_counter_num_used);
    int result = ::read(perfEventHandle[core][PERF_GROUP_LEADER_COUNTER], data, bytes2read );
    // data layout: nr counters; counter 0, counter 1, counter 2,...    
    if(result != bytes2read)
    {
       std::cerr << "Error while reading perf data. Result is "<< result << std::endl;
       std::cerr << "Check if you run other competing Linux perf clients." << std::endl;

 

Here PERF_GROUP_LEADER_COUNTER is 0, so the handle being read is the one initialized at cpucounter.cpp:1698:

        if((perfEventHandle[PERF_INST_RETIRED_ANY_POS] = syscall(SYS_perf_event_open, &e, -1,
                   i /* core id */, leader_counter /* group leader */ ,0 )) <= 0)
        {
          std::cerr <<"Linux Perf: Error on programming INST_RETIRED_ANY: "<<strerror(errno)<< std::endl;
          decrementInstanceSemaphore();
          return PCM::UnknownError;
        }

where PERF_INST_RETIRED_ANY_POS is also 0 (the enums are explicitly set to the same values)

Interestingly, the open succeeds without an error and PCM prints "Successfully programmed on-core PMU using Linux perf" (cpucounter.cpp:1839), while the later read then fails.
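
The size check in the first excerpt above can be tried in isolation. The sketch below is not PCM code: GroupRead and parseGroupRead are hypothetical names, and it assumes the group was opened with read_format = PERF_FORMAT_GROUP only, so that the buffer holds the counter count followed by one value per counter, as the comment in the excerpt says. It treats a result of 0 as its own case, because the perf_event_open man page documents that a pinned event which cannot be scheduled goes into an error state where read returns 0. Whether that is what happens on this system is not established here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Outcome of interpreting one read() on a perf group leader.
enum class GroupRead { Ok, EndOfFile, Failed, ShortRead, CountMismatch };

// Hypothetical helper. Expected buffer layout (PERF_FORMAT_GROUP only):
//   buf[0] = number of counters, buf[1..] = one value per counter.
// `result` is the return value of read() on the group leader.
inline GroupRead parseGroupRead(const std::uint64_t* buf, long result,
                                std::size_t expectedCounters,
                                std::vector<std::uint64_t>& out) {
    assert(buf != nullptr);
    out.clear();
    if (result == 0) return GroupRead::EndOfFile;  // the "Result is 0" case
    if (result < 0) return GroupRead::Failed;      // read() error, see errno
    const std::size_t want = sizeof(std::uint64_t) * (1 + expectedCounters);
    if (static_cast<std::size_t>(result) != want) return GroupRead::ShortRead;
    if (buf[0] != expectedCounters) return GroupRead::CountMismatch;
    out.assign(buf + 1, buf + 1 + expectedCounters);
    return GroupRead::Ok;
}
```

Separating the zero-byte case from a short read at least shows which of the two is being hit, which the single "Result is ..." message does not make obvious.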

Therefore I tried to read INST_RETIRED_ANY manually via perf (using the EventCode and UMask from https://download.01.org/perfmon/IVB/IvyBridge_core_V15.tsv), and it always returns 0:

# perf stat -e cpu/event=0x00,umask=0x01,name=inst_retired_any/ -a sleep 5

 Performance counter stats for 'system wide':

                 0      inst_retired_any                                            

       5.000680645 seconds time elapsed

 

Till_S_
Beginner

Hmm, pmu-tools uses a different event ID and the result is non-zero:

./ocperf.py stat -e inst_retired.any -a sleep 5
perf stat -e cpu/event=0xc0,umask=0x0,name=inst_retired_any/ -a sleep 5

 Performance counter stats for 'system wide':

     4,096,126,544      inst_retired_any                                            

       5.000614365 seconds time elapsed

 

McCalpinJohn
Honored Contributor III

The "INST_RETIRED.ANY" event at the beginning of the files at download.01.org has to be interpreted differently than the same event showing up later in the file.

The first three lines of that file all refer to "fixed counters", which are accessed slightly differently than the programmable counters.   If I understand correctly, for the three events supported by the three fixed function counters, the "perf stat" command will use the fixed counter if it is available, otherwise it will use one of the programmable counters.  

The "perf stat" command has historically not provided any interface to determine what counters are actually used or exactly how they are programmed.
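
To make the two encodings easier to compare, here is a small sketch of how an EventCode/UMask pair maps to the raw value that "perf stat -e rNNNN" takes on Intel core PMUs (event select in the low byte, unit mask in the next byte). rawConfig and perfRawEvent are hypothetical helper names, not part of perf or PCM.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Combine an 8-bit event select and an 8-bit unit mask into a raw config value.
constexpr std::uint64_t rawConfig(unsigned event, unsigned umask) {
    return (static_cast<std::uint64_t>(umask) << 8) | event;
}

// Format the raw config as an event string for `perf stat -e`.
inline std::string perfRawEvent(unsigned event, unsigned umask) {
    assert(event <= 0xFF && umask <= 0xFF);  // both fields are one byte wide
    char buf[32];
    std::snprintf(buf, sizeof buf, "r%04llx",
                  static_cast<unsigned long long>(rawConfig(event, umask)));
    return std::string(buf);
}

// The programmable-counter encoding that pmu-tools used for inst_retired.any.
static_assert(rawConfig(0xC0, 0x00) == 0x00C0, "event 0xC0, umask 0x00");
// The first entry of the event file, as tried manually earlier in the thread.
static_assert(rawConfig(0x00, 0x01) == 0x0100, "event 0x00, umask 0x01");
```

With these, the programmable-counter entry (0xC0, 0x00) gives r00c0, while the entry from the top of the file (0x00, 0x01) gives r0100, which describes a fixed counter rather than an event select for a programmable counter. Read the other way round, the r123 used in the earlier test is event 0x23 with umask 0x01.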

Andreas_K_Intel
Employee

To find out which counter perf uses, you can use the event-rmap tool in pmu-tools. It prints the currently running counters, but only supports core counters.

https://github.com/andikleen/pmu-tools/blob/master/event-rmap.py

 

gostanian__richard
New Contributor I

I am running CentOS 7 with a 5.13 kernel. The normal CentOS 7 kernel is 3.10.

I downloaded the 201902 release from https://github.com/opcm/pcm/releases

Almost all of the pcm tools work for me, but pcm.x produces junk data as well as the error:

Error while reading perf data. Result is 0
Check if you run other competing Linux perf clients. 

I tried building pcm.x after removing  CXXFLAGS += -DPCM_USE_PERF from the Makefile, but that made no difference.

I have no trouble using perf.  I'm about ready to give up on pcm.x, but before doing so I thought I'd ask for help here.

It's too bad it doesn't want to work, because it's a nice little tool. 

Roman_D_Intel
Employee

Hi Richard,

could you please also try the latest version from the master branch? The opcm releases were not up to date (1 year old).

Best regards,

Roman

Roman_D_Intel
Employee

Hi Richard,

also, if you are running more than one instance of pcm.x, or other pcm utilities in parallel with pcm.x, please avoid that. It is possible if you really need it, but it requires additional options.

Roman

gostanian__richard
New Contributor I

Hi Roman,

As you suggested, using the latest version did the trick.

However the first time I ran pcm.x, after running make, I got garbage results.  I also got a message saying "run with -r at your own risk". I did that and everything then ran perfectly. I now always run with -r, even though I don't have to. Will that cause any harm?

Thanks,

Richard

 

Thomas_W_Intel
Employee

Perf, PCM, and other tools all use the same hardware, namely the performance monitoring units (PMUs). If one tool programs the PMU while another tool is using it, that other tool will likely get garbage numbers. There are programmer guidelines to ensure that no two tools use the PMU at the same time. However, as you have observed, these guidelines are not always followed. In particular, a tool might not clean up after using the PMU, which then prevents other tools that follow the guidelines from getting access to it.

The -r option essentially tells PCM to ignore that some other tool might already be using the PMU and to program it anyway. If you are 100% sure that no other tool is using the PMU, the -r option won't do any harm. However, if something else is using the PMU, reprogramming the counters can have unexpected side effects.
