
Clarification regarding PCM output

ran_t_
Beginner

Hello.

This is a question about interpreting the output of the PCM utility.

Given a PCM report, we would like a view of CPU consumption in the more traditional form :) : a CPU usage percentage.

Below is a sample of the PCM output.

Is there a way to derive the CPU usage percentage from it?

Is it reported somewhere in the PCM output? If so, I'm unclear about the meaning of "residency" and how to read the CPU usage percentage from it. Can you clarify? (I am asking specifically about the lower part of the report.)

 

Core (SKT) | EXEC | IPC  | FREQ  | AFREQ | L3MISS | L2MISS | L3HIT | L2HIT | L3CLK | L2CLK  | READ  | WRITE | TEMP
   0    0     1.04   1.45   0.72    1.14      66       17 K    1.00    0.78    0.00    0.00     N/A     N/A     50
   1    0     0.00   0.80   0.00    1.11      18       36 K    1.00    0.40    0.00    0.22     N/A     N/A     53
   2    0     0.00   0.55   0.00    1.01       9      654      0.99    0.50    0.01    0.12     N/A     N/A     53
   3    0     0.00   0.56   0.00    1.09      99     1663      0.94    0.41    0.03    0.10     N/A     N/A     52
   4    0     0.00   0.40   0.00    1.12       0     2812      1.00    0.41    0.00    0.05     N/A     N/A     53
   5    0     0.00   0.53   0.00    1.12       5       16 K    1.00    0.38    0.00    0.11     N/A     N/A     51
   6    0     0.59   1.43   0.41    1.14       3       18 K    1.00    0.82    0.00    0.00     N/A     N/A     50
   7    0     0.00   0.52   0.00    1.05      17     1207      0.99    0.35    0.01    0.14     N/A     N/A     53
   8    0     0.00   0.55   0.00    1.07      39     1113      0.96    0.37    0.02    0.14     N/A     N/A     53
   9    0     0.00   0.49   0.00    1.11      76     2021      0.96    0.36    0.02    0.13     N/A     N/A     52
   a    0     0.00   0.23   0.00    1.12       5     1166      1.00    0.39    0.00    0.05     N/A     N/A     53
   b    0     0.00   0.55   0.00    1.10      65     1811      0.96    0.39    0.02    0.14     N/A     N/A     51
-------------------------------------------------------------------------------------------------------------------
SKT    0     0.14   1.44   0.09    1.14     401      101 K    1.00    0.65    0.00    0.00    0.02    0.02     50
-------------------------------------------------------------------------------------------------------------------
TOTAL  *     0.14   1.44   0.09    1.14     408      101 K    1.00    0.65    0.00    0.00    0.02    0.02     N/A

 

Instructions retired: 9801 M ; Active cycles: 6819 M ; Time (TSC): 6001 Mticks ; C0 (active,non-halted) core residency: 8.32 %

 

C1 core residency: 91.68 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %

C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

 

PHYSICAL CORE IPC                 : 2.87 => corresponds to 71.86 % utilization for cores in active state

Instructions per nominal CPU cycle: 0.27 => corresponds to 6.81 % core utilization over time interval

----------------------------------------------------------------------------------------------

 

----------------------------------------------------------------------------------------------

SKT    0 package consumed 89.13 Joules

----------------------------------------------------------------------------------------------

TOTAL:                    89.13 Joules

 

----------------------------------------------------------------------------------------------

SKT    0 DIMMs consumed 9.43 Joules

----------------------------------------------------------------------------------------------

 

 

thanks

 

5 Replies
Bernard
Valued Contributor I

>>>PHYSICAL CORE IPC                 : 2.87 => corresponds to 71.86 % utilization for cores in active state>>>

That figure is the average instructions-per-cycle count. What did you measure with PCM?

Bernard
Valued Contributor I

In general, this ratio measures instruction-level parallelism and can be affected by data dependencies and by the execution of long-latency instructions.
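As a rough cross-check, the summary figures in the first report can be reproduced from the raw counters printed on its "Instructions retired" line. The sketch below is only arithmetic on the numbers posted in this thread. It assumes 12 logical CPUs (rows 0 through b in the table), 2 hardware threads per physical core, and a peak of 4 instructions per cycle; none of these is stated in the report, they are inferred because they reproduce the printed values.

```python
# Raw counters copied from the first PCM summary in this thread.
INSTRUCTIONS_RETIRED = 9801e6   # "Instructions retired: 9801 M"
ACTIVE_CYCLES = 6819e6          # "Active cycles: 6819 M"
TSC_TICKS = 6001e6              # "Time (TSC): 6001 Mticks"

# Assumptions inferred from the report, not stated in it.
LOGICAL_CPUS = 12       # rows 0..b in the per-core table
THREADS_PER_CORE = 2    # assumed: two hardware threads per physical core
MAX_IPC = 4             # assumed peak instructions per cycle


def summarize(instructions, active_cycles, tsc_ticks):
    """Recompute the summary metrics from the three raw counters."""
    physical_cores = LOGICAL_CPUS // THREADS_PER_CORE
    ipc = instructions / active_cycles                    # per active cycle
    exec_ = instructions / (tsc_ticks * LOGICAL_CPUS)     # per nominal cycle, per logical CPU
    freq = active_cycles / (tsc_ticks * LOGICAL_CPUS)     # active cycles relative to nominal cycles
    core_ipc = ipc * THREADS_PER_CORE                     # both threads of a physical core combined
    per_nominal = instructions / (tsc_ticks * physical_cores)
    return {
        "ipc": ipc,
        "exec": exec_,
        "freq": freq,
        "core_ipc": core_ipc,
        "instr_per_nominal_cycle": per_nominal,
        "util_active_pct": 100.0 * core_ipc / MAX_IPC,
        "util_interval_pct": 100.0 * per_nominal / MAX_IPC,
    }


metrics = summarize(INSTRUCTIONS_RETIRED, ACTIVE_CYCLES, TSC_TICKS)
for name, value in metrics.items():
    print(f"{name:>24}: {value:.2f}")
```

Under these assumptions the results agree with the TOTAL row (EXEC 0.14, IPC 1.44, FREQ 0.09) and with the two utilization lines to within rounding of the counters. The percentages are ratios against an assumed peak IPC, which is why they measure instruction throughput rather than time spent busy.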

Patrick_F_Intel1
Employee

Hello Ran,

You can use the part of the output:

C0 (active,non-halted) core residency: 8.32 %

So the system was not halted for only 8.32% of the interval, which means the %idle was greater than or equal to 91.68%.

PCM doesn't know what processes are running. PCM only displays hardware counters.
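A minimal sketch of pulling that figure out of a saved report, assuming the summary line has exactly the wording shown in this thread (the function name `cpu_usage_from_pcm` is made up for illustration):

```python
import re

# Matches the summary line as it appears in the report posted above.
C0_PATTERN = re.compile(
    r"C0 \(active,non-halted\) core residency:\s*([\d.]+)\s*%"
)


def cpu_usage_from_pcm(report_text):
    """Return (busy_pct, idle_pct) derived from the C0 core residency line."""
    match = C0_PATTERN.search(report_text)
    if match is None:
        raise ValueError("no C0 core residency line found")
    busy = float(match.group(1))
    return busy, 100.0 - busy


sample = (
    "Instructions retired: 9801 M ; Active cycles: 6819 M ; "
    "Time (TSC): 6001 Mticks ; C0 (active,non-halted) core residency: 8.32 %"
)
busy, idle = cpu_usage_from_pcm(sample)
print(f"busy {busy:.2f} %, idle {idle:.2f} %")
```

The value is a hardware view across the whole machine, so it is a rough stand-in for the OS-level CPU percentage rather than an exact match for it.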

Pat

Bernard
Valued Contributor I

For a per-thread or per-function breakdown, use VTune.

ran_t_
Beginner

Hello again,

Many thanks for your answers.

To make this discussion more precise, I would like to do the following:

a) briefly explain our application

b) ask a set of questions about an output attached to this thread. The output comes from a real test run on our system.

What follows is a bit lengthy, and I apologize in advance.

overview

---------------------------------------------------

Radvision develops conferencing systems. The main attribute of these systems is the ability to transfer video using modern protocols (H.264 and others) between end stations. Unlike home "conferencing" platforms such as Skype or Messenger, these systems can have multiple participants presenting data during the conference, with different layouts and features, where the sources and the destinations can be different video endpoints with different capabilities.

To transfer and process the video, the system must have enough horsepower to perform the required alterations, conversions, scaling, blending, and so on. To date, Intel processors have fallen short of the available DSP parts for this task. Hence, while the host part of the application typically runs on an Intel processor, the majority of the video processing is done on a daughter card bearing a DSP farm that implements the required operations.

So the basic data flow is as follows: compressed video comes from the network ports into (and out of) the system, and the data is uncompressed in the Intel domain and transferred over PCIe to a set of DSPs connected to the Intel RC via a PCIe switch. The switch and the DSPs are part of the daughter card.

The application runs on Linux and is composed of several processes. These handle the control and oversight of the video conferences in the system, the administration of the DSP farm(s) and the transfer of data to and from them, and the management of the video operations required by each conference. So the expected CPU consumption on the system can be rather high, especially when a lot of data is being processed.

To bring order to how the application uses the CPU, the system's cores are affinitized according to the software's needs. The affinitization is done when the application starts and remains in place for the life of the system.
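For readers unfamiliar with affinitization, here is a minimal Linux-only sketch of pinning a process to a fixed set of logical CPUs at startup. It is not our application's code: the helper name `pin_to_cores` and the choice of core are purely illustrative.

```python
import os


def pin_to_cores(cores, pid=0):
    """Restrict process `pid` (0 means the caller) to the given logical CPUs.

    Uses os.sched_setaffinity, which is available on Linux only.
    """
    os.sched_setaffinity(pid, set(cores))
    return os.sched_getaffinity(pid)


# Illustrative choice: pin this process to the first CPU it may already use.
allowed = sorted(os.sched_getaffinity(0))
pinned = pin_to_cores(allowed[:1])
print("now restricted to logical CPUs:", sorted(pinned))
```

With processes pinned this way, each row of the per-core PCM table can be related to whichever process owns that logical CPU, even though PCM itself reports only hardware counters.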

questions and log

------------------------------------------------------------------------------------------------

In the file attached to this thread, you can review the results of two PCM utilities: memory and CPU usage.

I was pleased to see how easy these utilities are to use compared with VTune. I think that, as a first stage, understanding the output can shed a lot of light on the behavior of the system.

I should point out that our systems use SB processors from the E5-2600 family and 1333 MHz RAM sticks.

questions:

========

A. What can be said about the memory output of the system in view of these logs? Is there anything you would consider out of the ordinary?

B. Regarding the CPU consumption of the system, can you better explain the difference between IPC and EXEC? If EXEC is a normalized value, what is the normalization factor?

C. FREQ/AFREQ: what do you mean by unhalted clocks and invariant clocks? Do these figures show the difference between a system that has halt states and one that does not? What does that mean?

D. On the second part of the CPU report, referring to the following example:

Instructions retired: 76 G ; Active cycles: 40 G ; Time (TSC): 10 Gticks ; C0 (active,non-halted) core residency: 29.01 %

C1 core residency: 70.99 %; C3 core residency: 0.00 %; C6 core residency: 0.00 %; C7 core residency: 0.00 %

C2 package residency: 0.00 %; C3 package residency: 0.00 %; C6 package residency: 0.00 %; C7 package residency: 0.00 %

Can I assume from this log that the system was in the C0 state 29% of the time and in C1 70.99% of the time, which means that the CPU was running 29% of the time (not attributed to any specific core)? Is the terminology I'm using correct? If not, can you correct me?

E. What is the difference between core residency and package residency?

F. Can you explain the following output?

PHYSICAL CORE IPC                 : 4.63 => corresponds to 115.87 % utilization for cores in active state

Instructions per nominal CPU cycle: 1.26 => corresponds to 31.53 % core utilization over time interval

----------------------------------------------------------------------------------------------

What does it mean, and how can we have more than 100% utilization?

G. In general, is there anything that can be concluded from the CPU report regarding cache use? How good is it? Why are you using different representations for the misses and the hits? And how can I draw conclusions from these figures?

I know this is lengthy and I have a lot of questions, but I think PCM can be really good for our systems, and I would really like to understand it better.

I would appreciate your answers on the matter, as I truly find some of the figures and descriptions in the log confusing.

Thanks.
