Understanding Advanced Hotspots CPI

Arik_R_Intel
Employee

Hi,

I'm starting to learn how to analyze OpenMP projects. Advanced Hotspots (AH) in VTune reports CPI, and I'm trying to better understand what is displayed.

For argument's sake, say I've set the number of threads to 4 on a 4-core machine and the displayed CPI is 1. Is that 1 for each core or for the entire machine? In other words, if the machine advanced X cycles, has each core executed X instructions or X/4?

Thanks,

Arik

1 Solution
Vladimir_T_Intel
Moderator

Hi,

The CPI metric is calculated per item in the result grid. You can observe it, for example, per function for each thread or CPU core by selecting the appropriate grouping at the top of the grid in the Bottom-up view. With the default Function/Call Stack grouping, CPI is calculated per function across all CPUs.
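
To illustrate how the grouping changes the number shown in each row, here is a minimal Python sketch. The sample rows, function names, and counter values are invented for illustration and are not VTune output; the helper `cpi_by` is hypothetical and only mimics the idea of summing clockticks and instructions per grid row before dividing.

```python
from collections import defaultdict

# Hypothetical samples: (function, thread, clockticks, instructions retired).
# These values are made up for illustration; they are not VTune output.
SAMPLES = [
    ("match", "T0", 400, 400),
    ("match", "T1", 600, 300),
    ("parse", "T0", 100, 200),
    ("parse", "T1", 100, 200),
]


def cpi_by(samples, key):
    """Sum clockticks and instructions per group, then divide once per group."""
    totals = defaultdict(lambda: [0, 0])
    for function, thread, clockticks, instructions in samples:
        group = key(function, thread)
        totals[group][0] += clockticks
        totals[group][1] += instructions
    return {group: ticks / instr for group, (ticks, instr) in totals.items()}


# Grouped by function only: one CPI per function over all threads.
per_function = cpi_by(SAMPLES, lambda function, thread: function)

# Grouped by function and thread: one CPI per (function, thread) row.
per_function_thread = cpi_by(SAMPLES, lambda function, thread: (function, thread))

print(per_function)
print(per_function_thread)
```

With these invented numbers, the per-function CPI of `match` falls between the CPIs of its two threads, because the cycles and instructions are summed before the division.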

Dmitry_P_Intel1
Employee

Hello Arik,

In the Summary, CPI is computed for the whole workload: all the cycles consumed by your application (on any core) divided by the number of instructions the application executed (on any core). CPI alone will not tell you how many instructions were executed per core. But if we assume in your example that the cores did equal work with the same efficiency, then, as far as I understand, each core executed X/4 instructions when the cumulative number of clockticks is X and CPI = 1.
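
The arithmetic can be sketched in a few lines of Python. The per-core counts below are hypothetical, not measured data, and `workload_cpi` is an illustrative helper rather than anything VTune exposes; it simply divides total clockticks by total instructions.

```python
def workload_cpi(per_core):
    """Total clockticks over total instructions for (clockticks, instructions) pairs."""
    total_clockticks = sum(clockticks for clockticks, _ in per_core)
    total_instructions = sum(instructions for _, instructions in per_core)
    return total_clockticks / total_instructions


# Hypothetical counts, not measured data.
X = 4000  # cumulative clockticks over all four cores

# Equal work, equal efficiency: each core spends X/4 clockticks on X/4 instructions.
balanced = [(X // 4, X // 4)] * 4

# Same totals, but one core is much less efficient than the others.
skewed = [(2500, 1000), (500, 1000), (500, 1000), (500, 1000)]

print(workload_cpi(balanced))
print(workload_cpi(skewed))
```

Both cases have the same totals and therefore the same workload CPI, which is why the aggregate value alone cannot tell you what each core did. By the same arithmetic, if X instead means the cycles elapsed on each of the four cores, the cumulative count is 4X and a CPI of 1 corresponds to X instructions per core under the equal-work assumption.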

Thanks & Regards, Dmitry

Arik_R_Intel
Employee

ok, thank you both!

Dmitry_P_Intel1
Employee

Arik,

By the way, if you use Intel OpenMP, I would highly recommend trying the HPC Performance Characterization analysis to look at OpenMP usage efficiency metrics such as serial time vs. parallel time, imbalance, and different kinds of overhead.

Thanks & Regards, Dmitry

Arik_R_Intel
Employee

@Dmitry

Is HPC analysis part of VTune or a separate tool?

Also, I'm still trying to work out how to use the Intel omp.h rather than the Microsoft omp.h, as the program is being compiled in VS2015.

Dmitry_P_Intel1
Employee

Arik,

HPC Performance Characterization analysis has been part of VTune since VTune Amplifier XE 2016 Update 3, and it is also available in VTune Amplifier 2017 Beta and Beta Update 1.

On the command line, you will need to specify something like this:

>amplxe-cl -collect hpc-performance -data-limit=0 -r <my_result_dir> <my_app>

In the GUI, the analysis is available in the analysis tree as "HPC Performance Characterization".

Thanks & Regards, Dmitry

Arik_R_Intel
Employee

I don't think I'm using Update 3, because that analysis doesn't exist for me. I am using the 2016 edition, though. I'll try to update.

EDIT: I am using Update 3. My build: Update 3 (build 464096)

So is there any other reason that option doesn't exist?

Dmitry_P_Intel1
Employee

Hello,

The analysis should work with 2016 U3. Let us know if you encounter any problems.

Thanks & Regards, Dmitry

Arik_R_Intel
Employee

Thanks for the speedy reply.

I tried running it from the command line and got:

amplxe: Fatal error: Cannot find analysis type. Check input parameters or reinstall the product. Available analisis types:
        hotspots
        advanced-hotspots
        concurrency
        locksandwaits
        general-exploration
        bandwidth
        memory-access
        tsx-exploration
        tsx-hotspots
        sgx-hotspots
        cpugpu-concurrency
        system-overview
        gpu-hotspots
        disk-io

My cmd:

"C:\Program Files (x86)\IntelSWTools\VTune Amplifier 2016 for Systems\bin64\amplxe-cl" -collect hpc-performance -data-limit=0 -r c:\work\arinberg -- C:\work\arinberg\Kernels\patternMatching\PatternMatching\Debug\PatternMatching.exe C:\work\arinberg\Kernels\patternMatching\Group6 C:\work\arinberg\Kernels\patternMatching\wk1.tcpdump 8 128 1

Dmitry_P_Intel1
Employee

OK, I see the point of confusion. There are two VTunes: one is "for Systems" and the other is "XE". HPC Performance Characterization was enabled in VTune Amplifier XE, and 464096 is U3 for Systems, if I'm not mistaken.

I know that HPC Performance Characterization will be added to VTune Amplifier for Systems only in 2017 Gold.

Anyway, you can also find the OpenMP efficiency metrics (they work for Intel OpenMP) in Advanced Hotspots, as a special section in the Summary and in the Bottom-up pane grid if you choose the /OpenMP Regions/.. grouping.

You can also read the following topic on this: https://software.intel.com/en-us/node/544172

Thanks & Regards, Dmitry

Arik_R_Intel
Employee

I have XE now, and HPC Performance Characterization is available. Thanks!

Would there be a problem with OpenMP analysis if I'm compiling in VS 2015 using the Microsoft compiler?

TimP
Black Belt

The Microsoft compiler doesn't insert specific identification of OpenMP regions as ICL does. You could run against the Intel libiomp5 in place of the Microsoft OpenMP library. You may need to do this in order to set affinity for repeatable results in VTune.
