Solved: Understanding Advanced Hotspots CPI

Arik_R_Intel · 08-07-2016

Hi,

I'm starting to learn how to analyze OpenMP (OMP) projects, and I've found that Advanced Hotspots (AH) in VTune reports CPI (cycles per instruction). I'm trying to better understand what is displayed.

For argument's sake, if I've set the number of threads to 4 on a 4-core machine and the displayed CPI is 1, is that 1 per core or for the entire machine? In other words, if the machine advanced X cycles, has each core executed X instructions or X/4?

Thanks,

Arik

Vladimir_T_Intel · 08-08-2016

Hi,

The CPI metric is calculated per item in the result grid. For example, you can view it per function for each thread or CPU core just by selecting the appropriate grouping at the top of the grid in the Bottom-up view. With the Function/Call Stack grouping (the default), CPI is calculated per function across all CPUs.
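To make the grouping point concrete, here is a minimal Python sketch, assuming (as described above) that each grid row's CPI is that row's clockticks divided by that row's instructions retired. The function names, thread IDs and counts are hypothetical, not real VTune output.

```python
from collections import defaultdict

# Hypothetical per-(function, thread) event counts; illustrative only.
samples = [
    # (function, thread, clockticks, instructions_retired)
    ("match_kernel", 0, 4_000, 4_000),
    ("match_kernel", 1, 6_000, 3_000),
    ("parse_input", 0, 1_000, 2_000),
]


def cpi_by(rows, key):
    """Return {group: CPI}, where CPI = sum(clockticks) / sum(instructions)."""
    cycles = defaultdict(int)
    instructions = defaultdict(int)
    for function, thread, clk, inst in rows:
        group = key(function, thread)
        cycles[group] += clk
        instructions[group] += inst
    return {g: cycles[g] / instructions[g] for g in cycles}


# Function grouping: each function's counts are summed over all threads/CPUs.
per_function = cpi_by(samples, lambda f, t: f)
# Function/thread grouping: one CPI per (function, thread) row.
per_function_thread = cpi_by(samples, lambda f, t: (f, t))

print(per_function)
print(per_function_thread)
```

Note that match_kernel's function-level CPI (10,000 / 7,000 ≈ 1.43) is not the plain average of its per-thread values (1.0 and 2.0): a ratio of summed counts is an instruction-weighted average of the per-thread CPIs.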

Dmitry_P_Intel1 · 08-08-2016

Hello Arik,

On the Summary pane, CPI is computed for the whole workload: all the cycles consumed by your application (on any core) divided by the number of instructions your application executed (on any core). From CPI alone you cannot tell how many instructions were executed per core. But if, in your example, we assume that the cores did equal work with the same efficiency, then with a cumulative clocktick count of X (summed over all cores) and CPI = 1, each core executed X/4 instructions, as far as I understand.
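A minimal worked example of this arithmetic in Python, with hypothetical counts: the summary CPI is a ratio of totals over all cores, so it comes out the same whether the work is split evenly or not.

```python
def summary_cpi(cycles_per_core, instructions_per_core):
    """Workload CPI: all cycles on any core / all instructions on any core."""
    return sum(cycles_per_core) / sum(instructions_per_core)


X = 1_000_000  # hypothetical cumulative clockticks summed over all 4 cores
cores = 4

# Equal work, equal efficiency: each core gets X/4 cycles and, at CPI = 1,
# executes X/4 instructions.
cycles = [X // cores] * cores
instructions = [X // cores] * cores
print(summary_cpi(cycles, instructions))  # 1.0
print(instructions[0])  # 250000, i.e. X/4 per core

# The same summary CPI can hide a very uneven per-core split:
skewed_cycles = [400_000, 300_000, 200_000, 100_000]
skewed_instructions = [400_000, 300_000, 200_000, 100_000]
print(summary_cpi(skewed_cycles, skewed_instructions))  # still 1.0
```

If X is instead the wall-clock cycle count, as in the original question, and all four cores are busy the whole time at the same frequency, the cumulative clockticks are 4X and each core executes X instructions at CPI = 1.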

Thanks & Regards, Dmitry

Arik_R_Intel · 08-08-2016

OK, thank you both!

Dmitry_P_Intel1 · 08-09-2016

Arik,

By the way, if you use Intel OpenMP, I would highly recommend trying the HPC Performance Characterization analysis to look at OpenMP efficiency metrics such as serial vs. parallel time, imbalance, and various kinds of overhead.
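To show what serial time and imbalance mean, here is a simplified Python sketch for a single parallel region. It assumes a common textbook definition of imbalance (CPU time threads spend waiting at the region's closing barrier for the slowest thread), which is not necessarily the exact formula VTune uses, and it ignores runtime overhead. The function name and timings are hypothetical.

```python
def openmp_breakdown(serial_seconds, region_busy_per_thread):
    """Illustrative OpenMP efficiency numbers for one parallel region.

    serial_seconds: time spent outside parallel regions (one thread running).
    region_busy_per_thread: seconds of useful work each thread did in the region.
    Assumes the region ends when its slowest thread finishes (implicit barrier).
    """
    region_wall = max(region_busy_per_thread)
    # CPU time the faster threads spend waiting at the barrier for the slowest.
    imbalance = sum(region_wall - busy for busy in region_busy_per_thread)
    elapsed = serial_seconds + region_wall
    return {
        "elapsed": elapsed,
        "serial_fraction": serial_seconds / elapsed,
        "imbalance_cpu_seconds": imbalance,
    }


# Hypothetical run: 2 s of serial setup, then 4 threads with uneven work.
print(openmp_breakdown(2.0, [3.0, 3.0, 2.0, 1.0]))
```

With per-thread work of 3, 3, 2 and 1 seconds, the region lasts 3 s and the two faster threads waste 1 + 2 = 3 CPU-seconds waiting, which is the kind of imbalance such an analysis is meant to surface.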

Thanks & Regards, Dmitry

Arik_R_Intel · 08-09-2016

@Dmitry

Is the HPC analysis part of VTune or a separate tool?

Also, I'm still trying to work out how to use the Intel omp.h rather than the Microsoft omp.h, since the program is compiled in VS2015.

Dmitry_P_Intel1 · 08-09-2016

Arik,

The HPC Performance Characterization analysis has been part of VTune since VTune Amplifier XE 2016 Update 3, and it is also available in VTune Amplifier 2017 Beta and Beta Update 1.

On the command line, you will need something like this:

>amplxe-cl -collect hpc-performance -data-limit=0 -r <my_result_dir> <my_app>

In the GUI, the analysis appears in the analysis tree as "HPC Performance Characterization".

Thanks & Regards, Dmitry

Arik_R_Intel · 08-09-2016

I don't think I'm using Update 3, because the analysis doesn't appear for me. I am using the 2016 edition, though. I'll try to update.

EDIT: I am using Update 3. My build: Update 3 (build 464096)

So is there any other reason that option doesn't appear?

Dmitry_P_Intel1 · 08-10-2016

Hello,

The analysis should work with 2016 U3. Let us know if you encounter any problems.

Thanks & Regards, Dmitry

Arik_R_Intel · 08-10-2016

Thanks for the speedy reply.

I tried running it from the command line and got:

amplxe: Fatal error: Cannot find analysis type. Check input parameters or reinstall the product. Available analisis types:
        hotspots
        advanced-hotspots
        concurrency
        locksandwaits
        general-exploration
        bandwidth
        memory-access
        tsx-exploration
        tsx-hotspots
        sgx-hotspots
        cpugpu-concurrency
        system-overview
        gpu-hotspots
        disk-io

My cmd:

"C:\Program Files (x86)\IntelSWTools\VTune Amplifier 2016 for Systems\bin64\amplxe-cl" -collect hpc-performance -data-limit=0 -r c:\work\arinberg -- C:\work\arinberg\Kernels\patternMatching\PatternMatching\Debug\PatternMatching.exe C:\work\arinberg\Kernels\patternMatching\Group6 C:\work\arinberg\Kernels\patternMatching\wk1.tcpdump 8 128 1

Dmitry_P_Intel1 · 08-10-2016

OK, I see the point of confusion. There are two VTune products: VTune Amplifier "for Systems" and VTune Amplifier "XE". HPC Performance Characterization was enabled in VTune Amplifier XE, and 464096 is Update 3 for Systems, if I'm not mistaken.

I know that HPC Performance Characterization will be added to VTune Amplifier for Systems only in 2017 Gold.

Anyway, you can also find the OpenMP efficiency metrics (for Intel OpenMP) in Advanced Hotspots: as a dedicated section on the Summary pane, and in the Bottom-up pane grid if you choose the /OpenMP Regions/.. grouping.

You can also read the following topic on this: https://software.intel.com/en-us/node/544172

Thanks & Regards, Dmitry

Arik_R_Intel · 08-10-2016

I've got XE now, and the HPC analysis is available. Thanks!

Would there be a problem with the OpenMP analysis if I compile in VS 2015 with the Microsoft compiler?

TimP · 08-10-2016

The Microsoft compiler doesn't insert the OpenMP region identification that ICL does. You could run against the Intel libiomp5 library in place of the Microsoft OpenMP library; you may need to do this anyway to set thread affinity for repeatable results in VTune.
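One way to script repeatable runs is to pin the OpenMP threads with the standard OMP_PROC_BIND and OMP_PLACES environment variables before launching the collection. This Python sketch assumes the application runs against libiomp5 (the Microsoft runtime in VS 2015 implements OpenMP 2.0, which predates these variables) and that amplxe-cl is on PATH. The helper names are hypothetical; the amplxe-cl arguments are the ones used elsewhere in this thread.

```python
import os
import subprocess


def hpc_collect_command(result_dir, app, *app_args, amplxe="amplxe-cl"):
    """Build the amplxe-cl argument list for an HPC Performance Characterization run."""
    return [amplxe, "-collect", "hpc-performance", "-data-limit=0",
            "-r", result_dir, "--", app, *app_args]


def pinned_env(num_threads):
    """Return a copy of the current environment with OpenMP threads pinned.

    OMP_PROC_BIND=close binds threads to places near the master thread's place,
    and OMP_PLACES=cores makes each place one physical core. These are honoured
    by an OpenMP 4.0 runtime such as libiomp5 (assumed), not by an OpenMP 2.0 one.
    """
    env = dict(os.environ)
    env.update({
        "OMP_NUM_THREADS": str(num_threads),
        "OMP_PROC_BIND": "close",
        "OMP_PLACES": "cores",
    })
    return env


def run_collection(result_dir, app, *app_args, threads=4):
    """Launch the collection with pinned threads (requires amplxe-cl on PATH)."""
    cmd = hpc_collect_command(result_dir, app, *app_args)
    return subprocess.run(cmd, env=pinned_env(threads), check=True)


if __name__ == "__main__":
    # Placeholder names, mirroring <my_result_dir> and <my_app> in the thread.
    print(hpc_collect_command("my_result_dir", "my_app.exe"))
```

Calling run_collection("my_result_dir", "my_app.exe", threads=4) would then profile with the same thread placement on every run, which is what makes results comparable between runs.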
