Understanding Advanced Hotspots CPI

Arik_R_Intel — Mon, 08 Aug 2016 06:32:46 GMT

Hi,

I'm starting to learn how to analyze OpenMP projects. I've found that Advanced Hotspots (AH) in VTune reports CPI, and I'm trying to better understand what is displayed.

For argument's sake, if I've set the number of threads to 4 on a 4-core machine and the CPI displayed is 1, is this 1 for each core or for the entire machine? That is, if the machine advanced X cycles, has each core executed X instructions or X/4?

Thanks,

Arik

Vladimir_T_Intel — Mon, 08 Aug 2016 15:17:45 GMT

Hi,

The CPI metric is calculated per item in the result grid. You can observe it, for example, per function for each thread or CPU core just by selecting the appropriate grouping at the top of the grid in the Bottom-up view. If you use the default Function/Call Stack grouping, CPI is calculated per function across all CPUs.
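
To make the grouping point concrete, here is a minimal Python sketch with made-up event counts. The function names, thread ids, and numbers are hypothetical and are not VTune output. It only illustrates that the same samples give a different CPI per row depending on whether the rows are functions or threads.

```python
from collections import defaultdict

# Hypothetical per-(function, thread) event counts; not real VTune data.
samples = [
    # (function, thread, clockticks, instructions_retired)
    ("match", 0, 4000, 4000),
    ("match", 1, 6000, 3000),
    ("parse", 0, 1000, 2000),
    ("parse", 1, 1000, 1000),
]

def cpi_by(key_index):
    """Sum clockticks and instructions per group, then divide once per row."""
    ticks = defaultdict(int)
    instr = defaultdict(int)
    for row in samples:
        key = row[key_index]
        ticks[key] += row[2]
        instr[key] += row[3]
    return {key: ticks[key] / instr[key] for key in ticks}

cpi_per_function = cpi_by(0)  # rows are functions, summed over all threads
cpi_per_thread = cpi_by(1)    # rows are threads, summed over all functions
print(cpi_per_function)
print(cpi_per_thread)
```

Note that each row's CPI is a ratio of sums for that row, not an average of the CPI values of finer-grained rows.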

Dmitry_P_Intel1 — Mon, 08 Aug 2016 16:58:28 GMT

Hello Arik,

On the Summary pane, CPI is computed for the whole workload: all the cycles consumed by your application (on any core) divided by the number of instructions the application executed (on any core). From CPI alone you cannot determine how many instructions were executed per core. But if, in your example, we assume the cores did equal work with the same efficiency, then each core executed X/4 instructions when the cumulative number of clockticks is X and CPI = 1, as far as I understand.

Thanks & Regards, Dmitry
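
The arithmetic in the reply above can be checked with a short Python sketch. The numbers are illustrative assumptions (four cores doing equal work at the same efficiency), and the function name is hypothetical.

```python
def instructions_per_core(total_clockticks, cpi, cores):
    """Instructions each core retired, assuming equal work on every core."""
    total_instructions = total_clockticks / cpi
    return total_instructions / cores

X = 1_000_000  # illustrative cycle count

# Case 1: X is the cumulative clockticks summed over all 4 cores.
print(instructions_per_core(X, cpi=1.0, cores=4))      # X/4 per core

# Case 2: X is the elapsed cycles and all 4 cores were busy throughout,
# so the cumulative clockticks are 4 * X.
print(instructions_per_core(4 * X, cpi=1.0, cores=4))  # X per core
```

So whether a core executed X or X/4 instructions depends on whether X counts cycles per core or cycles summed across cores. In both cases each core retires one instruction per cycle it spends when CPI is 1.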

Arik_R_Intel — Mon, 08 Aug 2016 20:57:10 GMT

ok, thank you both!

Dmitry_P_Intel1 — Tue, 09 Aug 2016 08:30:20 GMT

Arik,

By the way: if you use Intel OpenMP, I would highly recommend trying the HPC Performance Characterization analysis to look at OpenMP usage efficiency metrics such as serial time vs. parallel time, imbalance, and different kinds of overhead.

Thanks & Regards, Dmitry

Arik_R_Intel — Tue, 09 Aug 2016 08:38:13 GMT

@Dmitry

Is the HPC analysis part of VTune or a separate tool?

Also, I'm still trying to work out how to use the Intel omp.h rather than the Microsoft omp.h, as the program is being compiled in VS2015.

Dmitry_P_Intel1 — Tue, 09 Aug 2016 11:15:55 GMT

Arik,

The HPC Performance Characterization analysis has been part of VTune since VTune Amplifier XE 2016 Update 3. It is also available in VTune Amplifier 2017 Beta and Beta Update 1.

On the command line, you would run something like this:

>amplxe-cl -collect hpc-performance -data-limit=0 -r <my_result_dir> <my_app>

In the GUI, the analysis is available in the analysis tree as "HPC Performance Characterization".

Thanks & Regards, Dmitry

Arik_R_Intel — Wed, 10 Aug 2016 04:50:00 GMT

I think I'm not using Update 3, because that analysis doesn't exist for me. I am using the 2016 edition though. I'll try to update.

EDIT: I am using Update 3. My build: Update 3 (build 464096)

So is there any other reason that option doesn't exist?

Dmitry_P_Intel1 — Wed, 10 Aug 2016 08:42:42 GMT

Hello,

The analysis should work with 2016 U3. Let us know if you encounter any problems.

Thanks & Regards, Dmitry

Arik_R_Intel — Wed, 10 Aug 2016 08:46:28 GMT

Thanks for the speedy reply.

I tried running it from the command line and got:

amplxe: Fatal error: Cannot find analysis type. Check input parameters or reinstall the product. Available analisis types:
        hotspots
        advanced-hotspots
        concurrency
        locksandwaits
        general-exploration
        bandwidth
        memory-access
        tsx-exploration
        tsx-hotspots
        sgx-hotspots
        cpugpu-concurrency
        system-overview
        gpu-hotspots
        disk-io

My cmd:

"C:\Program Files (x86)\IntelSWTools\VTune Amplifier 2016 for Systems\bin64\amplxe-cl" -collect hpc-performance -data-limit=0 -r c:\work\arinberg -- C:\work\arinberg\Kernels\patternMatching\PatternMatching\Debug\PatternMatching.exe C:\work\arinberg\Kernels\patternMatching\Group6 C:\work\arinberg\Kernels\patternMatching\wk1.tcpdump 8 128 1

Dmitry_P_Intel1 — Wed, 10 Aug 2016 09:39:02 GMT

Ok, I see the point of confusion. There are two VTune products: one is "for Systems" and the other is "XE". HPC Performance Characterization was enabled in VTune Amplifier XE, and 464096 is U3 for Systems, if I'm not mistaken.

As far as I know, HPC Performance Characterization will be added to VTune Amplifier for Systems only in 2017 Gold.

In any case, you can also find the OpenMP efficiency metrics (they work for Intel OpenMP) in Advanced Hotspots: as a special section on the Summary pane, and in the Bottom-up grid if you choose the /OpenMP Regions/.. grouping.

You can also read the following topic on this: https://software.intel.com/en-us/node/544172

Thanks & Regards, Dmitry

Arik_R_Intel — Wed, 10 Aug 2016 13:21:33 GMT

I got XE now. The HPC is now available. Thanks!

Would there be a problem with the OpenMP analysis if I'm compiling in VS 2015 with the Microsoft compiler?

TimP — Wed, 10 Aug 2016 13:53:18 GMT

The Microsoft compiler doesn't insert the specific identification of OpenMP regions that ICL does. You could run against the Intel libiomp5 in place of the Microsoft OpenMP library. You may need to do this in order to set affinity for repeatable results in VTune.

Islam_A_ — Wed, 10 Aug 2016 20:10:23 GMT

Hello Arik,

Thanks & Regards, Dmitry