<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Arik, in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088927#M15514</link>
    <description>&lt;P&gt;Arik,&lt;/P&gt;

&lt;P&gt;By the way: if you use Intel OpenMP, I would highly recommend trying the HPC Performance Characterization analysis to look at OpenMP usage efficiency metrics such as serial time vs. parallel time, imbalance, and different kinds of overhead.&lt;/P&gt;

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
    <pubDate>Tue, 09 Aug 2016 08:30:20 GMT</pubDate>
    <dc:creator>Dmitry_P_Intel1</dc:creator>
    <dc:date>2016-08-09T08:30:20Z</dc:date>
    <item>
      <title>Understanding Advanced Hotspots CPI</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088923#M15510</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;I'm starting to learn how to analyze OpenMP projects. I've found that Advanced Hotspots (AH) in VTune reports CPI, and I'm trying to better understand what is displayed.&lt;/P&gt;

&lt;P&gt;For argument's sake, if I've set the number of threads to 4 on a 4-core machine and the CPI displayed is 1, is this 1 for each core or for the entire machine? That is, if the machine advanced X cycles, has each core executed X instructions or X/4?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Arik&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 06:32:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088923#M15510</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-08T06:32:46Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088924#M15511</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;The CPI metric is calculated per item in the result grid. You can observe it, for example, per function for each thread or CPU core just by selecting the appropriate grouping at the top of the grid in the Bottom-up view. If you use the Function/Callstack grouping (the default), CPI is calculated per function across all CPUs.&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 15:17:45 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088924#M15511</guid>
      <dc:creator>Vladimir_T_Intel</dc:creator>
      <dc:date>2016-08-08T15:17:45Z</dc:date>
    </item>
    <item>
      <title>Hello Arik,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088925#M15512</link>
      <description>&lt;P&gt;Hello Arik,&lt;/P&gt;

&lt;P&gt;In the Summary, CPI is computed over the whole workload: all the cycles consumed by your application (on any core) divided by the number of application instructions (on any core). From CPI alone you cannot determine how many instructions were executed per core. But if, in your example, we assume that the cores did equal work with the same efficiency, then each core executed X/4 instructions when the cumulative number of clockticks is X and CPI=1, as far as I understand.&lt;/P&gt;
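
&lt;P&gt;As a rough sketch of that arithmetic, the Python snippet below computes a summary-style CPI from per-core totals. The counter values are made up for illustration, and aggregate_cpi is a hypothetical helper, not a VTune API.&lt;/P&gt;

```python
# Hypothetical per-core counter totals (illustrative numbers, not measured data).
def aggregate_cpi(clockticks_per_core, instructions_per_core):
    """Summary-style CPI: cycles summed over all cores divided by instructions summed over all cores."""
    return sum(clockticks_per_core) / sum(instructions_per_core)

# Balanced case: cumulative clockticks X = 1000, each of 4 cores executes X/4 instructions.
balanced_ticks = [250, 250, 250, 250]
balanced_instr = [250, 250, 250, 250]
balanced_cpi = aggregate_cpi(balanced_ticks, balanced_instr)

# Skewed case: same totals, but one core does most of the work.
skewed_ticks = [700, 100, 100, 100]
skewed_instr = [700, 100, 100, 100]
skewed_cpi = aggregate_cpi(skewed_ticks, skewed_instr)

print(balanced_cpi, skewed_cpi)  # 1.0 1.0
```

&lt;P&gt;Both cases give the same summary CPI, which is why the per-core instruction count only follows from CPI under the equal-work assumption.&lt;/P&gt;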

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 16:58:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088925#M15512</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2016-08-08T16:58:28Z</dc:date>
    </item>
    <item>
      <title>ok, thank you both!</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088926#M15513</link>
      <description>&lt;P&gt;ok, thank you both!&lt;/P&gt;</description>
      <pubDate>Mon, 08 Aug 2016 20:57:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088926#M15513</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-08T20:57:10Z</dc:date>
    </item>
    <item>
      <title>Arik,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088927#M15514</link>
      <description>&lt;P&gt;Arik,&lt;/P&gt;

&lt;P&gt;By the way: if you use Intel OpenMP, I would highly recommend trying the HPC Performance Characterization analysis to look at OpenMP usage efficiency metrics such as serial time vs. parallel time, imbalance, and different kinds of overhead.&lt;/P&gt;

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 08:30:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088927#M15514</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2016-08-09T08:30:20Z</dc:date>
    </item>
    <item>
      <title>@Dmitry</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088928#M15515</link>
      <description>&lt;P&gt;@Dmitry&lt;/P&gt;

&lt;P&gt;Is the HPC analysis part of VTune or a separate tool?&lt;/P&gt;

&lt;P&gt;Also, I'm still trying to work out how to use the Intel omp.h rather than the Microsoft omp.h, as the program is compiled in VS2015.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 08:38:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088928#M15515</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-09T08:38:13Z</dc:date>
    </item>
    <item>
      <title>Arik,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088929#M15516</link>
      <description>&lt;P&gt;Arik,&lt;/P&gt;

&lt;P&gt;HPC Performance Characterization analysis has been part of VTune since VTune Amplifier XE 2016 Update 3, and it is also available in VTune Amplifier 2017 Beta and Beta Update 1.&lt;/P&gt;

&lt;P&gt;On the command line, you would run something like this:&lt;/P&gt;

&lt;P&gt;&amp;gt;amplxe-cl -collect hpc-performance -data-limit=0 -r &amp;lt;my_result_dir&amp;gt; &amp;lt;my_app&amp;gt;&lt;/P&gt;

&lt;P&gt;In the GUI, the analysis is available in the analysis tree as "HPC Performance Characterization".&lt;/P&gt;

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Tue, 09 Aug 2016 11:15:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088929#M15516</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2016-08-09T11:15:55Z</dc:date>
    </item>
    <item>
      <title>I think I'm not using Update</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088930#M15517</link>
      <description>&lt;P&gt;I think I'm not using Update 3 because that doesn't exist for me. I am using the 2016 edition though. I'll try and update.&lt;/P&gt;

&lt;P&gt;EDIT: I am using Update 3. My build: Update 3 (build 464096)&lt;/P&gt;

&lt;P&gt;So is there any other reason that option doesn't exist?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 04:50:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088930#M15517</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-10T04:50:00Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088931#M15518</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;

&lt;P&gt;The analysis should work with 2016 U3. Let us know if you encounter any problems.&lt;/P&gt;

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 08:42:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088931#M15518</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2016-08-10T08:42:42Z</dc:date>
    </item>
    <item>
      <title>Thanks for the speedy reply.</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088932#M15519</link>
      <description>&lt;P&gt;Thanks for the speedy reply.&lt;/P&gt;

&lt;P&gt;I tried running it on the command line and got:&lt;/P&gt;

&lt;P&gt;amplxe: Fatal error: Cannot find analysis type. Check input parameters or reinstall the product. Available analisis types:&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hotspots&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; advanced-hotspots&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; concurrency&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; locksandwaits&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; general-exploration&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; bandwidth&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; memory-access&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tsx-exploration&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; tsx-hotspots&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; sgx-hotspots&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpugpu-concurrency&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; system-overview&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; gpu-hotspots&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; disk-io&lt;/P&gt;

&lt;P&gt;My cmd:&lt;/P&gt;

&lt;P&gt;"C:\Program Files (x86)\IntelSWTools\VTune Amplifier 2016 for Systems\bin64\amplxe-cl" -collect hpc-performance -data-limit=0 -r c:\work\arinberg -- C:\work\arinberg\Kernels\patternMatching\PatternMatching\Debug\PatternMatching.exe C:\work\arinberg\Kernels\patternMatching\Group6 C:\work\arinberg\Kernels\patternMatching\wk1.tcpdump 8 128 1&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 08:46:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088932#M15519</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-10T08:46:28Z</dc:date>
    </item>
    <item>
      <title>Ok, I see the point of</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088933#M15520</link>
      <description>&lt;P&gt;OK, I see the point of confusion. There are two VTune products: one is "for Systems" and the other is "XE". HPC Performance Characterization was enabled in VTune Amplifier XE, and 464096 is U3 for Systems, if I'm not mistaken.&lt;/P&gt;

&lt;P&gt;As far as I know, HPC Performance Characterization will be added to VTune Amplifier for Systems only in 2017 Gold.&lt;/P&gt;

&lt;P&gt;Anyway, you can also find the OpenMP efficiency metrics (they work for Intel OpenMP) in Advanced Hotspots, as a special section in the Summary and in the Bottom-up pane grid if you choose the /OpenMP Regions/.. grouping.&lt;/P&gt;

&lt;P&gt;You can also read the following topic on this: &lt;A href="https://software.intel.com/en-us/node/544172"&gt;https://software.intel.com/en-us/node/544172&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;Thanks &amp;amp; Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 09:39:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088933#M15520</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2016-08-10T09:39:02Z</dc:date>
    </item>
    <item>
      <title>I got XE now. The HPC is now</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088934#M15521</link>
      <description>&lt;P&gt;I got XE now. The HPC is now available. Thanks!&lt;/P&gt;

&lt;P&gt;Would there be a problem with OpenMP analysis if I'm compiling in VS 2015 using the Microsoft compiler?&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 13:21:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088934#M15521</guid>
      <dc:creator>Arik_R_Intel</dc:creator>
      <dc:date>2016-08-10T13:21:33Z</dc:date>
    </item>
    <item>
      <title>Microsoft compiler doesn't</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088935#M15522</link>
      <description>&lt;P&gt;The Microsoft compiler doesn't insert specific identification of OpenMP regions as ICL does. You could run against the Intel libiomp5 in place of the Microsoft OpenMP library. You may need to do this in order to set affinity for repeatable results in VTune.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 13:53:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088935#M15522</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2016-08-10T13:53:18Z</dc:date>
    </item>
    <item>
      <title>Hello Arik,</title>
      <link>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088936#M15523</link>
      <description>&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;EM&gt;&lt;STRONG&gt;Hello Arik,&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;EM&gt;&lt;STRONG&gt;On summary CPI is counted by the whole workload so all the cycles consumed by your application (on any core) divided by number of the application instructions (on any core). By CPI you will not be able to define how much instructions were executed per core. But if in your example we assume that cores did equal work with the same efficiency then you will have X/4&amp;nbsp;per core if cumulative number of clockticks&amp;nbsp;is X and CPI=1 as far as I understand.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;

&lt;P style="word-wrap: break-word; font-size: 12px;"&gt;&lt;EM&gt;&lt;STRONG&gt;Thanks &amp;amp; Regards, Dmitry&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Aug 2016 20:10:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Understanding-Advanced-Hotspots-CPI/m-p/1088936#M15523</guid>
      <dc:creator>Islam_A_</dc:creator>
      <dc:date>2016-08-10T20:10:23Z</dc:date>
    </item>
  </channel>
</rss>

