<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Both the vectorized and in Analyzers</title>
    <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161661#M17768</link>
    <description>&lt;P&gt;Both the vectorized and scalar instructions use the same registers, but the "scalar" versions of the instructions use only one "lane" of data (and only one "lane" of the corresponding arithmetic functional unit(s)).&lt;/P&gt;&lt;P&gt;Vectorization can be inhibited by a number of issues. You should start by adding "-qopt-report=3 -qopt-report-phase=vec" to the compile flags and reviewing the resulting ".optrpt" file(s). You will probably want to try several different report levels (1-5; I picked 3 to start), since higher levels provide more information.&lt;/P&gt;&lt;P&gt;Operations that can't be vectorized are sometimes easy to fix (e.g., possible aliasing), sometimes hard to fix (e.g., related to data structure layout), and sometimes impossible to fix (e.g., data dependence as an essential characteristic of the algorithm in use).&lt;/P&gt;</description>
    <pubDate>Tue, 14 Apr 2020 15:24:26 GMT</pubDate>
    <dc:creator>McCalpinJohn</dc:creator>
    <dc:date>2020-04-14T15:24:26Z</dc:date>
    <item>
      <title>Packed non-vectorized FP operations</title>
      <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161660#M17767</link>
      <description>&lt;P&gt;I am using VTune 2020u0 on an Intel 8280 platform. I ran an HPC Performance Characterization analysis and was looking at the Vectorization section, which shows:&lt;/P&gt;
&lt;PRE class="brush:bash; class-name:dark;"&gt;Vectorization:	77.7% of Packed FP Operations
    Instruction Mix:	
    SP FLOPs:	15.4%
    Packed:	79.8%
    128-bit:	0.0%
    256-bit:	0.1%
    512-bit:	79.8%
    Scalar:	20.2%
    DP FLOPs:	0.4%
    x87 FLOPs:	0.0%
    Non-FP:	84.2%
    FP Arith/Mem Rd Instr. Ratio:	0.462
    FP Arith/Mem Wr Instr. Ratio:	1.369&lt;/PRE&gt;

&lt;P&gt;I checked for a detailed explanation &lt;A href="https://software.intel.com/en-us/vtune-help-of-packed-fp-instructions"&gt;here&lt;/A&gt;, but was unable to gain clarity, so I am asking my questions here.&lt;BR /&gt;From the report, it seems the code issued both packed and non-packed instructions and that, out of all the packed FP instructions issued during execution, only 77.7% were vectorized, which (AFAIK) means these instructions used the AVX/AVX2/AVX-512 registers.&lt;/P&gt;
&lt;P&gt;Could you please explain, or refer me to an article that explains, the general reasons why packed instructions are not vectorized (in my case, 22.3% of packed instructions), and how these packed instructions would execute (using scalar registers?)?&lt;/P&gt;
&lt;P&gt;For example, _mm256_add_ps is a packed instruction, so could you help me understand how the add operation could be non-vectorized in the following context?&lt;/P&gt;

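&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;immintrin.h&amp;gt;  // AVX intrinsics
alignas(32) float f[8]={1.0,2.0,1.2,2.1, 5.2,5.3,10.1,11.0};  // _mm256_load_ps requires 32-byte alignment
__m256 v=_mm256_load_ps(&amp;amp;f[0]);  // load all 8 floats into one 256-bit register
v=_mm256_add_ps(v,v);  // packed add across all 8 lanes&lt;/PRE&gt;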

&lt;P&gt;The code above is unrelated to the code that I profiled.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2020 10:54:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161660#M17767</guid>
      <dc:creator>psing51</dc:creator>
      <dc:date>2020-04-14T10:54:19Z</dc:date>
    </item>
    <item>
      <title>Both the vectorized and</title>
      <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161661#M17768</link>
      <description>&lt;P&gt;Both the vectorized and scalar instructions use the same registers, but the "scalar" versions of the instructions use only one "lane" of data (and only one "lane" of the corresponding arithmetic functional unit(s)).&lt;/P&gt;&lt;P&gt;Vectorization can be inhibited by a number of issues. You should start by adding "-qopt-report=3 -qopt-report-phase=vec" to the compile flags and reviewing the resulting ".optrpt" file(s). You will probably want to try several different report levels (1-5; I picked 3 to start), since higher levels provide more information.&lt;/P&gt;&lt;P&gt;Operations that can't be vectorized are sometimes easy to fix (e.g., possible aliasing), sometimes hard to fix (e.g., related to data structure layout), and sometimes impossible to fix (e.g., data dependence as an essential characteristic of the algorithm in use).&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2020 15:24:26 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161661#M17768</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2020-04-14T15:24:26Z</dc:date>
    </item>
    <item>
      <title>Hello,</title>
      <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161662#M17769</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Just to avoid confusion: 77.7% of the FP instructions, across all precision types, are packed (vector) instructions.&lt;/P&gt;&lt;P&gt;The metrics hierarchy is supposed to be:&lt;/P&gt;&lt;P&gt;Vectorization: 77.7% of Packed FP Operations&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Instruction Mix:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;SP FLOPs: 15.4%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Packed: 79.8%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;128-bit: 0.0%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;256-bit: 0.1%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;512-bit: 79.8%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Scalar: 20.2%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;DP FLOPs: 0.4%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;x87 FLOPs: 0.0%&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;Non-FP: 84.2%&lt;/P&gt;&lt;P&gt;For some reason, DP FLOPs are not broken down into packed and scalar in your case.&lt;/P&gt;&lt;P&gt;To see why the remaining 22.3% of FP instructions are scalar rather than vectorized, you can use Intel Advisor.&lt;/P&gt;&lt;P&gt;Thank you, Regards, Dmitry&lt;/P&gt;</description>
      <pubDate>Tue, 14 Apr 2020 15:26:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161662#M17769</guid>
      <dc:creator>Dmitry_P_Intel1</dc:creator>
      <dc:date>2020-04-14T15:26:11Z</dc:date>
    </item>
    <item>
      <title>Hi puneet,</title>
      <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161663#M17770</link>
      <description>&lt;P&gt;Hi puneet,&lt;/P&gt;&lt;P&gt;Was the solution provided helpful?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 17 Apr 2020 13:49:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161663#M17770</guid>
      <dc:creator>JananiC_Intel</dc:creator>
      <dc:date>2020-04-17T13:49:05Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161664#M17771</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We are closing this case on the assumption that your issue has been resolved. Please feel free to start a new thread if you have further issues.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Apr 2020 12:12:53 GMT</pubDate>
      <guid>https://community.intel.com/t5/Analyzers/Packed-non-vectorized-FP-operations/m-p/1161664#M17771</guid>
      <dc:creator>JananiC_Intel</dc:creator>
      <dc:date>2020-04-24T12:12:53Z</dc:date>
    </item>
  </channel>
</rss>

