<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to force AVX-2 vs AVX-512 in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-force-AVX-2-vs-AVX-512/m-p/1169427#M28443</link>
    <description>&lt;P&gt;&lt;SPAN style="font-family: Arial; font-size: 13px; -webkit-text-stroke: rgb(83, 87, 94);"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;I'm running benchmarks of my code on test hardware (Intel Xeon Gold 5115), and i’m trying to isolate the impact of avx-512 vs avx-2 instructions on overall runtime. My issue is, I don’t know whether or not I’m forcing my code (compiled with icc 2018.1.163 + MKL) to use either instruction set. For reference (I can’t paste our entire codeset here, too long), the code is linear algebra heavy, and has used Intel MKL libraries via gsl_cblas_* calls, where GSL is also compiled with icc+MKL. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Here’s the build scenario:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;My avx-2 code build is built on Intel Skylake (E3-1240 v5) hardware, with the following set of compiler flags: &lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;CFLAGS=“-O3 -xcore-avx2 -I/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/include”
LDFLAGS=“-L/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64 -lmkl_rt -lpthread -lm -ldl"&lt;/PRE&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;My avx-512 build is built on Xeon Gold 5115 hardware, with the following set of compiler flags:&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;CFLAGS="-O3 -xCORE-AVX512 -I/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/include”
LDFLAGS="-L/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64 -lmkl_rt -lpthread -lm -ldl"&lt;/PRE&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Okay, so here are my scenarios. I’m using perf to see which system images are being used (maybe this isn’t the best way, but I’m open to other suggestions). A couple years back, I was able to directly count instruction cycles, but I think that functionality was depreciated after sandy bridge. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;1. Running the avx-2 code on Skylake hardware: the primary overhead in perf was libmkl_avx2.so. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;2. Running the avx-512 code on Skylake hardware: Illegal Instruction. This was expected, since the code was specifically evoking instructions that the CPU didn’t support. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Now, when I ran the avx-2 and the avx-512 code on the Xeon Gold 5115 machine, I see almost the exact same runtime (~.05% differences). Further, perf is reporting that the primary overhead in libmkl_512.so in both cases. When I performed a similar study going from sandy bridge—&amp;gt; haswell, I saw overall 20% speedup, so I would expect to see *some* sort of differences. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Right now, it seems like either I’m not properly compiling to use avx-512 instructions, or the compiler is forcing both codes (-xcore-avx2 and -xcore-avx512) to use avx512.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Here’s my question(s):&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;1. Am I going about this in an inefficient way? What would be a more efficient way to directly confirm which set of instructions are being evoked?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;2. Is there something that I’m missing about compiling and forcing avx-2 vs avx-512 instructions? &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p2"&gt;&lt;SPAN style="-webkit-text-stroke-width: initial;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Alex&lt;/SPAN&gt;&lt;/P&gt;
&lt;STYLE type="text/css"&gt;p.p1 {margin: 0.0px 0.0px 19.5px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #53575e; -webkit-text-stroke: #53575e}
p.p2 {margin: 0.0px 0.0px 19.5px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #53575e; -webkit-text-stroke: #53575e; min-height: 15.0px}
span.s1 {font-kerning: none}
&lt;/STYLE&gt;</description>
    <pubDate>Fri, 13 Apr 2018 18:15:27 GMT</pubDate>
    <dc:creator>Alexander_P_</dc:creator>
    <dc:date>2018-04-13T18:15:27Z</dc:date>
    <item>
      <title>How to force AVX-2 vs AVX-512</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-force-AVX-2-vs-AVX-512/m-p/1169427#M28443</link>
      <description>&lt;P&gt;&lt;SPAN style="font-family: Arial; font-size: 13px; -webkit-text-stroke: rgb(83, 87, 94);"&gt;Hello,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;I'm running benchmarks of my code on test hardware (Intel Xeon Gold 5115), and i’m trying to isolate the impact of avx-512 vs avx-2 instructions on overall runtime. My issue is, I don’t know whether or not I’m forcing my code (compiled with icc 2018.1.163 + MKL) to use either instruction set. For reference (I can’t paste our entire codeset here, too long), the code is linear algebra heavy, and has used Intel MKL libraries via gsl_cblas_* calls, where GSL is also compiled with icc+MKL. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Here’s the build scenario:&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;My avx-2 code build is built on Intel Skylake (E3-1240 v5) hardware, with the following set of compiler flags: &lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;CFLAGS=“-O3 -xcore-avx2 -I/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/include”
LDFLAGS=“-L/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64 -lmkl_rt -lpthread -lm -ldl"&lt;/PRE&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;My avx-512 build is built on Xeon Gold 5115 hardware, with the following set of compiler flags:&lt;/SPAN&gt;&lt;/P&gt;

&lt;PRE class="brush:plain;"&gt;CFLAGS="-O3 -xCORE-AVX512 -I/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/include”
LDFLAGS="-L/ldcg/intel/2018u1/compilers_and_libraries_2018.1.163/linux/mkl/lib/intel64 -lmkl_rt -lpthread -lm -ldl"&lt;/PRE&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Okay, so here are my scenarios. I’m using perf to see which system images are being used (maybe this isn’t the best way, but I’m open to other suggestions). A couple years back, I was able to directly count instruction cycles, but I think that functionality was depreciated after sandy bridge. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;1. Running the avx-2 code on Skylake hardware: the primary overhead in perf was libmkl_avx2.so. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;2. Running the avx-512 code on Skylake hardware: Illegal Instruction. This was expected, since the code was specifically evoking instructions that the CPU didn’t support. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Now, when I ran the avx-2 and the avx-512 code on the Xeon Gold 5115 machine, I see almost the exact same runtime (~.05% differences). Further, perf is reporting that the primary overhead in libmkl_512.so in both cases. When I performed a similar study going from sandy bridge—&amp;gt; haswell, I saw overall 20% speedup, so I would expect to see *some* sort of differences. &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Right now, it seems like either I’m not properly compiling to use avx-512 instructions, or the compiler is forcing both codes (-xcore-avx2 and -xcore-avx512) to use avx512.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Here’s my question(s):&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;1. Am I going about this in an inefficient way? What would be a more efficient way to directly confirm which set of instructions are being evoked?&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;2. Is there something that I’m missing about compiling and forcing avx-2 vs avx-512 instructions? &lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p2"&gt;&lt;SPAN style="-webkit-text-stroke-width: initial;"&gt;Thanks,&lt;/SPAN&gt;&lt;/P&gt;

&lt;P class="p1"&gt;&lt;SPAN class="s1"&gt;Alex&lt;/SPAN&gt;&lt;/P&gt;
&lt;STYLE type="text/css"&gt;p.p1 {margin: 0.0px 0.0px 19.5px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #53575e; -webkit-text-stroke: #53575e}
p.p2 {margin: 0.0px 0.0px 19.5px 0.0px; line-height: 15.0px; font: 13.0px Arial; color: #53575e; -webkit-text-stroke: #53575e; min-height: 15.0px}
span.s1 {font-kerning: none}
&lt;/STYLE&gt;</description>
      <pubDate>Fri, 13 Apr 2018 18:15:27 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-force-AVX-2-vs-AVX-512/m-p/1169427#M28443</guid>
      <dc:creator>Alexander_P_</dc:creator>
      <dc:date>2018-04-13T18:15:27Z</dc:date>
    </item>
    <item>
      <title>You may try MKL_ENABLE</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-force-AVX-2-vs-AVX-512/m-p/1169428#M28444</link>
      <description>&lt;P&gt;You may try the&amp;nbsp;&lt;SPAN class="fontstyle0"&gt;MKL_ENABLE_INSTRUCTIONS environment variable; you don’t need to worry about specific compiler options. Please refer to the User Guide to see how to use it.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 14 Apr 2018 06:37:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/How-to-force-AVX-2-vs-AVX-512/m-p/1169428#M28444</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2018-04-14T06:37:58Z</dc:date>
    </item>
  </channel>
</rss>

