<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Using FMA in MKL routines in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151867#M27202</link>
    <description>&lt;P&gt;Hey everyone,&lt;/P&gt;&lt;P&gt;I couldn't find any older topics that deal with this question in detail, so I am asking it again: is there a way to enable FMA arithmetic in the MKL routines? Here is a sample program that, when built with MSVC 2017 against the latest MKL version (details in the output below) and run on an AVX2 processor, DOES NOT use FMA:&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;void print_mkl_info() {
    MKLVersion Version;
    mkl_get_version(&amp;amp;Version);
    printf("Major version:           %d\n",Version.MajorVersion);
    printf("Minor version:           %d\n",Version.MinorVersion);
    printf("Update version:          %d\n",Version.UpdateVersion);
    printf("Product status:          %s\n",Version.ProductStatus);
    printf("Build:                   %s\n",Version.Build);
    printf("Platform:                %s\n",Version.Platform);
    printf("Processor optimization:  %s\n",Version.Processor);
    printf("================================================================\n");
    printf("\n");
}

float standard_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i &amp;lt; 4; i++) {
        c = c + (a&lt;I&gt; * b&lt;I&gt;);
    }
    return c;
}

float standard_fma_dot_product(float* a, float* b) {
    float c = 0.0f;
    for (int i = 0; i &amp;lt; 4; i++) {
        c = fmaf(a&lt;I&gt;, b&lt;I&gt;, c);
    }
    return c;
}

float mkl_dot_product(float* a, float* b) {
    return cblas_sdot(4, a, 1, b, 1);
}

int main() {
    print_mkl_info();
    float a[4] = { 1.907607, -.7862027, 1.148311, .9604002 };
    float b[4] = { -.9355000, -.6915108, 1.724470, -.7097529 };
    printf("Standard dot product is:     %.23f\n", standard_dot_product(a, b));
    printf("Standard FMA dot product is: %.23f\n", standard_fma_dot_product(a, b));
    printf("MKL dot product is:          %.23f\n", mkl_dot_product(a, b));
    return 0;
}&lt;/I&gt;&lt;/I&gt;&lt;/I&gt;&lt;/I&gt;&lt;/PRE&gt;

&lt;P&gt;The program above prints the following (compiled with /fp:fast and /O2; changing /O2 to /O1 changes the result of standard_dot_product, but not that of the CBLAS routine):&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:plain; class-name:dark;"&gt;Major version:           2019
Minor version:           0
Update version:          2
Product status:          Product
Build:                   20190118
Platform:                32-bit
Processor optimization:  Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors
================================================================

Standard dot product is:     0.05768233537673950195313
Standard FMA dot product is: 0.05768235772848129272461
MKL dot product is:          0.05768233537673950195313&lt;/PRE&gt;
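&lt;P&gt;Part of the gap between the first two results is expected from rounding alone. a*b + c normally rounds the product to float and then rounds the sum. fmaf instead evaluates a*b + c exactly and rounds once. The minimal C sketch below shows this on hand-picked inputs. The function names mul_then_add and fused_mul_add are hypothetical, not part of MKL. The volatile store is an assumption used to keep the compiler from fusing the two-step version on its own.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;math.h&amp;gt;  /* fmaf */

/* Two roundings: the product is rounded to float (the volatile store keeps
   the compiler from contracting it into an FMA), then the sum is rounded. */
float mul_then_add(float a, float b, float c) {
    volatile float p = a * b;
    return p + c;
}

/* One rounding: fmaf evaluates a*b + c exactly, then rounds the result once. */
float fused_mul_add(float a, float b, float c) {
    return fmaf(a, b, c);
}&lt;/PRE&gt;
&lt;P&gt;Consider a = 1 + 2^-12 (1.000244140625f) and c = -(1 + 2^-11) (-1.00048828125f). The exact product a*a = 1 + 2^-11 + 2^-24 lies exactly halfway between two floats and rounds (ties to even) to 1 + 2^-11. As a result, mul_then_add(a, a, c) returns 0.0f, while fused_mul_add(a, a, c) returns 2^-24 (about 5.96e-8). In a dot product the same effect appears as last-bit differences like the ones above.&lt;/P&gt;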

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So is there any way to get FMA-based results in such cases? Or am I being a knobhead and missing something?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;THANKS!&lt;/P&gt;
&lt;P&gt;Swat&lt;/P&gt;</description>
    <pubDate>Wed, 13 Feb 2019 15:03:07 GMT</pubDate>
    <dc:creator>Kat__Swat</dc:creator>
    <dc:date>2019-02-13T15:03:07Z</dc:date>
    <item>
      <title>Using FMA in MKL routines</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151867#M27202</link>
      <pubDate>Wed, 13 Feb 2019 15:03:07 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151867#M27202</guid>
      <dc:creator>Kat__Swat</dc:creator>
      <dc:date>2019-02-13T15:03:07Z</dc:date>
    </item>
    <item>
      <title>Your CPU already supports FMA</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151868#M27203</link>
      <description>&lt;P&gt;Your CPU already supports FMA instructions, since the AVX2 code branch was selected.&lt;/P&gt;&lt;P&gt;You may also experiment with mkl_enable_instructions(int) to make MKL dispatch to a different instruction set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 14 Feb 2019 03:39:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151868#M27203</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2019-02-14T03:39:03Z</dc:date>
    </item>
    <item>
      <title>Quote:Gennady F. (Intel)</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151869#M27204</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Gennady F. (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Your CPU already supports FMA instructions, since the AVX2 code branch was selected.&lt;/P&gt;&lt;P&gt;You may also experiment with mkl_enable_instructions(int) to make MKL dispatch to a different instruction set.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You were right: my processor does support FMA, but that branch seems to be used only in 64-bit builds. There is still a slight difference between the results, though (printed as integer values for easy comparison; compiled with MSVC 19.11.25507.1 for x64, with /arch:AVX2 and /O2):&lt;/P&gt;
&lt;PRE class="brush:plain; class-name:dark;"&gt;Standard dot product is:          1030505552
Standard FMA dot product is:      1030505558
MKL dot product is:               1030505568&lt;/PRE&gt;
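&lt;P&gt;The integers above appear to be the raw IEEE-754 bit patterns of the float results. For example, 1030505552 is exactly the bit pattern of 0.05768233537673950195313, the standard result from the first post. Consecutive positive floats have consecutive bit patterns, so subtracting the integers counts the units in the last place (ulps) between results. A small C sketch of that reinterpretation follows; the helper names are hypothetical.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;stdint.h&amp;gt;  /* uint32_t */
#include &amp;lt;string.h&amp;gt;  /* memcpy */

/* Reinterpret a float's storage as a 32-bit unsigned integer.
   memcpy avoids the strict-aliasing problems of pointer casts. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&amp;amp;u, &amp;amp;x, sizeof u);
    return u;
}

/* Inverse of float_bits: rebuild a float from its bit pattern. */
float float_from_bits(uint32_t u) {
    float x;
    memcpy(&amp;amp;x, &amp;amp;u, sizeof x);
    return x;
}

/* Ulp distance between two finite floats of the same sign: their bit
   patterns are ordered like their values, so the difference counts
   the representable steps between them. */
uint32_t ulp_distance_same_sign(float x, float y) {
    uint32_t a = float_bits(x);
    uint32_t b = float_bits(y);
    return a &amp;gt; b ? a - b : b - a;
}&lt;/PRE&gt;
&lt;P&gt;By this measure, the FMA result is 6 ulps above the standard one and the MKL result is 16 ulps above it. Between 2^-5 and 2^-4, one float ulp is 2^-28 (about 3.7e-9). The largest spread is therefore roughly 6e-8 in absolute terms, or about 1e-6 relative to the result.&lt;/P&gt;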

&lt;P&gt;&amp;nbsp;&lt;/P&gt;
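&lt;P&gt;A common source of such last-bit differences is summation order. A vectorized kernel typically keeps several partial sums and combines them at the end, which rounds differently from a strictly left-to-right loop. Whether MKL's cblas_sdot does this for n = 4 is an assumption here; the output above does not prove it. The C sketch below contrasts the sequential fmaf loop with a hypothetical two-accumulator order in which even and odd indices are summed separately. The names dot4_sequential and dot4_two_lanes are illustrative.&lt;/P&gt;
&lt;PRE class="brush:cpp; class-name:dark;"&gt;#include &amp;lt;math.h&amp;gt;  /* fmaf */

/* Left-to-right order, as in standard_fma_dot_product. */
float dot4_sequential(const float *a, const float *b) {
    float c = 0.0f;
    for (int i = 0; i &amp;lt; 4; i++)
        c = fmaf(a[i], b[i], c);
    return c;
}

/* Hypothetical SIMD-style order: even and odd indices go into separate
   accumulators that are added at the end. Illustrative only; this is
   not a description of MKL's internal algorithm. */
float dot4_two_lanes(const float *a, const float *b) {
    float even = fmaf(a[2], b[2], a[0] * b[0]);
    float odd  = fmaf(a[3], b[3], a[1] * b[1]);
    return even + odd;
}&lt;/PRE&gt;
&lt;P&gt;For a = {1, 1, 1, 1} and b = {2^24, 1, -2^24, 1}, the exact dot product is 2. The sequential loop returns 1: 2^24 + 1 is not representable in float, so it rounds back to 2^24 before the -2^24 term cancels it. The two-accumulator order returns 2. Both are legitimate single-precision evaluations, so last-bit differences like the ones above need not indicate a bug in either MKL or the hand-written loops.&lt;/P&gt;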
&lt;P&gt;Would you happen to know why this difference occurs?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Mar 2019 16:31:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Using-FMA-in-MKL-routines/m-p/1151869#M27204</guid>
      <dc:creator>Kat__Swat</dc:creator>
      <dc:date>2019-03-13T16:31:00Z</dc:date>
    </item>
  </channel>
</rss>

