<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic In addition, the MKL in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156546#M27628</link>
    <description>&lt;P&gt;In addition, the MKL extension XAXPBY exhibits some similar behavior.&lt;/P&gt;</description>
    <pubDate>Wed, 20 Feb 2019 20:33:21 GMT</pubDate>
    <dc:creator>John_Young</dc:creator>
    <dc:date>2019-02-20T20:33:21Z</dc:date>
    <item>
      <title>Slight Discrepancies in AXPY results vs explicit loops</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156545#M27627</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;In some of our codes, we have noticed slight discrepancies between the results of MKL AXPY calls and our own explicit loops that calculate Y=Y+ALPHA*X. When we perform iterative matrix solutions, the final solutions using AXPY can be significantly worse than the solutions using explicit loops.&lt;/P&gt;&lt;P&gt;I've attached a test case for 64-bit MS Windows using Intel MKL 2019 Update 2. For some vectors and scale factors, the results are identical down to machine precision. For other vector and scale factor combinations, the two results barely agree to within the specified floating-point precision. In the attached screenshot, the left column of numbers shows the RMS error between the AXPY result and the explicit loop result for two cases. In the first case, the two results differ, and in the second case no difference is discernible. Since AXPY loops have no dependencies between vector entries, it does not seem like it could be a threading issue. While we are aware that floating-point operations are always tricky, we don't understand why independent operations of the form y(i) = y(i) + alpha*x(i) return such different results.&lt;/P&gt;&lt;P&gt;We've tried the various suggestions from the documentation about getting repeatable floating-point results, to no avail. Any advice would be much appreciated. Is this just to be expected, or is there some other MKL issue going on?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;John&lt;/P&gt;</description>
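      <!--
A minimal sketch of the kind of RMS comparison the post describes, assuming the two result vectors are available as plain Python lists of doubles. The attached test case is not reproduced here; the function name rms_difference and the sample vectors are hypothetical.

```python
import math
import sys


def rms_difference(a, b):
    """Root-mean-square of the element-wise difference of two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a))


# Hypothetical data: the second entry differs by one unit in the last place.
eps = sys.float_info.epsilon            # 2**-52 for IEEE 754 doubles
reference = [1.0, 2.0, 3.0]
candidate = [1.0, 2.0 + 2.0 * eps, 3.0]

print(rms_difference(reference, reference))   # identical vectors give 0.0
print(rms_difference(reference, candidate))   # tiny but nonzero
```

An RMS value of exactly zero means the two code paths agree bit for bit on these inputs; a value near machine epsilon times the vector magnitude points to rounding-level differences rather than a logic error.
      -->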
      <pubDate>Wed, 20 Feb 2019 20:32:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156545#M27627</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2019-02-20T20:32:37Z</dc:date>
    </item>
    <item>
      <title>In addition, the MKL</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156546#M27628</link>
      <description>&lt;P&gt;In addition, the MKL extension XAXPBY exhibits some similar behavior.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 20:33:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156546#M27628</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2019-02-20T20:33:21Z</dc:date>
    </item>
    <item>
      <title>Perhaps the compiler is</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156547#M27629</link>
      <description>&lt;P&gt;Perhaps the compiler is generating a separate multiply followed by an add, each rounded, whereas Intel MKL may use a fused multiply-add instruction with a single rounding. Intel AVX2 and later architectures support a fused multiply-add instruction, and the compiler options used can affect whether or not it is generated.&lt;/P&gt;</description>
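      <!--
A minimal sketch of the single-rounding effect described in this reply, assuming IEEE 754 double precision. It does not call MKL or any FMA instruction: the fused path is emulated with exact rational arithmetic followed by one final rounding, and the names axpy_separate and axpy_fused are hypothetical.

```python
from fractions import Fraction


def axpy_separate(alpha, x, y):
    """y + alpha*x with two roundings per element (multiply, then add)."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def axpy_fused(alpha, x, y):
    """y + alpha*x with one rounding per element, emulating a fused multiply-add.

    Fraction(float) is exact, so the sum is computed without error and
    float() performs the only rounding step.
    """
    a = Fraction(alpha)
    return [float(a * Fraction(xi) + Fraction(yi)) for xi, yi in zip(x, y)]


alpha = 1.0 + 2.0 ** -30
x = [1.0 + 2.0 ** -30, 2.0]
y = [-(1.0 + 2.0 ** -29), 1.0]

separate = axpy_separate(alpha, x, y)
fused = axpy_fused(alpha, x, y)

# Entry 0: the exact product is 1 + 2**-29 + 2**-60. Rounding the product
# first drops the 2**-60 term, so the separate path cancels to 0.0 while
# the fused path keeps 2**-60.
print(separate[0], fused[0])
# Entry 1: every intermediate value is exactly representable, so both
# paths agree.
print(separate[1] == fused[1])
```

Both answers are valid floating-point results. The difference appears only where the product needs more bits than a double holds and the addition then cancels most of it, which is consistent with some vector and scale factor combinations matching exactly while others do not.
      -->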
      <pubDate>Wed, 20 Feb 2019 20:54:37 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156547#M27629</guid>
      <dc:creator>Shane_S_Intel</dc:creator>
      <dc:date>2019-02-20T20:54:37Z</dc:date>
    </item>
    <item>
      <title>Is there a way for us to test</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156548#M27630</link>
      <description>&lt;P&gt;Is there a way for us to test this, i.e., which compiler options would toggle it?&lt;/P&gt;&lt;P&gt;If this is the case, our iterative solutions have lower numerical error with the explicit loops than with the MKL AXPY call.&lt;/P&gt;</description>
      <pubDate>Wed, 20 Feb 2019 20:57:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156548#M27630</guid>
      <dc:creator>John_Young</dc:creator>
      <dc:date>2019-02-20T20:57:25Z</dc:date>
    </item>
    <item>
      <title>John,</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156549#M27631</link>
      <description>&lt;P&gt;John,&lt;/P&gt;&lt;P&gt;/fp:strict should work. Or, if you only want to disable FMA, /Qfma- should do it.&lt;/P&gt;&lt;P&gt;From the icc fma page (https://software.intel.com/en-us/cpp-compiler-developer-guide-and-reference-fma-qfma):&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;P&gt;-fma&lt;BR /&gt;or /Qfma&lt;/P&gt;&lt;P&gt;If the instructions exist on the target processor, the compiler generates fused multiply-add (FMA) instructions.&lt;/P&gt;&lt;P&gt;However, if you specify -fp-model strict (Linux* OS and OS X*) or /fp:strict (Windows* OS), but do not explicitly specify -fma or /Qfma, the default is -no-fma or /Qfma-.&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Let me know if that helps.&lt;/P&gt;&lt;P&gt;Pamela&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2019 20:41:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156549#M27631</guid>
      <dc:creator>Pamela_H_Intel</dc:creator>
      <dc:date>2019-03-01T20:41:50Z</dc:date>
    </item>
    <item>
      <title>Oh - FORTRAN - I think the</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156550#M27632</link>
      <description>&lt;P&gt;Oh - FORTRAN - I think the page is identical, but here's the fma page from the Intel* FORTRAN Compiler 19:&lt;/P&gt;&lt;P&gt;&lt;A href="https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-fma-qfma" target="_blank"&gt;https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-fma-qfma&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2019 21:07:35 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Slight-Discrepancies-in-AXPY-results-vs-explicit-loops/m-p/1156550#M27632</guid>
      <dc:creator>Pamela_H_Intel</dc:creator>
      <dc:date>2019-03-01T21:07:35Z</dc:date>
    </item>
  </channel>
</rss>

