<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: MKL numerical stability and threading in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877852#M9166</link>
    <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/452406"&gt;dbacchus&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Dima, could you please provide a link or reference to the abovementioned OpenMP specification? It makes a lot of sense, of course:e.g.,if onecalculates a product of many variables (in a parallel loop), the result will depend on the order of multiplication due to the truncation errors.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
OpenMP is expected to produce numerical variations when reduction operators are in use.&lt;BR /&gt;The quotation appears in the OpenMP 3.0 standard, &lt;A href="http://www.openmp.org/mp-documents/spec30.pdf"&gt;http://www.openmp.org/mp-documents/spec30.pdf&lt;/A&gt;, p. 98.&lt;BR /&gt;</description>
    <pubDate>Tue, 24 Nov 2009 17:19:15 GMT</pubDate>
    <dc:creator>TimP</dc:creator>
    <dc:date>2009-11-24T17:19:15Z</dc:date>
    <item>
      <title>MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877849#M9163</link>
      <description>Hello&lt;BR /&gt;&lt;BR /&gt;I am wondering whether (and how) the number of threads affects the numerical results computed by MKL.&lt;BR /&gt;&lt;BR /&gt;Section 8.1 of the MKL (version 10.1) user's guide states that&lt;BR /&gt;&lt;BR /&gt; "With a given Intel MKL version, the outputs will be bit-for-bit identical provided all the following conditions are met:&lt;BR /&gt;- the outputs are obtained on the same platform;&lt;BR /&gt;- the inputs are bit-for-bit identical;&lt;BR /&gt;- the input arrays are aligned identically at 16-byte boundaries."&lt;BR /&gt;&lt;BR /&gt;Does this really mean that I can link the threaded MKL libraries, vary the value of OMP_NUM_THREADS, and still expect bit-for-bit identical results, given that the above conditions are met?&lt;BR /&gt;&lt;BR /&gt;Thanks for your comments!</description>
      <pubDate>Tue, 24 Nov 2009 13:31:18 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877849#M9163</guid>
      <dc:creator>millidred</dc:creator>
      <dc:date>2009-11-24T13:31:18Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877850#M9164</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Yet another condition must be met: MKL must be run in sequential mode. In general, the exact result of a threaded algorithm may depend not only on the number of threads but also on the order in which the threads execute. For example, the OpenMP specification states: "...comparing one parallel run to another (even if the number of threads used is the same), there is no guarantee that bit-identical results will be obtained...".&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima&lt;BR /&gt;</description>
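      <!--
A hypothetical sketch (not MKL or OpenMP runtime code) of how the thread count alone can change a floating-point sum. The function `chunked_sum` mimics a statically scheduled `reduction(+:...)`: it splits the index range into `nthreads` contiguous chunks, sums each chunk left to right, and then adds the per-chunk partial sums. It runs sequentially, so each thread count gives one deterministic answer. A real runtime may also combine the partial sums in a varying order, which is why even two runs with the same thread count carry no guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Sum x[0..n-1] the way a static reduction over nthreads chunks would:
   chunk t covers [t*n/nthreads, (t+1)*n/nthreads), each chunk is summed
   in index order, and the partial sums are then added in chunk order. */
double chunked_sum(const double *x, size_t n, size_t nthreads)
{
    assert(nthreads > 0);
    assert(n == 0 || x != NULL);

    double total = 0.0;
    for (size_t t = 0; t < nthreads; ++t) {
        size_t begin = t * n / nthreads;
        size_t end = (t + 1) * n / nthreads;
        double partial = 0.0;
        for (size_t i = begin; i < end; ++i)
            partial += x[i];
        total += partial;
    }
    return total;
}
```

For the input `{1e16, 1.0, -1e16, 1.0}` (exact sum 2), `chunked_sum(x, 4, 1)` returns 1.0 while `chunked_sum(x, 4, 2)` returns 0.0. Near 1e16, adjacent doubles are 2 apart, so `1e16 + 1.0` and `-1e16 + 1.0` are ties that round back to the even neighbors `1e16` and `-1e16`. Which additions hit such a tie depends on how the range was split.
      -->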
      <pubDate>Tue, 24 Nov 2009 15:52:46 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877850#M9164</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2009-11-24T15:52:46Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877851#M9165</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93647"&gt;Dmitry Baksheev (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;&lt;BR /&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Yet another condition should be met: MKL shall be run in sequential mode. Generally, precise result of threaded algorithmsmay depend not only on the number of threads but also onthe order in whichthe threads are executed. For example, specification of OpenMP states: "...comparing one parallel run to another (even if the number of threads used is the same), there is no guarantee that bit-identical results will be obtained...".&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Dima, could you please provide a link or reference to the above-mentioned OpenMP specification? It makes a lot of sense, of course: e.g., if one calculates a product of many variables (in a parallel loop), the result will depend on the order of multiplication because of rounding errors.</description>
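      <!--
A minimal sketch of the point about the order of multiplication: the same factors multiplied in index order and in reverse order. The function names are hypothetical. It assumes IEEE 754 double arithmetic with round-to-nearest and no value-changing optimizations such as `-ffast-math`.

```c
#include <assert.h>
#include <stddef.h>

/* Multiply f[0..n-1] in index order: ((f[0] * f[1]) * f[2]) ... */
double product_forward(const double *f, size_t n)
{
    assert(n == 0 || f != NULL);
    double p = 1.0;
    for (size_t i = 0; i < n; ++i)
        p *= f[i];
    return p;
}

/* Multiply the same factors in reverse index order. */
double product_backward(const double *f, size_t n)
{
    assert(n == 0 || f != NULL);
    double p = 1.0;
    for (size_t i = 0; i < n; ++i)
        p *= f[n - 1 - i];
    return p;
}
```

With `{0.1, 0.2, 0.3}`, the forward product is (0.1 * 0.2) * 0.3 and the backward one is effectively 0.1 * (0.2 * 0.3). The two results differ by one unit in the last place (about 0.006000000000000001 versus 0.006), although mathematically they are equal.
      -->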
      <pubDate>Tue, 24 Nov 2009 17:15:03 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877851#M9165</guid>
      <dc:creator>dbacchus</dc:creator>
      <dc:date>2009-11-24T17:15:03Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877852#M9166</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/452406"&gt;dbacchus&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Dima, could you please provide a link or reference to the abovementioned OpenMP specification? It makes a lot of sense, of course:e.g.,if onecalculates a product of many variables (in a parallel loop), the result will depend on the order of multiplication due to the truncation errors.&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
OpenMP is expected to produce numerical variations when reduction operators are in use.&lt;BR /&gt;The quotation appears in the OpenMP 3.0 standard, &lt;A href="http://www.openmp.org/mp-documents/spec30.pdf"&gt;http://www.openmp.org/mp-documents/spec30.pdf&lt;/A&gt;, p. 98.&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Nov 2009 17:19:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877852#M9166</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-24T17:19:15Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877853#M9167</link>
      <description>&lt;DIV&gt;&lt;/DIV&gt;
&lt;P&gt;Thanks, tim18!&lt;/P&gt;</description>
      <pubDate>Tue, 24 Nov 2009 17:39:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877853#M9167</guid>
      <dc:creator>dbacchus</dc:creator>
      <dc:date>2009-11-24T17:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877854#M9168</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
OpenMP is expected to produce numerical variations when reduction operators are in use.&lt;BR /&gt;The quotation appears in the OpenMP 3.0 standard, &lt;A href="http://www.openmp.org/mp-documents/spec30.pdf"&gt;http://www.openmp.org/mp-documents/spec30.pdf&lt;/A&gt;, p. 98.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
By the way, and off the current topic, much more severe variations are observed in certain implementations of MPI reduction operators. The implementers of Intel MPI (and apparently Open MPI) have achieved satisfactory results, so it seems to be treated as a Quality of Implementation issue rather than a standards question.&lt;BR /&gt;Certain hybrid OpenMP/MPI applications have options to bypass OpenMP reduction so that the number of threads (but not the number of MPI processes) can be changed without producing numerical variations. Typically, avoiding OpenMP or MPI reductions carries a significant performance penalty.&lt;BR /&gt;</description>
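      <!--
A sketch of one way an application might bypass the OpenMP `reduction` clause so that changing the number of threads does not change the result. The data is split into a fixed number of blocks that does not depend on the thread count. Threads sum whole blocks, and one thread then adds the block sums in a fixed order. This is not LS-DYNA's (or any particular application's) method. `NBLOCKS` and `reproducible_sum` are hypothetical names. The extra serial pass and the fixed blocking illustrate the kind of performance cost mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block count: fixed, independent of OMP_NUM_THREADS. */
#define NBLOCKS 64

/* Sum x[0..n-1] so that the result is the same for any number of threads:
   each block is summed in index order by whichever thread owns it, and the
   block sums are added in block order after the parallel loop. */
double reproducible_sum(const double *x, size_t n)
{
    assert(n == 0 || x != NULL);
    double partial[NBLOCKS];

#pragma omp parallel for schedule(static)
    for (int b = 0; b < NBLOCKS; ++b) {
        size_t begin = (size_t)b * n / NBLOCKS;
        size_t end = ((size_t)b + 1) * n / NBLOCKS;
        double s = 0.0;
        for (size_t i = begin; i < end; ++i)
            s += x[i];
        partial[b] = s;
    }

    /* Serial combine in a fixed order: no OpenMP reduction involved. */
    double total = 0.0;
    for (int b = 0; b < NBLOCKS; ++b)
        total += partial[b];
    return total;
}
```

Whether it is compiled with OpenMP (e.g. `-fopenmp` with GCC) or without, and for any OMP_NUM_THREADS, this function performs the same additions in the same order. Its result is therefore repeatable on a given build, although it is not guaranteed to equal a plain left-to-right sum.
      -->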
      <pubDate>Tue, 24 Nov 2009 18:22:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877854#M9168</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-24T18:22:49Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877855#M9169</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;By the way, and off the current topic, much more severe variations are observed in certain implementations of MPI reduction operators. The implementors of Intel MPI (and apparently openmpi) have achieved satisfactory results, so it seems to be treated as a Quality of Implementation issue rather than a standards question. &lt;BR /&gt;Certain hybrid OpenMP/MPI applications have options to bypass OpenMP reduction so as to permit changing the number of threads (but not number of MPI processes), without producing numerical variations. Typically, there is a significant performance penalty involved in avoiding OpenMP or MPI reductions.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Is it possible to give a concrete example for OpenMP?&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Nov 2009 19:48:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877855#M9169</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2009-11-24T19:48:41Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877856#M9170</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/312233"&gt;yuriisig&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Is it possible to give a concrete example for OpenMP?&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
Do you mean an example of a commercial application which offers the user a choice of alternate code paths with or without OpenMP reduction? LS-DYNA/SMP and LS-DYNA/hybrid offer such options.&lt;BR /&gt;</description>
      <pubDate>Tue, 24 Nov 2009 20:04:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877856#M9170</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-24T20:04:47Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877857#M9171</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/367365"&gt;tim18&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;LS-DYNA/SMP and LS-DYNA/hybrid offer such options.&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Is there something simpler than LS-DYNA?</description>
      <pubDate>Tue, 24 Nov 2009 21:19:50 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877857#M9171</guid>
      <dc:creator>yuriisig</dc:creator>
      <dc:date>2009-11-24T21:19:50Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877858#M9172</link>
      <description>&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
Do you mean how the source code gives a choice between OpenMP reduction and no reduction?&lt;BR /&gt;// read user option, set/reset omp_reduction_ok&lt;BR /&gt;#pragma omp parallel for reduction(+:sumall) if(omp_reduction_ok)&lt;BR /&gt;...&lt;BR /&gt;&lt;BR /&gt;If there is no control over data alignment, compiler options that prevent vectorized sum reduction might also be set, e.g. /fp:source for the Intel Windows compilers, or omitting -ffast-math for GNU compilers. There is no need for alignment-dependent code on recent CPUs such as Barcelona and Core i7, but compilers tend to generate it anyway to optimize for earlier CPUs.&lt;BR /&gt;</description>
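      <!--
A self-contained version of the fragment above, written as a sketch that assumes C with OpenMP 3.0. The function `sum_values` and its array arguments are hypothetical. `sumall` and `omp_reduction_ok` come from the fragment. The sketch uses the combined `parallel for` construct because the `if` clause belongs to the `parallel` construct, not to the worksharing `for`. When `omp_reduction_ok` is false, the region runs on a team of one thread, so the additions happen in index order and the result does not depend on OMP_NUM_THREADS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum x[0..n-1]. With omp_reduction_ok == false the parallel region is
   executed by a single thread, giving a plain sequential sum. Compiled
   without OpenMP support, the pragma is ignored and the loop is always
   sequential. */
double sum_values(const double *x, int n, bool omp_reduction_ok)
{
    assert(n == 0 || x != NULL);
    double sumall = 0.0;
#pragma omp parallel for reduction(+:sumall) if(omp_reduction_ok)
    for (int i = 0; i < n; ++i)
        sumall += x[i];
    return sumall;
}
```

The user option would be read once (for example from an input file) and passed in as `omp_reduction_ok`. The serial path gives up the parallel speedup, which is the performance penalty mentioned earlier in the thread.
      -->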
      <pubDate>Tue, 24 Nov 2009 22:01:21 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877858#M9172</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-24T22:01:21Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877859#M9173</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93647"&gt;Dmitry Baksheev (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt; &lt;BR /&gt;Hello,&lt;BR /&gt;&lt;BR /&gt;Yet another condition should be met: MKL shall be run in sequential mode. Generally, precise result of threaded algorithmsmay depend not only on the number of threads but also onthe order in whichthe threads are executed. For example, specification of OpenMP states: "...comparing one parallel run to another (even if the number of threads used is the same), there is no guarantee that bit-identical results will be obtained...".&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi Dima&lt;BR /&gt;&lt;BR /&gt;It's strange... Using MKL 10.1 on intel64 I seem to get bit-identical results for 1 to 4 threads. I have tested e.g. the dnrm2 and dgemm BLAS functions.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Roman&lt;BR /&gt;</description>
      <pubDate>Wed, 25 Nov 2009 09:52:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877859#M9173</guid>
      <dc:creator>millidred</dc:creator>
      <dc:date>2009-11-25T09:52:56Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877860#M9174</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/453720"&gt;millidred&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;It's strange... Using MKL 10.1 on intel64 I seem to get bit-identical results for 1 to 4 threads. I have tested e.g. the dnrm2 and dgemm BLAS functions.&lt;BR /&gt;&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
An efficient dgemm implementation may not require reduction operators. The reduction-like operation in dnrm2 may not show roundoff variations with the order of operations, even if it is OpenMP-threaded. In any case, it may be difficult to expose and test every opportunity for the variations that the OpenMP standard warns about; the standard tells you only that there is no guarantee.&lt;BR /&gt;</description>
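      <!--
To illustrate how a matrix multiply can avoid reductions altogether, here is a deliberately naive sketch (not how MKL's dgemm is implemented). Threads split the rows of C, and one thread accumulates each C[i][j] over k in a fixed order, so the number of threads does not change any rounding. Optimized kernels may block and vectorize the k loop. If that blocking, or the split of work among threads, depends on alignment or thread count, the summation order, and hence the last bits, can still change. The function name `gemm_rows` is hypothetical.

```c
#include <assert.h>

/* C = A * B for row-major n x n matrices. Each C[i][j] is computed by
   exactly one thread, with k running in the same order regardless of the
   thread count, so no cross-thread reduction is needed. */
void gemm_rows(int n, const double *a, const double *b, double *c)
{
    assert(n >= 0);
#pragma omp parallel for
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k)
                acc += a[i * n + k] * b[k * n + j];
            c[i * n + j] = acc;
        }
}
```
      -->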
      <pubDate>Wed, 25 Nov 2009 14:26:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877860#M9174</guid>
      <dc:creator>TimP</dc:creator>
      <dc:date>2009-11-25T14:26:36Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877861#M9175</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="margin-top: 5px; width: 100%;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/453720"&gt;millidred&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Hi Dima&lt;BR /&gt;&lt;BR /&gt;It's strange... Using MKL 10.1 on intel64 I seem to get bit-identical results for 1 to 4 threads. I have tested e.g. the dnrm2 and dgemm BLAS functions.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Roman&lt;BR /&gt;&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi Roman,&lt;BR /&gt;&lt;BR /&gt;You've been lucky to get bit-for-bit reproducible results in your dgemm tests. Function dnrm2 is not parallel in that version of MKL, so that is no surprise. I attach a dgemm test that would fail if run long enough.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima</description>
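      <!--
The attached test is not part of this feed. As a hypothetical sketch of the check that such a reproducibility test needs, `bitwise_identical` compares two result arrays bit for bit with `memcmp`. This is stricter than comparing with `==`, because `0.0 == -0.0` is true although the sign bits differ, and a NaN never compares equal to itself. A harness might run the same dgemm call repeatedly, or with different thread counts, and compare each output against the first one using this helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True only if x[0..n-1] and y[0..n-1] have identical bit patterns. */
bool bitwise_identical(const double *x, const double *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    return n == 0 || memcmp(x, y, n * sizeof(double)) == 0;
}
```
      -->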
      <pubDate>Wed, 25 Nov 2009 17:03:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877861#M9175</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2009-11-25T17:03:56Z</dc:date>
    </item>
    <item>
      <title>Re: MKL numerical stability and threading</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877862#M9176</link>
      <description>&lt;DIV style="margin:0px;"&gt;
&lt;DIV id="quote_reply" style="width: 100%; margin-top: 5px;"&gt;
&lt;DIV style="margin-left:2px;margin-right:2px;"&gt;Quoting - &lt;A href="https://community.intel.com/en-us/profile/93647"&gt;Dmitry Baksheev (Intel)&lt;/A&gt;&lt;/DIV&gt;
&lt;DIV style="background-color:#E5E5E5; padding:5px;border: 1px; border-style: inset;margin-left:2px;margin-right:2px;"&gt;&lt;EM&gt;
&lt;DIV style="margin:0px;"&gt;&lt;/DIV&gt;
&lt;BR /&gt;Hi Roman,&lt;BR /&gt;&lt;BR /&gt;You've been lucky to get bit-for-bit reproducible results in your dgemm tests. Function dnrm2 is not parallel in that version of MKL, so that is no surprise. I attach a dgemm test that would fail if run long enough.&lt;BR /&gt;&lt;BR /&gt;Thanks&lt;BR /&gt;Dima&lt;/EM&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;BR /&gt;Hi Dima&lt;BR /&gt;&lt;BR /&gt;Thanks for your test code. It clearly shows that the parallel dgemm routine does not produce bit-for-bit identical results. I must have been lucky indeed in my tests.&lt;BR /&gt;&lt;BR /&gt;Regards,&lt;BR /&gt;Roman&lt;BR /&gt;&lt;BR /&gt;</description>
      <pubDate>Thu, 26 Nov 2009 14:10:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/MKL-numerical-stability-and-threading/m-p/877862#M9176</guid>
      <dc:creator>millidred</dc:creator>
      <dc:date>2009-11-26T14:10:06Z</dc:date>
    </item>
  </channel>
</rss>

