<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Repeatability of results from vslsConvExecX1D in Intel® oneAPI Math Kernel Library</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912995#M12301</link>
    <description>&lt;P&gt;Hi Gennady,&lt;/P&gt;
&lt;P&gt;Thank you for your response, and thank you to Dmitry for his response as well.&lt;/P&gt;
&lt;P&gt;Yes, I'm well aware of the pitfalls of directly comparing floating-point values. However, in this case, it is the correct thing to do, because the whole point of this program is to verify whether results are bit-for-bit identical when the alignment of the input array is changed.&lt;/P&gt;
&lt;P&gt;When you say the "code works fine", do you mean that the program did not print any strings like "Results at offset N differ from offset 0"? What compiler did you use? Which specific MKL libraries did you link with?&lt;/P&gt;
&lt;P&gt;In my case, I'm using MS Visual Studio 2008. I see the problem when I link with:&lt;/P&gt;
&lt;P&gt;mkl_intel_c_dll.lib, mkl_intel_thread_dll.lib, mkl_core_dll.lib, libiomp5md.lib&lt;/P&gt;
&lt;P&gt;I still see the problem when I replace mkl_intel_thread_dll.lib with mkl_sequential_dll.lib.&lt;/P&gt;
&lt;P&gt;I'm using MKL 10.2.2.025 on WinXP SP3 (32-bit). CPU == Intel Core2 Duo CPU T9400 @ 2.53GHz.&lt;/P&gt;
&lt;P&gt;At runtime, if mkl_vml_p4m2.dll is NOT available (so presumably mkl_vml_def.dll is used) then the problem goes away. From this I conclude that the issues with data alignment probably come from the use of SSE or something similar.&lt;/P&gt;
&lt;P&gt;Thanks for any insight you can provide!&lt;/P&gt;
&lt;P&gt;--&lt;/P&gt;
&lt;P&gt;Eric Backus&lt;/P&gt;
&lt;P&gt;eric_backus@agilent.com&lt;/P&gt;</description>
    <pubDate>Tue, 26 Jan 2010 21:52:33 GMT</pubDate>
    <dc:creator>Eric_Backus</dc:creator>
    <dc:date>2010-01-26T21:52:33Z</dc:date>
    <item>
      <title>Repeatability of results from vslsConvExecX1D</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912992#M12298</link>
      <description>Hi,&lt;BR /&gt;&lt;BR /&gt;We're using MKL 10.2.2 to speed up parts of a signal processing application that we have. One issue that we've run across is repeatability of results. We find that the results out of vslsConvExecX1D are not always the same, even when the input data is identical.&lt;BR /&gt;&lt;BR /&gt;Given identical input data, the output of vslsConvExecX1D is always nearly the same, so I'd guess that the differences are due to different accumulation of rounding/truncation error. I think at least part of the reason for these differences is differing memory alignment of the input data. But this doesn't seem to be sufficient to explain all differences: sometimes the results from different calls to vslsConvExecX1D are different even if the input vectors are identical and start at the exact same memory location.&lt;BR /&gt;&lt;BR /&gt;The fact that there are any differences at all is disconcerting. Is there anything we can do to work around this issue?&lt;BR /&gt;&lt;BR /&gt;One thing we tried is using vslsConvSetInternalPrecision(VSL_CONV_PRECISION_DOUBLE). This does make the results completely repeatable, but the execution speed is more than an order of magnitude slower, significantly slower than our original simple C code, which does convolution completely repeatably.&lt;BR /&gt;&lt;BR /&gt;At the end of this message I've attached a test program that demonstrates the problem, in case it helps. Sorry it's not shorter. Ugh, pasting seems to have swallowed the indentation.
On my machine, output from this program looks like this:&lt;BR /&gt;&lt;BR /&gt;Run 0&lt;BR /&gt;Results at offset 6 differ from offset 0&lt;BR /&gt;Results at offset 8 differ from offset 0&lt;BR /&gt;Results at offset 10 differ from offset 0&lt;BR /&gt;Results at offset 12 differ from offset 0&lt;BR /&gt;Results at offset 14 differ from offset 0&lt;BR /&gt;Results at offset 16 differ from offset 0&lt;BR /&gt;Run 1&lt;BR /&gt;Results at offset 1 differ from offset 0&lt;BR /&gt;Results at offset 3 differ from offset 0&lt;BR /&gt;Results at offset 5 differ from offset 0&lt;BR /&gt;Results at offset 7 differ from offset 0&lt;BR /&gt;Results at offset 9 differ from offset 0&lt;BR /&gt;Results at offset 11 differ from offset 0&lt;BR /&gt;Results at offset 13 differ from offset 0&lt;BR /&gt;Results at offset 15 differ from offset 0&lt;BR /&gt;&lt;BR /&gt;Note that the first and second runs produce different results, showing that results can differ even when two calls to vslsConvExecX1D are given identical data at identical memory addresses.&lt;BR /&gt;&lt;BR /&gt;-- &lt;BR /&gt;Eric Backus&lt;BR /&gt;eric_backus@agilent.com&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;
#include &amp;lt;cstdio&amp;gt; // For std::printf, std::fprintf&lt;BR /&gt;#include &amp;lt;cstdlib&amp;gt; // For std::rand, std::srand, EXIT_SUCCESS&lt;BR /&gt;#include "mkl.h"&lt;BR /&gt;&lt;BR /&gt;static void&lt;BR /&gt;conv_mkl(float* in, unsigned long in_len,&lt;BR /&gt;float* coef, unsigned long coef_len,&lt;BR /&gt;float* out)&lt;BR /&gt;{&lt;BR /&gt;const int mode = VSL_CONV_MODE_AUTO;&lt;BR /&gt;const int xshape = static_cast&amp;lt;int&amp;gt;(coef_len);&lt;BR /&gt;const int yshape = static_cast&amp;lt;int&amp;gt;(in_len);&lt;BR /&gt;const int zshape = static_cast&amp;lt;int&amp;gt;(in_len - coef_len + 1);&lt;BR /&gt;const int xstride = 1;&lt;BR /&gt;const int ystride = 1;&lt;BR /&gt;const int zstride = 1;&lt;BR /&gt;const float* x = coef;&lt;BR /&gt;const int start = static_cast&amp;lt;int&amp;gt;(coef_len - 1);&lt;BR /&gt;const int decimation = 1;&lt;BR /&gt;&lt;BR /&gt;
int status;&lt;BR /&gt;VSLConvTaskPtr task;&lt;BR /&gt;status = vslsConvNewTaskX1D(&amp;amp;task, mode, xshape, yshape, zshape, x, xstride);&lt;BR /&gt;if (status != VSL_STATUS_OK)&lt;BR /&gt;(void) std::fprintf(stderr, "vslsConvNewTaskX1D returned %d\n", status);&lt;BR /&gt;status = vslConvSetStart(task, &amp;amp;start);&lt;BR /&gt;if (status != VSL_STATUS_OK)&lt;BR /&gt;(void) std::fprintf(stderr, "vslConvSetStart returned %d\n", status);&lt;BR /&gt;status = vslConvSetDecimation(task, &amp;amp;decimation);&lt;BR /&gt;if (status != VSL_STATUS_OK)&lt;BR /&gt;(void) std::fprintf(stderr, "vslConvSetDecimation returned %d\n", status);&lt;BR /&gt;&lt;BR /&gt;status = vslsConvExecX1D(task, in, ystride, out, zstride);&lt;BR /&gt;if (status != VSL_STATUS_OK)&lt;BR /&gt;(void) std::fprintf(stderr, "vslsConvExecX1D returned %d\n", status);&lt;BR /&gt;&lt;BR /&gt;// Deleting the task alleviates the problems we have seen. But in&lt;BR /&gt;// our real application, we use (and need) multiple tasks. In the&lt;BR /&gt;// real application, we delete tasks whenever we know we're done&lt;BR /&gt;// with them, but we still see the repeatability issues.&lt;BR /&gt;//vslConvDeleteTask(&amp;amp;task);&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;
static void&lt;BR /&gt;buffer_init(float* out, unsigned long len)&lt;BR /&gt;{&lt;BR /&gt;for (unsigned long i = 0; i &amp;lt; len; i++)&lt;BR /&gt;*out++ = std::rand() * 0.001f;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;static void&lt;BR /&gt;buffer_copy(float* in, float* out, unsigned long len)&lt;BR /&gt;{&lt;BR /&gt;for (unsigned long i = 0; i &amp;lt; len; i++)&lt;BR /&gt;*out++ = *in++;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;// Returns true for exact equality&lt;BR /&gt;static bool&lt;BR /&gt;buffer_equal(float* in1, float* in2, unsigned long len)&lt;BR /&gt;{&lt;BR /&gt;for (unsigned long i = 0; i &amp;lt; len; i++)&lt;BR /&gt;if (*in1++ != *in2++)&lt;BR /&gt;return false;&lt;BR /&gt;return true;&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;
int&lt;BR /&gt;main(int argc, char **argv)&lt;BR /&gt;{&lt;BR /&gt;std::srand(12345); // Same sequence every time, for now&lt;BR /&gt;&lt;BR /&gt;// Main input data buffer&lt;BR /&gt;const unsigned long in_len = 4096;&lt;BR /&gt;float in[in_len];&lt;BR /&gt;buffer_init(in, in_len);&lt;BR /&gt;&lt;BR /&gt;// Small "coef" buffer&lt;BR /&gt;const unsigned long coef_len = 29;&lt;BR /&gt;float coef[coef_len];&lt;BR /&gt;buffer_init(coef, coef_len);&lt;BR /&gt;&lt;BR /&gt;// Temp coef buffer, which we use at different offsets to check&lt;BR /&gt;// how the convolution code reacts to different data alignments.&lt;BR /&gt;const unsigned long align_len = 17;&lt;BR /&gt;float coef_tmp[coef_len + align_len];&lt;BR /&gt;&lt;BR /&gt;// Output buffers&lt;BR /&gt;const unsigned long out_len = in_len + coef_len; // Longer than necessary&lt;BR /&gt;float ref[out_len];&lt;BR /&gt;float out[out_len];&lt;BR /&gt;buffer_init(out, out_len);&lt;BR /&gt;&lt;BR /&gt;for (int k = 0; k &amp;lt; 2; k++)&lt;BR /&gt;{&lt;BR /&gt;(void) std::printf("Run %d\n", k);&lt;BR /&gt;&lt;BR /&gt;for (unsigned long i = 0; i &amp;lt; align_len; i++)&lt;BR /&gt;{&lt;BR /&gt;// Copy coefs to next alignment&lt;BR /&gt;buffer_copy(coef, coef_tmp + i, coef_len);&lt;BR /&gt;&lt;BR /&gt;conv_mkl(in, in_len, coef_tmp + i, coef_len, out);&lt;BR /&gt;&lt;BR /&gt;if (i == 0)&lt;BR /&gt;buffer_copy(out, ref, out_len);&lt;BR /&gt;else&lt;BR /&gt;if (!buffer_equal(out, ref, out_len))&lt;BR /&gt;(void) std::printf("Results at offset %lu differ from offset 0\n", i);&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;return EXIT_SUCCESS;&lt;BR /&gt;}</description>
      <pubDate>Wed, 13 Jan 2010 08:17:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912992#M12298</guid>
      <dc:creator>Eric_Backus</dc:creator>
      <dc:date>2010-01-13T08:17:16Z</dc:date>
    </item>
    <item>
      <title>Re: Repeatability of results from vslsConvExecX1D</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912993#M12299</link>
      <description>&lt;P&gt;Hi Eric,&lt;BR /&gt;&lt;BR /&gt;The probable reason for the observed differences is threaded execution.&lt;BR /&gt;The MKL User's Guide, in the section "Aligning Data for Numerical Stability", states the following:&lt;/P&gt;
&lt;P align="left"&gt;&lt;SPAN style="color: #800000;"&gt;With a given Intel MKL version, the outputs will be bit-for-bit identical provided all the following conditions are met:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;
&lt;DIV&gt;&lt;SPAN style="color: #800000;"&gt;the outputs are obtained on the same platform&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV&gt;&lt;SPAN style="color: #800000;"&gt;the inputs are bit-for-bit identical&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV&gt;&lt;SPAN style="color: #800000;"&gt;the input arrays are aligned identically at 16-byte boundaries&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;DIV&gt;&lt;SPAN style="color: #800000;"&gt;Intel MKL is run in the sequential mode&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;P align="left"&gt;Although the conditions are formulated in terms of LAPACK and BLAS, they are general.&lt;/P&gt;
&lt;P align="left"&gt;Dima&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2010 08:54:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912993#M12299</guid>
      <dc:creator>Dmitry_B_Intel</dc:creator>
      <dc:date>2010-01-13T08:54:20Z</dc:date>
    </item>
    <item>
      <title>Re: Repeatability of results from vslsConvExecX1D</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912994#M12300</link>
      <description>Eric, &lt;BR /&gt;1) Looking at your code, it seems to me that comparing floating-point data exactly, as in&lt;BR /&gt;for (unsigned long i = 0; i &amp;lt; len; i++)&lt;BR /&gt;if (*in1++ != *in2++)&lt;BR /&gt;is not completely correct. It would be better to compare within some tolerance (eps).&lt;BR /&gt;2) By the way, the code works fine on my local system: MKL 10.2 Update 3, WinXP 32-bit, CPU == Intel Core2 Duo CPU T7300 @ 2.00GHz, 2 threads.&lt;BR /&gt;--Gennady&lt;BR /&gt;</description>
      <pubDate>Wed, 13 Jan 2010 15:49:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912994#M12300</guid>
      <dc:creator>Gennady_F_Intel</dc:creator>
      <dc:date>2010-01-13T15:49:48Z</dc:date>
    </item>
    <item>
      <title>Re: Repeatability of results from vslsConvExecX1D</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912995#M12301</link>
      <description>&lt;P&gt;Hi Gennady,&lt;/P&gt;
&lt;P&gt;Thank you for your response, and thank you to Dmitry for his response as well.&lt;/P&gt;
&lt;P&gt;Yes, I'm well aware of the pitfalls of directly comparing floating-point values. However, in this case, it is the correct thing to do, because the whole point of this program is to verify whether results are bit-for-bit identical when the alignment of the input array is changed.&lt;/P&gt;
&lt;P&gt;When you say the "code works fine", do you mean that the program did not print any strings like "Results at offset N differ from offset 0"? What compiler did you use? Which specific MKL libraries did you link with?&lt;/P&gt;
&lt;P&gt;In my case, I'm using MS Visual Studio 2008. I see the problem when I link with:&lt;/P&gt;
&lt;P&gt;mkl_intel_c_dll.lib, mkl_intel_thread_dll.lib, mkl_core_dll.lib, libiomp5md.lib&lt;/P&gt;
&lt;P&gt;I still see the problem when I replace mkl_intel_thread_dll.lib with mkl_sequential_dll.lib.&lt;/P&gt;
&lt;P&gt;I'm using MKL 10.2.2.025 on WinXP SP3 (32-bit). CPU == Intel Core2 Duo CPU T9400 @ 2.53GHz.&lt;/P&gt;
&lt;P&gt;At runtime, if mkl_vml_p4m2.dll is NOT available (so presumably mkl_vml_def.dll is used) then the problem goes away. From this I conclude that the issues with data alignment probably come from the use of SSE or something similar.&lt;/P&gt;
&lt;P&gt;Thanks for any insight you can provide!&lt;/P&gt;
&lt;P&gt;--&lt;/P&gt;
&lt;P&gt;Eric Backus&lt;/P&gt;
&lt;P&gt;eric_backus@agilent.com&lt;/P&gt;</description>
      <pubDate>Tue, 26 Jan 2010 21:52:33 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Repeatability-of-results-from-vslsConvExecX1D/m-p/912995#M12301</guid>
      <dc:creator>Eric_Backus</dc:creator>
      <dc:date>2010-01-26T21:52:33Z</dc:date>
    </item>
  </channel>
</rss>

