<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re:Parallel result is different with serial result in Intel® oneAPI DPC++/C++ Compiler</title>
    <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289594#M1302</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for the confirmation. Intel will no longer monitor this thread. Further discussions on this thread will be considered community only.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
    <pubDate>Mon, 14 Jun 2021 10:26:51 GMT</pubDate>
    <dc:creator>RahulV_intel</dc:creator>
    <dc:date>2021-06-14T10:26:51Z</dc:date>
    <item>
      <title>Parallel result is different with serial result</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1288735#M1299</link>
      <description>&lt;LI-CODE lang="markup"&gt;#include &amp;lt;CL/sycl.hpp&amp;gt;
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;ctime&amp;gt;

using namespace std;
using namespace cl::sycl;

constexpr long num_steps = 10000000;

double without_oneapi() {
    double step = 1.0 / (double)num_steps;
    double x = 0.0;
    double sum = 0.0;

    clock_t start = clock();
    for (int i = 0; i &amp;lt; num_steps; i++) {
        x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    clock_t end = clock();
    std::cout &amp;lt;&amp;lt; "Without oneapi cost " &amp;lt;&amp;lt; (double)(end - start) / CLOCKS_PER_SEC &amp;lt;&amp;lt; " second" &amp;lt;&amp;lt; std::endl;

    double res = step * sum;
    return res;
}

double with_oneapi_buffer() {
    queue q;
    //std::cout &amp;lt;&amp;lt; "Device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;info::device::name&amp;gt;() &amp;lt;&amp;lt; std::endl;

    double step = 1.0 / (double)num_steps;
    double data[2] = {step, 0.0};

    buffer buf(data, range&amp;lt;1&amp;gt;(2));

    clock_t start = clock();
    q.submit([&amp;amp;](handler&amp;amp; h) {
        accessor a(buf, h);
        h.parallel_for(range&amp;lt;1&amp;gt;(num_steps), [=](auto i) {
            double temp = ((double)i + 0.5) * a[0];
            a[1] += 4.0 / (1.0 + temp * temp);
            });
        }).wait();
    clock_t end = clock();
    std::cout &amp;lt;&amp;lt; "With oneapi buffer cost " &amp;lt;&amp;lt; (double)(end - start) / CLOCKS_PER_SEC &amp;lt;&amp;lt; " second" &amp;lt;&amp;lt; std::endl;

    double res = step * data[1];
    return res;
}

double with_oneapi_usm() {
    queue q;
    //std::cout &amp;lt;&amp;lt; "Device: " &amp;lt;&amp;lt; q.get_device().get_info&amp;lt;info::device::name&amp;gt;() &amp;lt;&amp;lt; std::endl;

    double step = 1.0 / (double)num_steps;
    double* data = malloc_shared&amp;lt;double&amp;gt;(2, q);
    data[0] = step;
    data[1] = 0.0;

    clock_t start = clock();
    q.parallel_for(range&amp;lt;1&amp;gt;(num_steps), [=](id&amp;lt;1&amp;gt; i){
        double temp = ((double)i + 0.5) * data[0];
        data[1] += 4.0 / (1.0 + temp * temp);
    }).wait();
    clock_t end = clock();
    std::cout &amp;lt;&amp;lt; "With oneapi usm cost " &amp;lt;&amp;lt; (double)(end - start) / CLOCKS_PER_SEC &amp;lt;&amp;lt; " second" &amp;lt;&amp;lt; std::endl;

    double res = step * data[1];
    free(data, q);
    return res;
}


int main() {

    clock_t start, end;
    
    double PI1 = without_oneapi();
    double PI2 = with_oneapi_buffer();
    double PI3 = with_oneapi_usm();
    
    std::cout &amp;lt;&amp;lt; "Without oneapi result:" &amp;lt;&amp;lt; PI1 &amp;lt;&amp;lt; std::endl;
    std::cout &amp;lt;&amp;lt; "With oneapi buffer result:" &amp;lt;&amp;lt; PI2 &amp;lt;&amp;lt; std::endl;
    std::cout &amp;lt;&amp;lt; "With oneapi usm result:" &amp;lt;&amp;lt; PI3 &amp;lt;&amp;lt; std::endl;

    return 0;
}&lt;/LI-CODE&gt;
&lt;P&gt;I tried this simple test and the result is weird. When num_steps is larger than 10000, the results from both the USM version and the buffer version come out smaller than they should be, which is PI.&lt;/P&gt;
&lt;P&gt;Why does this happen? Did I use oneAPI in the right way?&lt;/P&gt;</description>
      <pubDate>Thu, 10 Jun 2021 10:06:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1288735#M1299</guid>
      <dc:creator>Siqiao_Fu</dc:creator>
      <dc:date>2021-06-10T10:06:41Z</dc:date>
    </item>
    <item>
      <title>Re:Parallel result is different with serial result</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289083#M1300</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The source code you provided is not thread-safe.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Specifically, this line of code:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 14px; font-family: Consolas, Monaco, &amp;quot;Andale Mono&amp;quot;, &amp;quot;Ubuntu Mono&amp;quot;, monospace;"&gt;a[1] += 4.0 / (1.0 + temp * temp);&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The DPC++ runtime launches as many work-items (threads) as the range passed to parallel_for specifies. In your case, it launches 10000000 of them (the value of num_steps). All of them read, modify, and write the same memory location (a[1]) without synchronization, so updates from different work-items overwrite each other and are lost. That is why the result comes out smaller than expected. The same applies to data[1] in the USM version.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;To avoid this, you could store each work-item's partial result in its own element of an array and then perform a sum reduction over the array to get the final result.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Alternatively, you might want to check out the Monte Carlo pi approximation sample at the link below:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneMKL/monte_carlo_pi" target="_blank"&gt;https://github.com/oneapi-src/oneAPI-samples/tree/master/Libraries/oneMKL/monte_carlo_pi&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Fri, 11 Jun 2021 12:15:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289083#M1300</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-06-11T12:15:06Z</dc:date>
    </item>
    <item>
      <title>Re: Re:Parallel result is different with serial result</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289090#M1301</link>
      <description>&lt;P&gt;Really helpful! Thank you, Rahul!&lt;/P&gt;</description>
      <pubDate>Fri, 11 Jun 2021 12:47:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289090#M1301</guid>
      <dc:creator>Siqiao_Fu</dc:creator>
      <dc:date>2021-06-11T12:47:56Z</dc:date>
    </item>
    <item>
      <title>Re:Parallel result is different with serial result</title>
      <link>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289594#M1302</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Thanks for the confirmation. Intel will no longer monitor this thread. Further discussions on this thread will be considered community only.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Rahul&lt;/P&gt;&lt;BR /&gt;</description>
      <pubDate>Mon, 14 Jun 2021 10:26:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-oneAPI-DPC-C-Compiler/Parallel-result-is-different-with-serial-result/m-p/1289594#M1302</guid>
      <dc:creator>RahulV_intel</dc:creator>
      <dc:date>2021-06-14T10:26:51Z</dc:date>
    </item>
  </channel>
</rss>

