<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I tried lots of ways, but in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177490#M79546</link>
    <description>&lt;P&gt;I tried many approaches, but unfortunately saw no improvement. Finally, I ran a memory test with MemTest86. The results are shown in the picture linked below.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0"&gt;https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;As you can see, the L1 cache speed is very low for such a CPU, and so is the memory speed.&lt;BR /&gt;
	When I ran the simple code from my previous post, the write speed was about 1.2 GB/s.&lt;BR /&gt;
	I am really puzzled as to why the speed is so low. Does anybody have any suggestions?&lt;/P&gt;</description>
    <pubDate>Wed, 20 Sep 2017 03:54:00 GMT</pubDate>
    <dc:creator>Martin_T_1</dc:creator>
    <dc:date>2017-09-20T03:54:00Z</dc:date>
    <item>
      <title>Using cilk_for, but no speed up!</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177486#M79542</link>
      <description>&lt;DIV&gt;I want to convert about 34 GB of unsigned 16-bit integer data in RAM to double, so I chose cilk_for to run the loop in parallel. This is my code:&lt;/DIV&gt;

&lt;DIV&gt;
	&lt;PRE class="brush:cpp;"&gt;#include "stdafx.h"
#include &amp;lt;iostream&amp;gt;
#include &amp;lt;ctime&amp;gt;
#include &amp;lt;stdlib.h&amp;gt;
#include &amp;lt;cilk/cilk.h&amp;gt;

using namespace std;

long long itr_1 = (long long)(18e9);
unsigned __int16 * myArr_1 = new unsigned __int16[itr_1];
//unsigned __int16 * myArr_2 = new unsigned __int16[itr_1];
double * myArr_2 = new double[itr_1];

int _tmain(int argc, _TCHAR* argv[])
{

    // Fill the source array in parallel.
    cilk_for (long long k = 0; k &amp;lt; itr_1; ++k)
    {
        myArr_1[k] = rand() % 1000 + 1;
    }

    // Convert each element to double in parallel.
    cilk_for (long long i = 0; i &amp;lt; itr_1; ++i)
    {
        myArr_2[i] = (double)myArr_1[i];
    }

    return 0;
}&lt;/PRE&gt;
&lt;/DIV&gt;

&lt;DIV&gt;When I use cilk_for to initialize the first array, the execution time drops from 239 seconds to 36 seconds.&lt;/DIV&gt;

&lt;DIV&gt;But when I use cilk_for to convert the first array to double and store it in the second array, the execution time increases from 70 seconds to 187 seconds. Why doesn't cilk_for speed up the conversion loop?&lt;/DIV&gt;

&lt;DIV&gt;Environment: I'm using Intel C++ 2017 with Microsoft Visual Studio 2013, and the OS is Windows Server 2016.&amp;nbsp;&lt;SPAN style="font-size: 1em;"&gt;Hardware: Intel Xeon CPU E5-2699, two nodes, with 192 GB of RAM per node. My first array takes about 34 GB of RAM and the second one takes about 136 GB.&lt;/SPAN&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 05 Sep 2017 14:05:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177486#M79542</guid>
      <dc:creator>Martin_T_1</dc:creator>
      <dc:date>2017-09-05T14:05:09Z</dc:date>
    </item>
    <item>
      <title>It is possible that the</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177487#M79543</link>
      <description>&lt;P&gt;It is possible that the default number of Cilk workers (= number of logical processors) is too large for the given work.&lt;/P&gt;

&lt;P&gt;Can you try a smaller number by setting the environment variable CILK_NWORKERS?&lt;/P&gt;

&lt;P&gt;It is also worth trying an OpenMP loop instead, since it may give a better result for regular data parallelism like this.&lt;/P&gt;

</description>
      <pubDate>Tue, 05 Sep 2017 22:03:19 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177487#M79543</guid>
      <dc:creator>Hansang_B_Intel</dc:creator>
      <dc:date>2017-09-05T22:03:19Z</dc:date>
    </item>
    <item>
      <title>The function rand() contains</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177488#M79544</link>
      <description>&lt;P&gt;The function rand() contains a critical section. In other words, it is a serializing section that adds the overhead of managing a mutex.&lt;/P&gt;

&lt;P&gt;If you are simply setting up test routines to determine the effectiveness of parallelization, then choose statements that do not call serializing functions. If you really need fast random number generation, then consult the &lt;A href="https://software.intel.com/en-us/mkl-developer-reference-c-random-number-generators"&gt;MKL Random Number Generators&lt;/A&gt;.&lt;/P&gt;
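
&lt;P&gt;As a rough illustration of a fill loop with no shared generator state, the sketch below gives every chunk of the array its own engine from the standard C++ &amp;lt;random&amp;gt; header. This is a stand-in for illustration, not the MKL generators linked above. The function name fill_random, the chunk size, and the seeding scheme are all assumptions, and the OpenMP pragma only takes effect when OpenMP is enabled at compile time.&lt;/P&gt;

<![CDATA[
```cpp
#include <algorithm>
#include <cstdint>
#include <random>

// Hypothetical helper: fill dst[0..n) with values in [1, 1000] without rand().
// Each chunk constructs a private engine, so threads share no generator state.
// chunk must be positive.
void fill_random(std::uint16_t* dst, long long n, long long chunk = 1 << 20)
{
    const long long num_chunks = (n + chunk - 1) / chunk;

    #pragma omp parallel for schedule(static)
    for (long long c = 0; c < num_chunks; ++c) {
        // Assumed seeding scheme: chunk index plus an arbitrary offset.
        std::mt19937 engine(static_cast<std::uint32_t>(c) + 12345u);
        std::uniform_int_distribution<int> dist(1, 1000);

        const long long begin = c * chunk;
        const long long end = std::min(begin + chunk, n);
        for (long long i = begin; i < end; ++i) {
            dst[i] = static_cast<std::uint16_t>(dist(engine));
        }
    }
}
```
]]>

&lt;P&gt;Because the seed depends only on the chunk index, the output is the same regardless of how many threads run the loop. Consecutive seeds are adequate for filling a test array, but they do not guarantee statistically independent streams.&lt;/P&gt;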

&lt;P&gt;Jim Dempsey&lt;/P&gt;</description>
      <pubDate>Sat, 09 Sep 2017 12:51:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177488#M79544</guid>
      <dc:creator>jimdempseyatthecove</dc:creator>
      <dc:date>2017-09-09T12:51:17Z</dc:date>
    </item>
    <item>
      <title>I changed initializing array</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177489#M79545</link>
      <description>&lt;P&gt;I changed the array initialization statement to this:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;cilk_for(long long k = 0; k &amp;lt; itr_1; ++k)
{
myArr_1&lt;K&gt; = k % 1000 + 1;
}&lt;/K&gt;&lt;/PRE&gt;

&lt;P&gt;But there was no improvement! I even changed it to:&lt;/P&gt;

&lt;PRE class="brush:cpp;"&gt;cilk_for(long long k = 0; k &amp;lt; itr_1; ++k)
{
myArr_1&lt;K&gt; = 1;
}&lt;/K&gt;&lt;/PRE&gt;

&lt;P&gt;Still no improvement! I think that memory read and write speed is the issue.&lt;BR /&gt;
	I am trying to solve it and will report the results.&lt;/P&gt;</description>
      <pubDate>Sun, 10 Sep 2017 04:45:34 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177489#M79545</guid>
      <dc:creator>Martin_T_1</dc:creator>
      <dc:date>2017-09-10T04:45:34Z</dc:date>
    </item>
    <item>
      <title>I tried lots of ways, but</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177490#M79546</link>
      <description>&lt;P&gt;I tried many approaches, but unfortunately saw no improvement. Finally, I ran a memory test with MemTest86. The results are shown in the picture linked below.&lt;/P&gt;

&lt;P&gt;&lt;A href="https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0"&gt;https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;As you can see, the L1 cache speed is very low for such a CPU, and so is the memory speed.&lt;BR /&gt;
	When I ran the simple code from my previous post, the write speed was about 1.2 GB/s.&lt;BR /&gt;
	I am really puzzled as to why the speed is so low. Does anybody have any suggestions?&lt;/P&gt;</description>
      <pubDate>Wed, 20 Sep 2017 03:54:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-cilk-for-but-no-speed-up/m-p/1177490#M79546</guid>
      <dc:creator>Martin_T_1</dc:creator>
      <dc:date>2017-09-20T03:54:00Z</dc:date>
    </item>
  </channel>
</rss>

