Software Archive

Using cilk_for, but no speedup!

Martin_T_1
Beginner
I want to convert about 34 GB of unsigned __int16 data in RAM to double. I chose cilk_for so that the loops run in parallel. This is my code:
#include "stdafx.h"
#include <iostream>
#include <ctime>
#include <stdlib.h>
#include <cilk/cilk.h>

using namespace std;

// Number of elements to convert
const long long itr_1 = (long long)(18e9);
unsigned __int16 * myArr_1 = new unsigned __int16[itr_1];
//unsigned __int16 * myArr_2 = new unsigned __int16[itr_1];
double * myArr_2 = new double[itr_1];

int _tmain(int argc, _TCHAR* argv[])
{
    // Fill the source array with values in [1, 1000]
    cilk_for(long long k = 0; k < itr_1; ++k)
    {
        myArr_1[k] = rand() % 1000 + 1;
    }

    // Convert each 16-bit value to double and store it in the second array
    cilk_for(long long i = 0; i < itr_1; ++i) {
        myArr_2[i] = (double)myArr_1[i];
    }

    delete[] myArr_1;
    delete[] myArr_2;
    return 0;
}
When I use cilk_for to initialize the first array, the execution time drops from 239 seconds to 36 seconds.
But when I use cilk_for to convert the first array to double and store it in the second array, the execution time increases from 70 s to 187 s. Why doesn't cilk_for speed up the conversion loop?
Environment: Intel C++ 2017, Microsoft Visual Studio 2013, Windows Server 2016. Intel Xeon CPU E5-2699, two nodes with 192 GB of RAM per node. The first array takes about 34 GB of RAM and the second about 136 GB.
Hansang_B_Intel
Employee

It is possible that the default number of Cilk workers (equal to the number of logical processors) is too large for the given work.

Can you try a smaller number by setting the environment variable CILK_NWORKERS?

It is also worth trying an OpenMP loop instead, since it may give a better result for regular data parallelism like this.

 

jimdempseyatthecove
Honored Contributor III

The function rand() contains a critical section, in other words a serializing section with the overhead of managing the mutex.

If you are simply setting up test routines to determine the effectiveness of parallelization, then use statements that do not call serializing functions. If you really need fast random number generation, then consult the MKL random number generators.

Jim Dempsey

Martin_T_1
Beginner

I changed the statement that initializes the array to this:

cilk_for(long long k = 0; k < itr_1; ++k)
{
    myArr_1[k] = k % 1000 + 1;  // no rand() call
}

But there was no improvement! I even changed it to:

cilk_for(long long k = 0; k < itr_1; ++k)
{
    myArr_1[k] = 1;  // constant value, no computation at all
}

Still no improvement! I think memory read and write speed is the issue.
I am trying to solve it and will post the results.

Martin_T_1
Beginner

I tried many approaches, but unfortunately nothing helped. Finally, I ran a memory test with MemTest86. The results are shown in the picture below.

https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0

As you can see, the L1 cache speed is very low for such a CPU, and so is the memory speed.
When I ran the simple code from my previous post, the write speed was about 1.2 GB/s.
I am really puzzled why the speed is so low. Does anybody have any suggestions?
