#include "stdafx.h"
#include <iostream>
#include <ctime>
#include <stdlib.h>
#include <cilk/cilk.h>
using namespace std;

long long itr_1 = (long long)(18e9);
unsigned __int16 * myArr_1 = new unsigned __int16[itr_1];
//unsigned __int16 * myArr_2 = new unsigned __int16[itr_1];
double * myArr_2 = new double[itr_1];

int _tmain(int argc, _TCHAR* argv[])
{
    // Fill the source array with pseudo-random values in [1, 1000]
    cilk_for(long long k = 0; k < itr_1; ++k)
    {
        myArr_1[k] = rand() % 1000 + 1;
    }
    // Convert every element to double
    cilk_for(long long i = 0; i < itr_1; ++i)
    {
        myArr_2[i] = (double)myArr_1[i];
    }
    return 0;
}
It is possible that the default number of Cilk workers (equal to the number of logical processors) is too large for this amount of work.
Can you try a smaller number by setting the environment variable CILK_NWORKERS?
It is also worth trying an OpenMP loop instead, since it may give better results for regular data parallelism like this.
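As a rough sketch of the OpenMP suggestion, the conversion loop from the original post could look like the following. The function name `to_double` and the use of `std::vector` are assumptions made for illustration; the original code uses raw global arrays. If OpenMP is not enabled at compile time the pragma is ignored and the loop runs serially, so the result is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert 16-bit samples to double with an OpenMP loop.
// Enable OpenMP when compiling (for example /Qopenmp, /openmp or -fopenmp).
void to_double(const std::vector<std::uint16_t>& src, std::vector<double>& dst)
{
    assert(src.size() == dst.size());
    const long long n = static_cast<long long>(src.size());

    // schedule(static) hands each thread one contiguous block of iterations,
    // which suits a loop where every iteration costs the same.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        dst[i] = static_cast<double>(src[i]);
    }
}
```

The thread count can then be varied with the standard `OMP_NUM_THREADS` environment variable, in the same spirit as `CILK_NWORKERS`.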
The function rand() contains a critical section, in other words a serializing section with the overhead of managing a mutex.
If you are simply setting up test routines to determine the effectiveness of parallelization, then choose statements that do not call serializing functions. If you really need fast random number generation, then consult the MKL Random Number Generators.
Jim Dempsey
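One way to avoid a shared generator, sketched here with the standard `<random>` header rather than MKL, is to give every chunk of the array its own engine. The names `fill_chunk` and `fill_random`, the chunk size parameter, and seeding each engine with its chunk index are all assumptions for illustration. Consecutive integer seeds are good enough for a timing test but are not a statistically careful choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill one contiguous chunk with values in [1, 1000] using an engine that
// belongs to this chunk only, so no generator state is shared.
void fill_chunk(std::uint16_t* data, std::size_t count, std::uint32_t seed)
{
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(1, 1000);
    for (std::size_t i = 0; i < count; ++i) {
        data[i] = static_cast<std::uint16_t>(dist(gen));
    }
}

// Split the array into chunks of `chunk` elements. The chunks are independent,
// so the outer loop can run in parallel (serially if OpenMP is disabled).
void fill_random(std::vector<std::uint16_t>& v, std::size_t chunk)
{
    assert(chunk > 0);
    const long long chunks = static_cast<long long>((v.size() + chunk - 1) / chunk);

    #pragma omp parallel for schedule(static)
    for (long long c = 0; c < chunks; ++c) {
        const std::size_t begin = static_cast<std::size_t>(c) * chunk;
        const std::size_t count = std::min(chunk, v.size() - begin);
        fill_chunk(v.data() + begin, count, static_cast<std::uint32_t>(c));
    }
}
```

Because each chunk is seeded by its index, the output does not depend on how many threads run the loop, which makes timing runs repeatable.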
I changed the array initialization statement to this:
cilk_for(long long k = 0; k < itr_1; ++k) { myArr_1[k] = k % 1000 + 1; }
But there was no improvement! I even changed it to:
cilk_for(long long k = 0; k < itr_1; ++k) { myArr_1[k] = 1; }
Still no improvement! I think memory read and write speed is the issue!
I am trying to solve it and will report the results.
I tried lots of approaches, but unfortunately there was no improvement! Finally I ran a memory test with MemTest86. The results are shown in the picture below.
https://www.dropbox.com/s/efzclilx2yoya2j/Results.png?dl=0
As you can see, the L1 cache speed is very low for such a CPU, and so is the memory speed.
When I ran the simple code from my previous post, the write speed was about 1.2 GB/s.
I am really puzzled why the speed is so low! Does anybody have any suggestions?
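For reference, a write speed figure like the one quoted above can be obtained with a small timing helper along these lines. This is only a sketch: the name `write_bandwidth_gbps` is hypothetical, it is single-threaded, and it reports GB/s as 1e9 bytes per second. The buffer is a `std::vector`, whose elements are already initialized when it is constructed, so the timed pass does not include the cost of the operating system mapping the pages for the first time.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: time one pass that writes `value` to every element
// and return the achieved write rate in GB/s (1e9 bytes per second).
double write_bandwidth_gbps(std::vector<std::uint16_t>& buf, std::uint16_t value)
{
    assert(!buf.empty());
    const auto start = std::chrono::steady_clock::now();
    std::fill(buf.begin(), buf.end(), value);
    const auto stop = std::chrono::steady_clock::now();

    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double bytes = static_cast<double>(buf.size()) * sizeof(std::uint16_t);
    return bytes / seconds / 1e9;
}
```

Use a buffer much larger than the last-level cache and repeat the call a few times, otherwise the number reflects cache speed rather than main memory speed.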