- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello there. I'm a student and i'm trying some experiments with CilkPlus of icc 15. I'm using Ubuntu 12.04 with x64 Intel Processor.
The following code is an implementation of a radix sorting algorithm of an octree using points' morton codes. The problem is that it seems that though cilk decides not to spawn a new thread in one of the 8 recursive calls, it also skips calling the function serially. This results in producing a non-complete sorted index vector, whose size is less than the original index vector's size and thus it doesn't apply sorting to all points. This is not happening if i implement serially the bin splitting and i apply cilk_for only to the recursive calls. Can you explain to me what's happening? Is there an alternative implementation? Should i correct something?
#include <cstdlib> #include <cstdio> #include <vector> #include <cilk/cilk.h> #include <cilk/reducer_vector.h> #define MAXBINS 8 typedef std::vector<unsigned int> UIVector; typedef std::vector<unsigned long int> ULIVector; typedef cilk::reducer< cilk::op_vector<unsigned int> > UIVectorReducer; typedef cilk::reducer< cilk::op_vector<unsigned long int> > ULIVectorReducer; void truncated_radix_sort(const ULIVector& morton_codes, ULIVector* sorted_morton_codes, const UIVector& index, UIVector* permutation_vector, unsigned int *level_record, int population_threshold, int sft, int lv) { int N = index.size(); if (N <= 0) { return; } else if (N <= population_threshold || sft < 0) { // Base case. The node is a leaf level_record[0] = lv; // record the level of the node *permutation_vector = index; *sorted_morton_codes = morton_codes; return; } else { int i, j; level_record[0] = lv; /* Place point to a bin according to its morton code */ UIVectorReducer* bins_reducer = new UIVectorReducer[MAXBINS]; ULIVectorReducer* bin_codes_reducer = new ULIVectorReducer[MAXBINS]; cilk_for (j = 0; j < N; j++) { unsigned int ii = (morton_codes>>sft) & 0x07; (bins_reducer[ii])->push_back(index ); (bin_codes_reducer[ii])->push_back(morton_codes ); } UIVector* sorted_bins = new UIVector[MAXBINS]; ULIVector* sorted_codes = new ULIVector[MAXBINS]; int offsets[MAXBINS]; offsets[0] = 0; for (i = 1; i < MAXBINS; i++) { offsets = offsets[i-1] + bins_reducer[i-1].get_value().size(); } /* Call the function recursively to split the lower levels */ for (i = 0; i < MAXBINS; i++) { cilk_spawn truncated_radix_sort( bin_codes_reducer.get_value(), &sorted_codes, bins_reducer.get_value(), &sorted_bins, &level_record[offsets], population_threshold, sft-3, lv+1); } cilk_sync; /* Merge sorted vectors */ permutation_vector->reserve(N); sorted_morton_codes->reserve(N); for (i = 0; i < MAXBINS; i++) { permutation_vector->insert(permutation_vector->end(), sorted_bins.begin(), sorted_bins.end()); sorted_morton_codes->insert(sorted_morton_codes->end(), sorted_codes.begin(), sorted_codes.end()); } delete[] sorted_bins; delete[] sorted_codes; delete[] bins_reducer; delete[] bin_codes_reducer; } }
Thank you in advance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It seems like the following code looks like it might be extracting the value of a reducer inside a parallel region? If a steal of the continuation of the spawn happens, then it seems like the dereference of bins_code_reducer in 70 and 71 would get a value from an empty identity view instead of the merged view.
066 |
/* Call the function recursively to split the lower levels */ |
067 |
for (i = 0; i < MAXBINS; i++) |
068 |
{ |
069 |
cilk_spawn truncated_radix_sort( |
070 |
bin_codes_reducer.get_value(), &sorted_codes, |
071 |
bins_reducer.get_value(), &sorted_bins, |
072 |
&level_record[offsets], |
073 |
population_threshold, |
074 |
sft-3, lv+1); |
075 |
} |
076 |
cilk_sync; |
077 |
Cheers,
Jim
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The most likely explanation is that there is a bug in your code. Since you didn't include a complete test case, there is no way to tell.
Did you run the cilkscreen race detector on your code?
-Bradley
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Cilkscreen does not find any race conditions. I am providing a characteristic dataset below. Before that sorted_morton_codes and permutation_vector are the output vectors. You should only feed a pointer of an empty vector. morton_codes and index are equal sized input vectors. level_record is a pointer to an index-sized array and it does not concern the functionality of this function actually. population_threshold is how many nodes should a leaf have and it means to stop the recursion. sft is where we should look to our morton code in order to partition locally our space and fill each bin-partition with a point. lv denotes the level of partition.
Typical and small input:
int N = 25; std::vector<unsigned long int> morton_codes; morton_codes[0] = 0; morton_codes[1] = 17; morton_codes[2] = 274; morton_codes[3] = 299; morton_codes[4] = 457; morton_codes[5] = 144; morton_codes[6] = 312; morton_codes[7] = 466; morton_codes[8] = 366; morton_codes[9] = 352; morton_codes[10] = 365; morton_codes[11] = 76; morton_codes[12] = 278; morton_codes[13] = 209; morton_codes[14] = 323; morton_codes[15] = 110; morton_codes[16] = 340; morton_codes[17] = 186; morton_codes[18] = 338; morton_codes[19] = 355; morton_codes[20] = 216; morton_codes[21] = 89; morton_codes[22] = 268; morton_codes[23] = 346; morton_codes[24] = 452; std::vector<unsigned long int> sorted_morton_codes; std::vector<unsigned int> index(25); for (int i = 0; i < 25; i++) index = i; std::vector<unsigned int> permutation_vector; unsigned int level_record[25]; int pop_thres = 2; int sft = 6; int lv = 0; // Call function truncated_radix_sort(morton_codes, &sorted_morton_codes, index, &permutation_vector, level_record, pop_thres, sft, lv);
Printing bins's sizes before and after execution reveals the problem.
#include <cstdlib> #include <cstdio> #include <vector> #include <cilk/cilk.h> #include <cilk/reducer_vector.h> #define MAXBINS 8 typedef std::vector<unsigned int> UIVector; typedef std::vector<unsigned long int> ULIVector; typedef cilk::reducer< cilk::op_vector<unsigned int> > UIVectorReducer; typedef cilk::reducer< cilk::op_vector<unsigned long int> > ULIVectorReducer; void truncated_radix_sort(const ULIVector& morton_codes, ULIVector* sorted_morton_codes, const UIVector& index, UIVector* permutation_vector, unsigned int *level_record, int population_threshold, int sft, int lv) { int N = index.size(); if (N <= 0) { return; } else if (N <= population_threshold || sft < 0) { // Base case. The node is a leaf level_record[0] = lv; // record the level of the node *permutation_vector = index; *sorted_morton_codes = morton_codes; return; } else { int i, j; level_record[0] = lv; /* Place point to a bin according to its morton code */ UIVectorReducer* bins_reducer = new UIVectorReducer[MAXBINS]; ULIVectorReducer* bin_codes_reducer = new ULIVectorReducer[MAXBINS]; cilk_for (j = 0; j < N; j++) { unsigned int ii = (morton_codes>>sft) & 0x07; (bins_reducer[ii])->push_back(index ); (bin_codes_reducer[ii])->push_back(morton_codes ); } UIVector* sorted_bins = new UIVector[MAXBINS]; ULIVector* sorted_codes = new ULIVector[MAXBINS]; int offsets[MAXBINS]; offsets[0] = 0; for (i = 1; i < MAXBINS; i++) { offsets = offsets[i-1] + bins_reducer[i-1].get_value().size(); } /* Printing bins before sorting */ for (i = 0; i < MAXBINS; i++) { printf("lv %d, bin %d size: %d\n", lv, i, bins_reducer.get_value().size()); } /* Call the function recursively to split the lower levels */ for (i = 0; i < MAXBINS; i++) { cilk_spawn truncated_radix_sort( bin_codes_reducer.get_value(), &sorted_codes, bins_reducer.get_value(), &sorted_bins, &level_record[offsets], population_threshold, sft-3, lv+1); } cilk_sync; /* Printing respective bins after sorting */ for (i = 0; i < MAXBINS; i++) { printf("lv %d, sorted %d size: %d\n", lv, i, sorted_bins.size()); } /* Merge sorted vectors */ permutation_vector->reserve(N); sorted_morton_codes->reserve(N); for (i = 0; i < MAXBINS; i++) { permutation_vector->insert(permutation_vector->end(), sorted_bins.begin(), sorted_bins.end()); sorted_morton_codes->insert(sorted_morton_codes->end(), sorted_codes.begin(), sorted_codes.end()); } delete[] sorted_bins; delete[] sorted_codes; delete[] bins_reducer; delete[] bin_codes_reducer; } }
Print output:
lv 0, bin 0 size: 2
lv 0, bin 1 size: 3
lv 0, bin 2 size: 2
lv 0, bin 3 size: 2
lv 0, bin 4 size: 5
lv 0, bin 5 size: 8
lv 0, bin 6 size: 0
lv 0, bin 7 size: 3
lv 1, bin 0 size: 0
lv 1, bin 1 size: 1
lv 1, bin 2 size: 0
lv 1, bin 3 size: 1
lv 1, bin 4 size: 0
lv 1, bin 5 size: 1
lv 1, bin 6 size: 0
lv 1, bin 7 size: 0
lv 1, sorted 0 size: 0
lv 1, sorted 1 size: 1
lv 1, sorted 2 size: 0
lv 1, sorted 3 size: 0
lv 1, sorted 4 size: 0
lv 1, sorted 5 size: 0
lv 1, sorted 6 size: 0
lv 1, sorted 7 size: 0
lv 0, sorted 0 size: 2
lv 0, sorted 1 size: 1
lv 0, sorted 2 size: 0
lv 0, sorted 3 size: 0
lv 0, sorted 4 size: 0
lv 0, sorted 5 size: 0
lv 0, sorted 6 size: 0
lv 0, sorted 7 size: 0
Even if printf cannot be called by more that one thread, (sorted) bin's sizes after recursive execution as numbers suggest shouldn't be 0. Also, i have checked the vector merging and it works fine.
- Christos
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you want help, you need to provide a complete test case, not one that I can assemble with enough ingenuity. ("You feed a pointer to an empty vector to ...." doesn't cut it.)
The instructions should be like "copy the enclosed code to foo.cc, compile it as icc foo.c -o foo. Expect output to be `3', but it's not."
-Bradley
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Does your code run correctly with CILK_NWORKERS=1? If not, then that is an easier case to debug.
If the code does work with CILK_NWORKERS=1, but not more than 1 worker, my first suspect would be something to do the arrays of reducer vectors + the combination of recursion, since it looks like you could be getting sizes from intermediate views. In general, you shouldn't look up the value of a reducer until all the parallel strands you care about have synced.
Cheers,
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Jim for the suggestion. Code runs correctly for CILK_NWORKERS=1 and that's what is troubling me. I think that dynamically allocating reducers and using them like so (lines 41-48) is fine and works. I know that because each bin is filled correctly.
Also, at lines 52-64 <reducer>.get_value() is used after cilk_for, at the end of which i think an implicit taskwait resides. Thus using like so shouldn't be a problem (i think).
@Bradley, my apologies i thought that compiling this code should not be referenced as test instructions. Also, forgive my typo, corrected it.
The expected output for starters in a sorting algorithm should have equal length to the input. So when we work in parallel, and run the above snippet one should assert at first that:
assert(morton_codes.size() == sorted_morton_codes.size()); assert(index.size() == permutation_vector.size());
Ground truth for large inputs is hard to be written by hand (for which i am sorry too). Since you do want a complete test case for this function, i will soon provide you with a gtest test case that compares the output of a serialization to the output of this one.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It seems like the following code looks like it might be extracting the value of a reducer inside a parallel region? If a steal of the continuation of the spawn happens, then it seems like the dereference of bins_code_reducer in 70 and 71 would get a value from an empty identity view instead of the merged view.
066 |
/* Call the function recursively to split the lower levels */ |
067 |
for (i = 0; i < MAXBINS; i++) |
068 |
{ |
069 |
cilk_spawn truncated_radix_sort( |
070 |
bin_codes_reducer.get_value(), &sorted_codes, |
071 |
bins_reducer.get_value(), &sorted_bins, |
072 |
&level_record[offsets], |
073 |
population_threshold, |
074 |
sft-3, lv+1); |
075 |
} |
076 |
cilk_sync; |
077 |
Cheers,
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Jim thank you very much! It was just as you said! I was hoping that by dereferencing in the loop, i could pass immediately the reference to the next level of recursion. Nice!

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page