Hello,
I'm working on the following problem: I need to iterate over a 3D grid and create 8 std::vector<TInfo>s, where each TInfo instance represents a set of operations. This is a kind of coloring algorithm: each of the 8 vectors corresponds to a color, and all the TInfo instances in the same vector can be processed concurrently without data races. That said, I can do something like the following (not necessarily valid C++ code):
[cpp]
std::vector<TInfo> tinfoVectors[8];
obtainTInfoVectors(grid, tinfoVectors); // This is a sequential procedure
for (int i = 0; i < 8; i++) // one parallel pass per color
    parallel_for(blocked_range<size_t>(0, tinfoVectors[i].size()), FtorProcessTInfo(tinfoVectors[i]));
[/cpp]
I want to parallelize obtainTInfoVectors. This function goes through all the grid cells, creates 8 TInfo instances per cell, and pushes each instance into the right vector of tinfoVectors. In a multi-threaded implementation, different threads would therefore try to add elements to the same vectors at the same time, creating data races. Two solutions come to mind:
- Replace std::vector with tbb::concurrent_vector: the problem is that obtainTInfoVectors adds elements to the vectors very intensively, and I don't expect much gain from tbb::concurrent_vector. Thus, I discarded this option.
- Use tbb::enumerable_thread_specific to give each thread its own tinfoVectors. This, I believe, is more efficient.
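The per-thread idea in the second option can be illustrated without TBB: each worker fills its own set of eight vectors, and the sets are combined once the parallel phase is over. Below is a minimal sketch in standard C++, not TBB code. The names `ColorVectors` and `mergeColors` are hypothetical, the element type is a template parameter standing in for TInfo, and copying elements is assumed to be cheap enough to be worthwhile.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical alias: one vector per color, eight colors in total.
template <typename T>
using ColorVectors = std::array<std::vector<T>, 8>;

// Concatenates the per-thread vectors color by color.
// perThread[k] is the set of eight vectors filled by worker k.
template <typename T>
ColorVectors<T> mergeColors(const std::vector<ColorVectors<T>>& perThread) {
    ColorVectors<T> merged;
    for (std::size_t color = 0; color < merged.size(); ++color) {
        // Count first so that each merged vector allocates only once.
        std::size_t total = 0;
        for (const auto& local : perThread)
            total += local[color].size();
        merged[color].reserve(total);
        // Append every worker's elements for this color.
        for (const auto& local : perThread)
            merged[color].insert(merged[color].end(),
                                 local[color].begin(), local[color].end());
    }
    return merged;
}
```

After such a merge, the original single loop over the 8 colors applies unchanged, at the cost of one extra sequential copy of every element.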
Using tbb::enumerable_thread_specific, the code can change to something like this:
[cpp]
typedef std::vector<TInfo> TInfoVectors[8];
typedef enumerable_thread_specific<TInfoVectors> TInfoVecsTLS;
TInfoVecsTLS tinfoVecsTLS; // one set of 8 vectors per thread
obtainTInfoVectorsParallel(grid, tinfoVecsTLS);
for (int i = 0; i < 8; i++) // one pass per color
    for (TInfoVecsTLS::iterator t = tinfoVecsTLS.begin(); t != tinfoVecsTLS.end(); ++t) // LOOP1
        parallel_for(blocked_range<size_t>(0, (*t)[i].size()), FtorProcessTInfo((*t)[i])); // LOOP2
[/cpp]
However, since parallel_for has some implicit synchronization (each call returns only once all of its work has finished), I'm afraid that LOOP1 can introduce unnecessary overhead. Ideally, I would flatten LOOP1 and LOOP2 into a single loop and run one parallel_for over it. I'm aware of flattened2d, but since it only supports forward iterators, it is not suited to parallel_for, and using parallel_do instead would also add unnecessary overhead. So I'm thinking of making LOOP1 parallel as well, which gives something like the following:
[cpp]
struct FtorProcessTInfoVecTLS {
    // range() on a non-const container returns range_type, not const_range_type
    typedef enumerable_thread_specific<TInfoVectors>::range_type range_t;
    int i; // color index
    FtorProcessTInfoVecTLS(int i_p)
        : i(i_p)
    {}
    void operator()(const range_t& r) const {
        // with a grain size of 1 and simple_partitioner, the range only has one element
        parallel_for(blocked_range<size_t>(0, (*r.begin())[i].size()), FtorProcessTInfo((*r.begin())[i])); // LOOP2
    }
};
//...
//...
for (int i = 0; i < 8; i++)
parallel_for(tinfoVecsTLS.range(1), FtorProcessTInfoVecTLS(i),simple_partitioner());//LOOP1
[/cpp]
However, I'm not sure how nesting a parallel_for inside another parallel_for would work, or whether it is the best way to do it. Any comments or suggestions are very welcome.
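For comparison, the flattening of LOOP1 and LOOP2 mentioned earlier can be done by hand with prefix sums: once the sizes of the per-thread vectors for one color are known, a flat index in [0, total) maps back to a (vector, offset) pair by binary search. The following is a minimal sketch in standard C++, not a TBB facility. `FlatIndex` is a hypothetical helper, and it assumes the vectors are no longer modified once the index is built.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: maps a flat index over several vectors
// to the pair (index of the vector, offset inside that vector).
class FlatIndex {
public:
    // sizes[k] is the size of the k-th per-thread vector for one color.
    explicit FlatIndex(const std::vector<std::size_t>& sizes)
        : ends_(sizes.size()) {
        std::size_t total = 0;
        for (std::size_t k = 0; k < sizes.size(); ++k) {
            total += sizes[k];
            ends_[k] = total; // one-past-the-end flat index of vector k
        }
    }

    // Total number of elements across all vectors.
    std::size_t total() const { return ends_.empty() ? 0 : ends_.back(); }

    // Finds the vector that owns the flat index; empty vectors are skipped.
    std::pair<std::size_t, std::size_t> locate(std::size_t flat) const {
        assert(flat < total());
        // First vector whose end is strictly greater than the flat index.
        auto it = std::upper_bound(ends_.begin(), ends_.end(), flat);
        std::size_t k = static_cast<std::size_t>(it - ends_.begin());
        std::size_t begin = (k == 0) ? 0 : ends_[k - 1];
        return {k, flat - begin};
    }

private:
    std::vector<std::size_t> ends_; // inclusive prefix sums of the sizes
};
```

A single blocked_range<size_t>(0, total()) per color could then be handed to one parallel_for, with the body calling locate() on the first index of its subrange and walking forward from there. Whether this beats nesting depends on how small the individual vectors are.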
Thanks in advance!
Hi, nesting works perfectly with TBB. If you are concerned about every last bit of performance and the amount of work is very small, use the same task_group_context for all the parallel_for calls.
