- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all, I try to parallelize the calculation of values in an symmetric matrix. The serial version looks like this:
[cpp]void calculate_upper_triangular_matrix_elements(Matrix& m) { for (unsigned i = 0; i < m.size1(); ++i) for (unsigned j = i; j < m.size2(); ++ j) m(i, j) = calculate(i, j); }[/cpp]I would like to use tbb::parallel_for, but don't know how to properly use the tbb::blocked_range for this problem. The simplest solution is to just use tbb::blocked_range2d and do nothing when i > j, but that does not seem right.
Any ideas how to do this better?
Link Copied
4 Replies
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When calculate(i,j) is sufficiently large, try making a synthetic iteration space.
Serial code:
Serial code:
[bash]xEnd = m.size1() * m.size2() / 2; for(x=0; x < xEnd; ++x) { iEnd = m.size1(); y = x; for(i=0; y >= iEnd; ++i, y -= iEnd, iENd -= 1) continue; j = i + y; m(i, j) = calculate(i, j); }
Then parallelize the for(x loop
The slices of the for(x loop will have ~equal number of calls to calculate
Finding the i and j indexes will add overhead, however this technique balance the load amongs the threads in your team.
Jim Dempsey
[/bash]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
When your function calculate is very long, and when you wish to simplify your code (from my synthetic range example) you can use the parallel_for_each. This will ballance the load to some extent at the expense of additional thread scheduling (interaction) overhead.
Jim Dempsey
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thanks, I am now doing the same, but have replaced the inner for loop with a formula to calculate the index from x directly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The formula for x is better, my code was for illustrative purposes.
an alternate way is to establish a thread team, with member numbers, then have the teams pick there member number of the mod of the number of members.
// OpenMP way
#pragma omp parallel
{
int nThreads = omp_get_num_threads();
int iThread = omp_get_thread_num();
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
}
I will let you rework that into TBB-speak using parallel_for_each or parallel_do
In QuickThread the above becomes
parallel_distribute(
[&](int iThread, int nThreads)
{
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
});
Jim Dempsey
an alternate way is to establish a thread team, with member numbers, then have the teams pick there member number of the mod of the number of members.
// OpenMP way
#pragma omp parallel
{
int nThreads = omp_get_num_threads();
int iThread = omp_get_thread_num();
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
}
I will let you rework that into TBB-speak using parallel_for_each or parallel_do
In QuickThread the above becomes
parallel_distribute(
[&](int iThread, int nThreads)
{
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
});
Jim Dempsey

Reply
Topic Options
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page