Community
cancel
Showing results for 
Search instead for 
Did you mean: 
martinus
Beginner
37 Views

parallel calculate each value in a symmetric matrix?

Hi all, I try to parallelize the calculation of values in an symmetric matrix. The serial version looks like this:
[cpp]void calculate_upper_triangular_matrix_elements(Matrix& m) {
    for (unsigned i = 0; i < m.size1(); ++i)
        for (unsigned j = i; j < m.size2(); ++ j)
            m(i, j) = calculate(i, j);
}[/cpp]
I would like to use tbb::parallel_for, but don't know how to properly use the tbb::blocked_range for this problem. The simplest solution is to just use tbb::blocked_range2d and do nothing when i > j, but that does not seem right.
Any ideas how to do this better?
0 Kudos
4 Replies
jimdempseyatthecove
Black Belt
37 Views

When calculate(i,j) is sufficiently large, try making a synthetic iteration space.
Serial code:

[bash]xEnd = m.size1() * m.size2() / 2;
for(x=0; x < xEnd; ++x)
{
   iEnd = m.size1();
   y = x;
   for(i=0; y >= iEnd; ++i, y -= iEnd, iENd -= 1)
     continue;
   j = i + y;
   m(i, j) = calculate(i, j);
}
Then parallelize the for(x loop
The slices of the for(x loop will have ~equal number of calls to calculate
Finding the i and j indexes will add overhead, however this technique balance the load amongs the threads in your team.
Jim Dempsey





[/bash]
jimdempseyatthecove
Black Belt
37 Views

When your function calculate is very long, and when you wish to simplify your code (from my synthetic range example) you can use the parallel_for_each. This will ballance the load to some extent at the expense of additional thread scheduling (interaction) overhead.

Jim Dempsey
martinus
Beginner
37 Views

Thanks, I am now doing the same, but have replaced the inner for loop with a formula to calculate the index from x directly.


jimdempseyatthecove
Black Belt
37 Views

The formula for x is better, my code was for illustrative purposes.

an alternate way is to establish a thread team, with member numbers, then have the teams pick there member number of the mod of the number of members.

// OpenMP way
#pragma omp parallel
{
int nThreads = omp_get_num_threads();
int iThread = omp_get_thread_num();
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
}

I will let you rework that into TBB-speak using parallel_for_each or parallel_do

In QuickThread the above becomes

parallel_distribute(
[&](int iThread, int nThreads)
{
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
});

Jim Dempsey
Reply