Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.
2465 Discussions

parallel calculate each value in a symmetric matrix?

martinus
Beginner
293 Views
Hi all, I try to parallelize the calculation of values in an symmetric matrix. The serial version looks like this:
[cpp]void calculate_upper_triangular_matrix_elements(Matrix& m) {
    for (unsigned i = 0; i < m.size1(); ++i)
        for (unsigned j = i; j < m.size2(); ++ j)
            m(i, j) = calculate(i, j);
}[/cpp]
I would like to use tbb::parallel_for, but don't know how to properly use the tbb::blocked_range for this problem. The simplest solution is to just use tbb::blocked_range2d and do nothing when i > j, but that does not seem right.
Any ideas how to do this better?
0 Kudos
4 Replies
jimdempseyatthecove
Honored Contributor III
293 Views
When calculate(i,j) is sufficiently large, try making a synthetic iteration space.
Serial code:

[bash]xEnd = m.size1() * m.size2() / 2;
for(x=0; x < xEnd; ++x)
{
   iEnd = m.size1();
   y = x;
   for(i=0; y >= iEnd; ++i, y -= iEnd, iENd -= 1)
     continue;
   j = i + y;
   m(i, j) = calculate(i, j);
}
Then parallelize the for(x loop
The slices of the for(x loop will have ~equal number of calls to calculate
Finding the i and j indexes will add overhead, however this technique balance the load amongs the threads in your team.
Jim Dempsey





[/bash]
0 Kudos
jimdempseyatthecove
Honored Contributor III
293 Views
When your function calculate is very long, and when you wish to simplify your code (from my synthetic range example) you can use the parallel_for_each. This will ballance the load to some extent at the expense of additional thread scheduling (interaction) overhead.

Jim Dempsey
0 Kudos
martinus
Beginner
293 Views
Thanks, I am now doing the same, but have replaced the inner for loop with a formula to calculate the index from x directly.


0 Kudos
jimdempseyatthecove
Honored Contributor III
293 Views
The formula for x is better, my code was for illustrative purposes.

an alternate way is to establish a thread team, with member numbers, then have the teams pick there member number of the mod of the number of members.

// OpenMP way
#pragma omp parallel
{
int nThreads = omp_get_num_threads();
int iThread = omp_get_thread_num();
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
}

I will let you rework that into TBB-speak using parallel_for_each or parallel_do

In QuickThread the above becomes

parallel_distribute(
[&](int iThread, int nThreads)
{
int pick = 0;
for(int i = 0; i < n; ++i)
for(int j = i; j < n; ++j)
{
if(pick++%nThreads == iThread)
doWork(i,j);
}
});

Jim Dempsey
0 Kudos
Reply