
## parallel calculate each value in a symmetric matrix?

Beginner
293 Views
Hi all, I am trying to parallelize the calculation of the values in a symmetric matrix. The serial version looks like this:
```cpp
void calculate_upper_triangular_matrix_elements(Matrix& m) {
    for (unsigned i = 0; i < m.size1(); ++i)
        for (unsigned j = i; j < m.size2(); ++j)
            m(i, j) = calculate(i, j);
}
```
I would like to use tbb::parallel_for, but don't know how to properly use the tbb::blocked_range for this problem. The simplest solution is to just use tbb::blocked_range2d and do nothing when i > j, but that does not seem right.
Any ideas how to do this better?
4 Replies
Honored Contributor III
When the work in calculate(i, j) is sufficiently large, try making a synthetic iteration space.
Serial code:

```cpp
unsigned n = m.size1();               // assume a square matrix
unsigned xEnd = n * (n + 1) / 2;      // count of upper-triangular elements
for (unsigned x = 0; x < xEnd; ++x)
{
    unsigned rowLen = n;              // row 0 holds n elements
    unsigned i = 0;
    unsigned y = x;
    // walk down the rows until x falls inside row i
    for (; y >= rowLen; ++i, y -= rowLen, --rowLen)
        continue;
    unsigned j = i + y;
    m(i, j) = calculate(i, j);
}
```
Then parallelize the for(x) loop. The slices of the for(x) loop will have an approximately equal number of calls to calculate. Recovering the i and j indexes adds some overhead, however this technique balances the load among the threads in your team.

Jim Dempsey
Honored Contributor III
When your function calculate runs very long, and when you wish to simplify your code (compared with my synthetic-range example), you can use parallel_for_each. This will balance the load to some extent at the expense of additional thread scheduling (interaction) overhead.

Jim Dempsey
Beginner
Thanks, I am now doing the same, but I have replaced the inner for loop with a formula that computes the indexes from x directly.
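The poster does not show their formula; one possible closed-form decode (a sketch of my own derivation, not necessarily theirs) counts r = T - 1 - x backwards from the last cell, where the rows seen from the bottom have lengths 1, 2, 3, ..., so the row block containing r is k = floor((sqrt(8r + 1) - 1) / 2):

```cpp
#include <cmath>

// Recover (i, j) from the linear index x, assuming an n x n matrix with
// x enumerating the upper triangle row by row, (0,0), (0,1), ..., (n-1,n-1).
void index_from_x(unsigned x, unsigned n, unsigned& i, unsigned& j) {
    const unsigned T = n * (n + 1) / 2;     // total upper-triangular cells
    const unsigned r = T - 1 - x;           // index counted from the end
    unsigned k = (unsigned)((std::sqrt(8.0 * r + 1.0) - 1.0) / 2.0);
    while (k * (k + 1) / 2 > r) --k;        // guard against sqrt rounding up
    while ((k + 1) * (k + 2) / 2 <= r) ++k; // guard against sqrt rounding down
    i = n - 1 - k;                          // row, counted from the top
    j = n - 1 - (r - k * (k + 1) / 2);      // column within that row
}
```

The two while loops cost at most one step each; they only correct floating-point rounding at block boundaries.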

Honored Contributor III
The formula for x is better; my code was for illustrative purposes.

An alternate way is to establish a thread team with member numbers, then have each member pick the cells whose running count, mod the number of members, equals its member number.

```cpp
// OpenMP way: round-robin assignment of (i, j) cells to team members
#pragma omp parallel
{
    int nMembers = omp_get_num_threads();
    int myMember = omp_get_thread_num();
    int pick = 0;
    for (int i = 0; i < n; ++i)
        for (int j = i; j < n; ++j)
        {
            if (pick++ % nMembers == myMember)
                doWork(i, j);
        }
}
```

I will let you rework that into TBB-speak using parallel_for_each or parallel_do.

parallel_distribute(