2429 Discussions

## Random results in scalar product (tbb::parallel_for)

Beginner
207 Views
Hi. I wrote some code to compute scalar product(dot product) of two vectors:
`[cpp]class Scalar {    Matrix &my_a, &my_b;    double & result;public:    void operator()( const tbb::blocked_range& r ) const {        Matrix& a = my_a;         Matrix& b = my_b;         double& c = result;        for( size_t i=r.begin(); i!=r.end(); ++i ){            c+=a.element[0]*b.element[0];	        }	        //std::cout << "c: " << c << std::endl;	result = c;    }    Scalar( double& r, Matrix& a, Matrix& b ) :        my_a(a), my_b(b), result    {}};void ParallelScalar( double& result, Matrix a, Matrix b){    parallel_for( tbb::blocked_range(0, a.rows, 16),        Scalar(result,a,b) );}int main( int argc, char* argv[] ) {    int M;    std::cout << "please enter size: ";    std::cin >> M;   //size of matrix    tbb::task_scheduler_init init;      create_and_fill_matrixes(M, M); //      tbb::tick_count t0 = tbb::tick_count::now();   		for(int i=0; i<5; ++i)	{		alfa=0;			ParallelScalar(alpha, a,c);		std::cout << "Alpha:" << alpha << std::endl;			}    tbb::tick_count t1 = tbb::tick_count::now();    std::cout << "Time: " << (t1-t0).seconds() << std::endl;	std::cout << "" << std::endl;	std::cout << "Beta:" << beta << std::endl;	    return 0;}[/cpp]`

When i run this code it sometimes return good results and sometimes bad :( Sometimes after 2 or 3 iterations alpha is computed wrong. Its random.
Those vectors are size eg. a[0].
Am i doing something wrong?
Is this can be caused because i use normal allocators (new , delete)?
I couldn't find proper use of scalable allocators :( can You post some exapmple please ?
Thank You
4 Replies
New Contributor I
207 Views
You have a race condition where the result could be updated by multiple threads. You'll want to protect access to the shared reference to alpha (result).

Instead of grabing the local copy of the global variable, modifying it, and then setting the global variable to the local variable's value you could just compute the local value and then update the global value atomically.

double c = 0;
// [compute c]

{
// update the global value
tbb::scoped_lock lock(global_mutex);
result += c;
}

You might also want to look at parallel_reduce. I like parallel_reduce for accumulating values because you don't need the lock. You modify local variables and then "join" the local value with another instance's local value. At the end you have the accumulated result. TBB takes care of the locking for you.

Beginner
207 Views
You have a race condition where the result could be updated by multiple threads. You'll want to protect access to the shared reference to alpha (result).
Ok I agree. When i run it on 1 thread it works perfect. But i tried to protect this reference with mustex and it still doesn't work. in what exact place should i use this mutex?

Instead of grabing the local copy of the global variable, modifying it, and then setting the global variable to the local variable's value you could just compute the local value and then update the global value atomically.
I did that because i need to sum up scalar product 5 times ^^ (more complicated example that i'm writing)

You might also want to look at parallel_reduce. I like parallel_reduce for accumulating values because you don't need the lock. You modify local variables and then "join" the local value with another instance's local value. At the end you have the accumulated result. TBB takes care of the locking for you.

I tried parallel_reduce earlier to compute this scalar product but it works strange ^^
I did something like that:
```[cpp]class Scalar {

Matrix &my_a, &my_b;

public:

int c;
void operator()( const tbb::blocked_range& r )  {

Matrix & a = my_a;

Matrix & b = my_b;
int tmp = c;

for( size_t i=r.begin(); i!=r.end(); ++i ){

tmp+=a.element[0]*b.elementy[0];

}

c = tmp;

}

Scalar( Matrix & a, Matrix & b ) :

my_a(a), my_b(b)

{}

Scalar(Scalar& x, tbb::split) : my_a(x.my_a), my_b(x.my_b), c(0) {}

void join(Scalar& y) { c +=y.c;}

};

int ParallelScalar( Matrix a, Matrix b){

Scalar total(a,b);

parallel_reduce( tbb::blocked_range(0, a.m, 16), total );
}

int main( int argc, char* argv[] ) {

int M;

std::cout << "please enter size: ";

std::cin >> M;

create_and_fill_matrixes(M, M);

tbb::tick_count t0 = tbb::tick_count::now();

for(int i=0; i<5; ++i)
{
alpha=0;
alpha = ParallelScalar(a,c);
std::cout << "Alpha:" << alpha << std::endl;

}
tbb::tick_count t1 = tbb::tick_count::now();

std::cout << "Time: " << (t1-t0).seconds() << std::endl;

std::cout << "" << std::endl;

return 0;

}
[/cpp]```
the result is wrong...
Could You help me?
New Contributor I
207 Views
One problem with your 2nd example is that c isn't initialized in the normal constructor. I compiled your example (with a little tweaking to fill in/remove the missing pieces). Before initializing c to 0 I got random results. Afterwards I got consistent results.

I changed the constructor as follows:

Scalar( Matrix & a, Matrix & b ) :
my_a(a), my_b(b), c(0)
{}

Also, you don't need to make a temporary variable in your operator(). You should be able to safely modify c directly.

Good luck. Happy to help if you have more specific questions.
Beginner
207 Views
One problem with your 2nd example is that c isn't initialized in the normal constructor. I compiled your example (with a little tweaking to fill in/remove the missing pieces). Before initializing c to 0 I got random results. Afterwards I got consistent results.

I changed the constructor as follows:

Scalar( Matrix & a, Matrix & b ) :
my_a(a), my_b(b), c(0)
{}

Also, you don't need to make a temporary variable in your operator(). You should be able to safely modify c directly.

Good luck. Happy to help if you have more specific questions.
thank You :) Few minutes ago found this bug in my constructor ^^
i'm overworked... sorry for bother You with that...
Thx for help :) now its perfect ^^