
## Parallel_Scan taking more time than serial

Beginner
148 Views

I am running the same scan both with parallel_scan and serially. The serial version is actually faster than the parallel_scan version.

The code I am using is:

#include <iostream>
#include <stdlib.h>
#include <time.h>
#include "tbb/blocked_range.h"
#include "tbb/parallel_scan.h"
#include "tbb/tick_count.h"
using namespace std;
using namespace tbb;

template <class T>
class Body
{
T reduced_result;
T* const y;
const T* const x;

public:

Body( T y_[], const T x_[] ) : reduced_result(0), x(x_), y(y_) {}

T get_reduced_result() const {return reduced_result;}

template<typename Tag>
void operator()( const blocked_range<int>& r, Tag )
{
T temp = reduced_result;
for( int i=r.begin(); i<r.end(); ++i )
{
temp = temp+x[i];
if( Tag::is_final_scan() )
{
y[i] = temp;
//cout<<i<<","<<y[i]<<endl;

}

}
reduced_result = temp;

}

Body( Body& b, split ) : x(b.x), y(b.y), reduced_result(0)
{
cout<< " output of split is is \t " << endl;
}

void reverse_join( Body& a )
{
reduced_result = a.reduced_result + reduced_result;
// cout<< " output of reduced_result now is " << reduced_result << endl;
}

void assign( Body& b )
{
reduced_result = b.reduced_result;
// cout<<"final value assigned"<<endl;
}
};

template<class T>
float DoParallelScan( T y[], const T x[], int n)
{
Body<T> body(y,x);
tick_count t1,t2,t3,t4;
t1=tick_count::now();
parallel_scan( blocked_range<int>(0,n), body , auto_partitioner() );
t2=tick_count::now();
cout<<"Time Taken for parallel scan is \t"<<(t2-t1).seconds()<<endl;
return body.get_reduced_result();
}

template<class T1>
float SerialScan(T1 y[], const T1 x[], int n)
{
tick_count t3,t4;

t3=tick_count::now();
T1 temp = 0;

for( int i=0; i<n; ++i )
{
temp = temp+x[i];
y[i] = temp;
}
t4=tick_count::now();
cout<<"Time Taken for serial scan is \t"<<(t4-t3).seconds()<<endl;
return temp;

}

int main()
{

int y1[1000],x1[1000];

for(int i=0;i<1000;i++)
x1[i]=i+1;

cout<<fixed;

cout<<"\n serial scan output is \t"<<SerialScan(y1,x1,1000)<<endl;

cout<<"\n parallel scan output is \t"<<DoParallelScan(y1,x1,1000)<<endl;

return 0;
}

14 Replies
Black Belt

Try different grainsize values (blocked_range parameter, 1 by default, works with auto_partitioner as well as simple_partitioner)?

(Added) Do you see any difference if you don't evaluate blocked_range::end() inside the loop?
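To illustrate why grainsize matters, here is a rough sketch that counts how many leaf subranges result when a range is halved until its size is no larger than grainsize, which is the divisibility rule blocked_range uses. The helper name count_chunks is hypothetical, and the recursion models only the exhaustive splitting that simple_partitioner performs; auto_partitioner usually stops splitting earlier, so treat the numbers as an upper bound.

```cpp
// Hypothetical helper: counts the leaf subranges produced by halving
// [begin, end) until a piece is no longer divisible, i.e. until its
// size is <= grainsize. Assumes grainsize >= 1.
long count_chunks(long begin, long end, long grainsize) {
    if (end - begin <= grainsize) return 1;   // not divisible: one leaf
    long mid = begin + (end - begin) / 2;     // split at the midpoint
    return count_chunks(begin, mid, grainsize) +
           count_chunks(mid, end, grainsize);
}
```

With 1000 elements and the default grainsize of 1, this model yields 1000 one-element pieces, each carrying scheduling overhead; a grainsize of 1000 yields a single piece and no parallelism at all.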


Beginner

I have tried different grain sizes. The serial version takes only 3 usec, and the parallel version takes a minimum of 703 usec. Please check whether my code is correct so that we can find what is going wrong.

Black Belt

The main issue here is problem size: try again with a lot more data, but don't get your hopes up too far because memory bandwidth might be a bottleneck.
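As a starting point for a larger test, here is a minimal serial baseline sketch. It is an assumption-laden rewrite, not the original poster's code: it uses std::chrono instead of tbb::tick_count, heap-allocated std::vector storage so that n can be far larger than a 1000-element stack array, and a long long accumulator so the running sum does not overflow. The names serial_scan and time_serial_scan are illustrative.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

// Inclusive prefix sum: y[i] = x[0] + ... + x[i]. Returns the total.
long long serial_scan(std::vector<long long>& y,
                      const std::vector<long long>& x) {
    long long running = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        running += x[i];
        y[i] = running;
    }
    return running;
}

// Times one serial scan over n elements (n >= 1) and returns seconds.
double time_serial_scan(std::size_t n) {
    std::vector<long long> x(n), y(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = static_cast<long long>(i) + 1;

    auto start = std::chrono::steady_clock::now();
    long long total = serial_scan(y, x);
    auto stop = std::chrono::steady_clock::now();

    // Use the result so the compiler cannot discard the timed loop.
    if (total != y[n - 1]) std::cerr << "unexpected total\n";
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_serial_scan with a few sizes (say, a thousand, a million, and a hundred million elements) gives a baseline to compare a parallel_scan run against at each size.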

Beginner

I have also increased the problem size, but the outcome is the same as before: the serial version takes about 0.6 sec and the parallel version 4.2 sec. I am stuck. I have an algorithm of this type and want to use parallel_scan in it, but it is not proving beneficial. If you have any better code whose performance you have checked, it would be really helpful.

Black Belt

Did you remove end() and if() from the loop? Yes, that would mean two separate loops.
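A sketch of what "two separate loops" could look like, based on the body from the original post. To keep it compilable without TBB, PreScanTag, FinalScanTag, and Range are hypothetical stand-ins for tbb::pre_scan_tag, tbb::final_scan_tag, and tbb::blocked_range<int>; the split constructor, reverse_join, and assign are omitted and would stay as in the original.

```cpp
// Hypothetical stand-ins for tbb::pre_scan_tag and tbb::final_scan_tag,
// defined here only so the sketch compiles without TBB.
struct PreScanTag   { static bool is_final_scan() { return false; } };
struct FinalScanTag { static bool is_final_scan() { return true;  } };

// Minimal index range standing in for tbb::blocked_range<int>.
struct Range {
    int first, last;
    int begin() const { return first; }
    int end() const { return last; }
};

template <class T>
struct TwoLoopBody {
    T reduced_result;
    T* const y;
    const T* const x;

    TwoLoopBody(T y_[], const T x_[]) : reduced_result(0), y(y_), x(x_) {}

    template <typename Tag>
    void operator()(const Range& r, Tag) {
        T temp = reduced_result;
        const int end = r.end();  // read the bound once, outside the loops
        if (Tag::is_final_scan()) {
            // Final pass: accumulate and store the prefix sums.
            for (int i = r.begin(); i < end; ++i) {
                temp += x[i];
                y[i] = temp;
            }
        } else {
            // Pre-scan pass: accumulate only, no stores to y.
            for (int i = r.begin(); i < end; ++i) {
                temp += x[i];
            }
        }
        reduced_result = temp;
    }
};
```

The tag test now runs once per call instead of once per element, and the pre-scan loop is a plain reduction that a compiler has a better chance of vectorizing.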

Beginner

I removed the if from the loop and it worked: the time is now down to almost 2 sec. Can you please tell me how to remove end()? If I am not using end(), how do I bound the loop? If I am not wrong, it is r.end() you are talking about. Thanks.

Black Belt

[cpp]

for( int i=r.begin(), end = r.end(); i != end; ++i )

[/cpp]

(Corrected.)

Beginner

When I replace my for loop with this one, it throws an exception at run time and terminates.

Beginner

It gives the exception "Assertion h!=small_local_task || p.origin ==this failed on line 617 of file z:\itt\branchtbb41\tbb\1.01src\tbb\scheduler.h".

Beginner

Also, in the for loop you have given, i will never be equal to end, so it raised the exception. Can you please suggest an alternative?

Black Belt

Sorry, I was on my way out and in a hurry when I wrote that code. You should now be able to see the corrected version.

Beginner

No, it is still the same code that you wrote earlier.

Black Belt

Please check again: "for( int i=r.begin(); i<r.end(); ++i )" (your version) -> "for( int i=r.begin(), end=i<r.end(); i!=end; ++i )" (my earlier mistake) -> "for( int i=r.begin(), end=r.end(); i!=end; ++i )" (what it should be). (You can keep "<" instead of "!=" if you prefer.)
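For anyone puzzled by why the mistaken header misbehaves, here is a small sketch: in `end=i<r.end()` the comparison is evaluated first, so end receives a bool converted to int (0 or 1) rather than the range bound. The function names mistaken_end and sum_hoisted are illustrative only. With end stuck at 1, `i != end` stays true for any subrange starting above 1, so the loop runs past the range; that is presumably what led to the failure reported above.

```cpp
// Returns the value that `end` receives in the mistaken loop header
//   for (int i = begin, end = i < range_end; i != end; ++i)
int mistaken_end(int begin, int range_end) {
    int i = begin, end = i < range_end;  // end = (i < range_end): 0 or 1
    (void)i;                             // i is only needed for the comparison
    return end;
}

// Sums x over [begin, range_end) with the bound hoisted into a local,
// as in the corrected loop header.
long long sum_hoisted(const int* x, int begin, int range_end) {
    long long total = 0;
    for (int i = begin, end = range_end; i != end; ++i) {
        total += x[i];
    }
    return total;
}
```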

Beginner

Oops, sorry, I missed it. I have checked now: it is correct and it works. We have finally achieved a speedup of 2X. Thanks :)