I am running the same computation with parallel_scan and serially. The serial version is actually faster than the parallel_scan version.
The code I am using is:
#include <iostream>
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_scan.h"
#include "tbb/tick_count.h"
using namespace std;
using namespace tbb;

template <class T>
class Body
{
    T reduced_result;   // running sum for this body's subrange
    T* const y;         // output: y[i] = x[0] + ... + x[i]
    const T* const x;   // input
public:
    // Members are initialized in declaration order: reduced_result, y, x.
    Body( T y_[], const T x_[] ) : reduced_result(0), y(y_), x(x_) {}
    T get_reduced_result() const { return reduced_result; }
    template<typename Tag>
    void operator()( const blocked_range<int>& r, Tag )
    {
        T temp = reduced_result;
        for( int i=r.begin(); i<r.end(); ++i )
        {
            temp = temp + x[i];
            if( Tag::is_final_scan() )
                y[i] = temp;   // only the final scan writes the output
        }
        reduced_result = temp;
    }
    // Splitting constructor: the new body starts its partial sum at 0.
    Body( Body& b, split ) : reduced_result(0), y(b.y), x(b.x)
    {
        // cout << " output of split is \t " << endl;  // printing on every split adds console I/O inside the timed region
    }
    void reverse_join( Body& a )
    {
        reduced_result = a.reduced_result + reduced_result;
    }
    void assign( Body& b )
    {
        reduced_result = b.reduced_result;
    }
};

template<class T>
T DoParallelScan( T y[], const T x[], int n )
{
    Body<T> body(y, x);
    tick_count t1 = tick_count::now();
    parallel_scan( blocked_range<int>(0, n), body, auto_partitioner() );
    tick_count t2 = tick_count::now();
    cout << "Time taken for parallel scan is \t" << (t2-t1).seconds() << endl;
    return body.get_reduced_result();
}

template<class T1>
T1 SerialScan( T1 y[], const T1 x[], int n )
{
    tick_count t3 = tick_count::now();
    T1 temp = 0;
    for( int i=0; i<n; ++i )
    {
        temp = temp + x[i];
        y[i] = temp;
    }
    tick_count t4 = tick_count::now();
    cout << "Time taken for serial scan is \t" << (t4-t3).seconds() << endl;
    return temp;
}

int main()
{
    task_scheduler_init init1(4);
    int y1[1000], x1[1000];
    for( int i=0; i<1000; i++ )
        x1[i] = i+1;
    cout << fixed;
    cout << "\n serial scan output is \t" << SerialScan(y1, x1, 1000) << endl;
    cout << "\n parallel scan output is \t" << DoParallelScan(y1, x1, 1000) << endl;
    return 0;
}
Please help me find where I am going wrong.
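Both functions are meant to compute an inclusive prefix sum, y[i] = x[0] + x[1] + ... + x[i], so with x = 1, 2, ..., 1000 the returned total should be 1000 * 1001 / 2 = 500500. As a sketch (not part of the original post, helper names hypothetical), a standard-library reference built on std::partial_sum can be used to check y1 after each run:

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: builds the same input as main() above, x[i] = i + 1.
std::vector<int> make_input(int n) {
    std::vector<int> x(n);
    std::iota(x.begin(), x.end(), 1);
    return x;
}

// Hypothetical helper: reference inclusive prefix sum, y[i] = x[0] + ... + x[i].
std::vector<int> reference_scan(const std::vector<int>& x) {
    std::vector<int> y(x.size());
    std::partial_sum(x.begin(), x.end(), y.begin());
    return y;
}

// Hypothetical helper: asserts that y[0..x.size()) matches the reference scan of x.
void check_scan(const int* y, const std::vector<int>& x) {
    const std::vector<int> ref = reference_scan(x);
    for (std::size_t i = 0; i < ref.size(); ++i)
        assert(y[i] == ref[i]);
}
```

For example, after copying x1 into a std::vector<int>, calling check_scan(y1, x) right after SerialScan and again after DoParallelScan confirms that both produce the same prefix sums before their timings are compared.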
Try different grainsize values (the third blocked_range constructor parameter, 1 by default; it works with auto_partitioner as well as simple_partitioner).
(Added) Do you see any difference if you don't evaluate blocked_range::end() inside the loop?
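In TBB the grainsize goes in as the third argument, e.g. blocked_range<int>(0, n, 1000). The standard-C++ sketch below is a simplified, serial illustration only: it cuts [0, n) into fixed blocks of at most grainsize elements (TBB itself splits ranges recursively) and computes each block's end once, before the inner loop, instead of calling end() in the loop condition. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: sum x by visiting [0, n) in blocks of at most `grainsize`
// elements. Larger blocks mean fewer, bigger pieces of work (less overhead).
long long sum_in_blocks(const std::vector<int>& x, std::size_t grainsize) {
    assert(grainsize > 0);
    long long total = 0;
    const std::size_t n = x.size();
    for (std::size_t begin = 0; begin < n; begin += grainsize) {
        // The block's end is computed once here, not re-evaluated per iteration.
        const std::size_t end = (n - begin < grainsize) ? n : begin + grainsize;
        for (std::size_t i = begin; i != end; ++i)
            total += x[i];
    }
    return total;
}
```

The result does not depend on the grainsize; only the number of blocks, and hence the per-block overhead, changes.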
I have tried different grain sizes: serial takes only 3 usec, while parallel takes at least 703 usec. Please check whether my code is written correctly so that we can find what is going wrong.
The main issue here is problem size: try again with a lot more data, but don't get your hopes up too far because memory bandwidth might be a bottleneck.
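To see how the serial baseline behaves at a larger size before comparing it with parallel_scan, a timing sketch may help. It is not from the thread: it uses std::chrono instead of tbb::tick_count so it runs without TBB, long long because the total for large n no longer fits in an int, and a hypothetical function name. With one addition and one store per element, the loop is likely limited by memory traffic at large n, which is the bandwidth concern mentioned above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical sketch: time a serial inclusive scan of x into y with std::chrono.
// y is supplied by the caller so the stores remain observable after the call.
// Returns the elapsed wall-clock time of the scan loop in seconds.
double time_serial_scan(const std::vector<long long>& x, std::vector<long long>& y) {
    y.resize(x.size());
    const auto t0 = std::chrono::steady_clock::now();
    long long temp = 0;
    for (std::size_t i = 0, n = x.size(); i != n; ++i) {
        temp += x[i];   // one add ...
        y[i] = temp;    // ... and one store per element
    }
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

For example, fill x with 1, 2, ..., n for several values of n and compare time_serial_scan(x, y) with the parallel timing at the same n; the parallel version can only pay off once the per-element work outweighs the task scheduling overhead.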
Thanks for replying.
I have also increased the problem size, but the results are the same as before: serial now takes about 0.6 sec and parallel about 4.2 sec. I am stuck. I have an algorithm of this type and want to use parallel_scan in it, but it is not proving beneficial. If you have better code whose performance you have checked, it would be really helpful.
Did you remove end() and if() from the loop? Yes, that would mean two separate loops.
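Moving the if out of the loop means branching once per subrange and running one of two loops; in TBB the branch would be on Tag::is_final_scan() inside operator(). The sketch below uses a hypothetical free function scan_block in place of Body::operator(), plus a hypothetical serial driver two_pass_scan that imitates a pre-scan pass followed by a final-scan pass. It is a simplified illustration that runs without TBB; parallel_scan itself decides at run time which subranges get a pre-scan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for Body::operator(): branch once, then run one of two loops.
// `sum` is the running total carried into the block; the new total is returned.
template <class T>
T scan_block(const T* x, T* y, int begin, int end, T sum, bool final_scan) {
    if (final_scan) {
        for (int i = begin; i != end; ++i) {   // final scan: accumulate and write y
            sum += x[i];
            y[i] = sum;
        }
    } else {
        for (int i = begin; i != end; ++i)     // pre-scan: accumulate only
            sum += x[i];
    }
    return sum;
}

// Hypothetical serial driver: pass 1 pre-scans each block to get the running
// total before it, pass 2 final-scans each block starting from that offset.
template <class T>
void two_pass_scan(const std::vector<T>& x, std::vector<T>& y, int block) {
    assert(block > 0);
    const int n = static_cast<int>(x.size());
    y.assign(x.size(), T(0));
    std::vector<T> offsets;   // running total before each block
    T running = 0;
    for (int b = 0; b < n; b += block) {
        offsets.push_back(running);
        const int e = b + block < n ? b + block : n;
        running += scan_block(x.data(), y.data(), b, e, T(0), false);
    }
    for (int b = 0, k = 0; b < n; b += block, ++k) {
        const int e = b + block < n ? b + block : n;
        scan_block(x.data(), y.data(), b, e, offsets[k], true);
    }
}
```

Because the pre-scan loop only accumulates, it also makes no stores to y, which reduces memory traffic in that pass.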
I removed the if from the loop and it worked; the time has now dropped to almost 2 sec. Can you please tell me how to remove end()? If I don't use it, how do I bound the loop? If I'm not wrong, you mean r.end(). Thanks.
[cpp]
for( int i=r.begin(), end = r.end(); i != end; ++i )
[/cpp]
(Corrected.)
When I replace my for loop with this one, it throws an exception at run time and terminates.
It fails with the assertion "Assertion h!=small_local_task || p.origin ==this failed on line 617 of file z:\itt\branchtbb41\tbb\1.01src\tbb\scheduler.h".
Also, in the for loop you gave, i will never equal end, which is why it raised the exception. Can you please suggest an alternative?
Sorry, I was on my way out and in a hurry when I wrote that code. You should now be able to see the corrected version.
No, it is still the same code you wrote earlier.
Please check again: "for( int i=r.begin(); i<r.end(); ++i )" (your version) -> "for( int i=r.begin(), end=i<r.end(); i!=end; ++i )" (my earlier mistake) -> "for( int i=r.begin(), end=r.end(); i!=end; ++i )" (what it should be). (You can keep "<" instead of "!=" if you prefer.)
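A small standard-C++ sketch (hypothetical helper names) makes the difference between the two headers concrete. In the mistaken version, end is an int initialized from the bool i < r.end(), so it is 0 or 1. For any subrange that starts above 1, i != end never becomes false and i runs past the subrange into out-of-bounds x[i] and y[i], which is a plausible cause of the scheduler assertion reported above. The corrected header evaluates r.end() once and stops there.

```cpp
#include <cassert>

// Hypothetical helper: the value the mistaken declaration gives `end`.
// Both names are ints, so `end` is the bool (i < range_end) converted to 0 or 1.
int mistaken_end(int begin, int range_end) {
    int i = begin, end = i < range_end;
    return end;
}

// Hypothetical helper: iterations performed by the corrected header,
// which reads the range end once before the loop starts.
int corrected_iterations(int begin, int range_end) {
    assert(begin <= range_end);
    int count = 0;
    for (int i = begin, end = range_end; i != end; ++i)
        ++count;
    return count;
}
```

For a subrange [500, 1000), mistaken_end gives 1 instead of 1000, while the corrected header runs exactly 500 iterations.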
Oops, sorry, I missed that. Now I have checked: it is correct and it works. We have finally achieved a speedup of 2X. Thanks :)
