I want to calculate pi using multi-core parallel algorithms. My code is below. The first part uses TBB's parallel_reduce, and the second part uses OpenMP's reduction clause. Although both give the correct answer (approximately 3.14159), I found that the OpenMP reduction is much faster than parallel_reduce. For example, on my Intel i930 PC (with 4 cores), TBB's parallel_reduce needs 1.9 seconds, while OpenMP's reduction needs 0.98 seconds. I cannot explain this difference. Could anyone give some advice?
#include <iostream>
#include <ctime>
#include "tbb/tbb.h"
using namespace std;
using namespace tbb;
const int num_steps = 1000000000;
const double step = 1.0/num_steps;
double pi = 0.0;
class CMyPi
{
public:
double sum;
CMyPi() : sum(0.0) {}
void operator() (const blocked_range<int>& r)
{
for(int i = r.begin();i!=r.end();++i)
{
double x = (i+0.5)*step;
sum += 4.0/(1.0 + x*x);
}
}
CMyPi(CMyPi& x, split) : sum(0.0) {}
void join(const CMyPi& y) { sum += y.sum; }
};
int main()
{
clock_t start, stop;
CMyPi myPi;
start = clock();
parallel_reduce(blocked_range<int>(0, num_steps), myPi);
pi = step * myPi.sum;
stop = clock();
//cout << "The value of PI is " << pi << endl;
cout << "The time to calculate PI was " << (double)(stop-start)/CLOCKS_PER_SEC << " seconds\n\n";
start = clock();
double sum = 0.0;
#pragma omp parallel for reduction(+:sum)
for (int i=0; i<num_steps; ++i) {
double x = (i+0.5)*step;
sum += 4.0/(1.0 + x*x);
}
pi = step*sum;
stop = clock();
//cout << "The value of PI is " << pi << endl;
cout << "The time to calculate PI was " << (double)(stop-start)/CLOCKS_PER_SEC << " seconds\n";
return 0;
}
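For reference, both loops above appear to evaluate the same midpoint-rule approximation of the integral that defines pi. Written out with N = num_steps and Δx = step (the symbols N, Δx, and x_i are introduced here only for illustration):

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \approx \Delta x \sum_{i=0}^{N-1} \frac{4}{1+x_i^2},
\qquad x_i = (i + 0.5)\,\Delta x, \quad \Delta x = \frac{1}{N}
```

The factor Δx is applied once after the loop (pi = step * sum), so only the sum itself has to be reduced across threads.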
However, I really do not know how to use lambda functions with parallel_reduce. Could you or anyone else help me with this?
It should be something like this (I assume you are familiar with C++0x lambda functions):
[cpp]
start = clock();
pi = parallel_reduce(
    blocked_range<size_t>(0, num_steps),
    double(0), // identity element for summation
    [&]( const blocked_range<size_t>& r, double current_sum ) -> double {
        for (size_t i=r.begin(); i!=r.end(); ++i) {
            double x = (i+0.5)*step;
            current_sum += 4.0/(1.0 + x*x);
        }
        return current_sum; // body returns updated value of the accumulator
    },
    []( double s1, double s2 ) {
        return s1+s2; // "joins" two accumulated values
    }
);
pi *= step;
stop = clock();
[/cpp]
Note a few things:
- This form of parallel_reduce returns a value.
- The second argument provides parallel_reduce with an identity element to initialize new accumulators. It also defines the type of the accumulators and of the returned value, so it is important to use the proper type here. I remember typing a similar loop during a public demo and making the mistake of using a "plain" 0 for the identity, which of course is an integer, while I needed a double.
- The third and fourth arguments of parallel_reduce are lambda functions, but "regular" functors can also be used there.
- The main body functor still takes a blocked_range, but it also takes the current value of an accumulator as its second argument. Note that it should also return a value; this value will be assigned to the accumulator, overwriting its old value. Thus it is important to add to the given value of the accumulator and return the result.
- Conveniently, the accumulator argument is a local variable that is friendly to compiler optimizations, so you don't need any special variable to "help" the compiler.
- The fourth argument of parallel_reduce is the functor that combines (reduces) two accumulators; it takes their values and returns the result of the reduction. It serves the same purpose as the join() method in the original form of parallel_reduce, but it only does the calculation; "joining" one result into another has become an implementation detail.
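To make the roles of the identity, body, and reduction arguments concrete without depending on TBB, here is a minimal sequential sketch. `chunked_reduce` and `estimate_pi` are hypothetical helpers written for this illustration and are not TBB functions. The helper splits the range into fixed-size chunks, starts each chunk's accumulator from the identity, and combines the partial results in order, whereas the real parallel_reduce chooses the splits and the combination order itself. The accumulator type is deduced from the identity argument, which also lets the sketch reproduce the "plain 0" pitfall mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical, sequential stand-in for the functional form of parallel_reduce.
// Value (the accumulator type) is deduced from the identity argument.
template <typename Value, typename Body, typename Join>
Value chunked_reduce(std::size_t begin, std::size_t end, std::size_t chunk,
                     Value identity, Body body, Join join) {
    assert(chunk > 0);
    Value total = identity;
    for (std::size_t lo = begin; lo < end; lo += chunk) {
        const std::size_t hi = std::min(lo + chunk, end);
        // Each chunk starts from the identity; the body returns the updated
        // accumulator, which is stored as a Value (so an int Value truncates).
        const Value partial = static_cast<Value>(body(lo, hi, identity));
        // The reduction only computes; storing the result is the helper's job.
        total = static_cast<Value>(join(total, partial));
    }
    return total;
}

// Midpoint-rule estimate of pi; the identity's type selects the accumulator type.
template <typename Value>
double estimate_pi(std::size_t num_steps, std::size_t chunk, Value identity) {
    const double step = 1.0 / static_cast<double>(num_steps);
    const Value sum = chunked_reduce(
        std::size_t(0), num_steps, chunk, identity,
        [step](std::size_t lo, std::size_t hi, double acc) -> double {
            for (std::size_t i = lo; i != hi; ++i) {
                const double x = (static_cast<double>(i) + 0.5) * step;
                acc += 4.0 / (1.0 + x * x);  // add to the given accumulator value
            }
            return acc;                      // return the updated accumulator
        },
        [](double a, double b) { return a + b; });
    return step * static_cast<double>(sum);
}
```

Calling `estimate_pi(1000000, 1000, 0.0)` uses a double accumulator and gives a close approximation of pi, while `estimate_pi(1000000, 1, 0)` deduces an int accumulator, truncates every partial result, and lands well below 3.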
I know TBB has released many lambda-style functions (classes), but there seem to be few introductions on how to use them. I suggest that the TBB team provide more information or instructions.