Beginner
31 Views

## parallel_reduce to count: why is the run time not the same?

On my computer, when num_steps = 100000000, the run takes 5.* s in most cases and 2.* s in a few cases, no matter whether GrainSize = 50000000, GrainSize = 10000000, or the auto partitioner is used.
When num_steps = 1000000000, it takes 50 s in most cases and 20 s in a few cases.
Environment:
AMD5000
Windows 7
VS2008

```cpp
#include <iostream>
#include <ctime>
#include <cstdlib>
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

using namespace std;
using namespace tbb;

int   GrainSize = 50000000;
long long num_steps =  100000000;

// Reduction body: accumulates the partial sum for the midpoint rule.
class CMyPi
{
    double *const my_step;
public:
    double sum;
    void operator()(const blocked_range<long long>& r);
    CMyPi(CMyPi& x, split);
    void join(const CMyPi& y);
    CMyPi(double *const step);
};

CMyPi::CMyPi(double *const step):my_step(step)
{
    sum = 0.0;
}
// Splitting constructor: a new body starts with an empty partial sum.
CMyPi::CMyPi(CMyPi &x, tbb::split):my_step(x.my_step)
{
    sum = 0.0;
}
// Merge the partial sum of another body into this one.
void CMyPi::join(const CMyPi &y)
{
    sum += y.sum;
}
//   step = 1.0/(double)num_steps;
//   for (i=0; i < num_steps; i++)
//   {
//      x = (i+0.5)*step;
//      sum = sum + 4.0/(1.0 + x*x);
//    }
void CMyPi::operator ()(const blocked_range<long long>& r)
{
    double x = 0.0;
    for(long long i = r.begin();i!=r.end();++i)
    {
        x = (i+0.5)* (*my_step);
        sum+=4.0/(1.0+x*x);
    }
}

int main(int argc, char* argv[])
{
    clock_t start, stop;
    double pi;
    double width = 1./(double)num_steps;

    CMyPi step((double *const)&width);

    start = clock();
    parallel_reduce(blocked_range<long long>(0,num_steps,GrainSize), step);
    // parallel_reduce(blocked_range<long long>(0,num_steps), step, auto_partitioner());
    pi = step.sum*width;
    stop = clock();

    cout << "The value of PI is " << pi << endl;
    cout << "The time to calculate PI was " << (double)(stop-start)/CLOCKS_PER_SEC << " seconds\n";
    system("pause");
    return 0;
}

// Serial reference version:
//#include <stdio.h>
//static long num_steps=100000;
//double step;
//void main()
//{  int i;
//   double x, pi, sum = 0.0;
//   step = 1.0/(double)num_steps;
//   for (i=0; i < num_steps; i++)
//   {
//      x = (i+0.5)*step;
//      sum = sum + 4.0/(1.0 + x*x);
//    }
//    pi = step * sum;
//    printf("Pi = %f\n",pi);
//}
```
4 Replies
Valued Contributor II
Quoting - hengyunabc

On my computer, when num_steps = 100000000, the run takes 5.* s in most cases and 2.* s in a few cases, no matter whether GrainSize = 50000000, GrainSize = 10000000, or the auto partitioner is used.

```cpp
task_scheduler_init init(task_scheduler_init::deferred);

start = clock();
parallel_reduce(blocked_range<long long>(0,num_steps,GrainSize), step);
// parallel_reduce(blocked_range<long long>(0,num_steps), step, auto_partitioner());
pi = step.sum*width;
stop = clock();
```

Is there a reason why you start the timing BEFORE the one-time creation of the TBB thread pool and associated data structures? What happens if you move the start clock after the Nthreads initialize?
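To illustrate the point about keeping one-time setup out of the measured region, here is a minimal sketch in standard C++11 (not TBB, and newer than VS2008 supports). The names `serial_pi` and `seconds_after_warmup` are hypothetical helpers invented for this example. The idea is to run the work once untimed, so that any thread-pool creation or similar start-up cost lands there, and then time a second run with a monotonic wall clock.

```cpp
#include <chrono>
#include <cstdint>

// Midpoint-rule estimate of pi, using the same formula as the serial
// reference version in the question.
double serial_pi(std::int64_t num_steps) {
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum = 0.0;
    for (std::int64_t i = 0; i < num_steps; ++i) {
        const double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}

// Hypothetical helper: calls `work` once without timing it (warm-up),
// then returns the wall-clock seconds taken by a second call.
template <typename Work>
double seconds_after_warmup(Work work) {
    work();  // one-time start-up costs are paid here, outside the timing
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

In the code from the question, the same effect could probably be had by initializing the scheduler (or running a small dummy `parallel_reduce`) before the first `clock()` call, so that only the reduction itself is measured.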
Beginner

Is there a reason why you start the timing BEFORE the one-time creation of the TBB thread pool and associated data structures? What happens if you move the start clock after the Nthreads initialize?
That was remiss of me.
But it seems that I have found the reason.
When the thread number is 8, the run time is always 2.* s.
When the thread number is 4, the run time is a little longer than with 8.
I do not know why.
On XP, when the thread number is 2, the run time is always 5.* s.
But on Windows 7, the run time is sometimes 2.* s.
It is strange.
I heard that the process scheduling policy on Windows 7 is better than on XP.
Maybe there is some link between the two.
Black Belt
I haven't looked at the problem (got to run now), but I did notice that you don't give TBB the chance to detect and use the actual level of parallelism in your machine, and I didn't see you mentioning it. Are you aware that using too many threads can decrease performance? Have you seen non-optimal behaviour if you don't provide an argument to task_scheduler_init::initialize()?
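As a rough illustration of matching the thread count to the machine, here is a sketch in standard C++11 rather than TBB. The functions `worker_count` and `threaded_pi` are hypothetical names made up for this example. It asks the runtime how many hardware threads are available and splits the pi sum into that many chunks, joining the partial sums at the end, which is roughly what a reduction with split and join does.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Number of workers to use: the hardware thread count reported by the
// runtime, or 1 when it cannot be determined (the call may return 0).
unsigned worker_count() {
    const unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1u : hw;
}

// Midpoint-rule estimate of pi split across num_threads workers.
// num_threads must be at least 1.
double threaded_pi(std::int64_t num_steps, unsigned num_threads) {
    const double step = 1.0 / static_cast<double>(num_steps);
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    const std::int64_t chunk = num_steps / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::int64_t begin = t * chunk;
        // The last worker also takes the remainder of the range.
        const std::int64_t end =
            (t + 1 == num_threads) ? num_steps : begin + chunk;
        workers.emplace_back([&partial, t, begin, end, step] {
            double local = 0.0;  // private accumulator for this worker
            for (std::int64_t i = begin; i < end; ++i) {
                const double x = (i + 0.5) * step;
                local += 4.0 / (1.0 + x * x);
            }
            partial[t] = local;
        });
    }

    // Join step: wait for each worker and add up the partial sums.
    double sum = 0.0;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers[t].join();
        sum += partial[t];
    }
    return sum * step;
}
```

Calling `threaded_pi(num_steps, worker_count())` uses one worker per hardware thread. Passing a larger count oversubscribes the machine, which is the situation the question above is asking about.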
Valued Contributor II
Quoting - hengyunabc
That was remiss of me.
But it seems that I have found the reason.
When the thread number is 8, the run time is always 2.* s.
When the thread number is 4, the run time is a little longer than with 8.
I do not know why.
On XP, when the thread number is 2, the run time is always 5.* s.
But on Windows 7, the run time is sometimes 2.* s.
It is strange.
I heard that the process scheduling policy on Windows 7 is better than on XP.
Maybe there is some link between the two.

I think maybe Raf hit upon something. Do you know how many HW threads your machine(s) has/have? Are you running XP and Windows 7 on the same machine, or the same class of machine?

I'm confused by "when thread number is 2, longest" when the two examples of XP and Windows 7 are not "longest" compared to 4 threads.