On my computer, with num_steps = 100000000, the run takes about 5.x s in most cases and about 2.x s in a few, no matter whether GrainSize is 50000000, 10000000, or automatic (auto_partitioner).
With num_steps = 1000000000, it takes about 50 s in most cases and about 20 s in a few.
Environment:
AMD5000
Windows7
VS2008
[cpp]
#include <iostream>
#include <ctime>
#include <cstdlib>
#include "tbb/parallel_reduce.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/blocked_range.h"

using namespace std;
using namespace tbb;

int Nthreads = 2;
int GrainSize = 50000000;
long long num_steps = 100000000;

// Reduction body: accumulates the midpoint-rule sum for pi over a sub-range.
// Assumes blocked_range<int>, matching the int loop index below.
class CMyPi {
    double *const my_step;
public:
    double sum;
    void operator()(const blocked_range<int>& r);
    CMyPi(CMyPi& x, split);
    void join(const CMyPi& y);
    CMyPi(double *const step);
};

CMyPi::CMyPi(double *const step) : my_step(step) { sum = 0.0; }

// Splitting constructor: each new body starts with an empty partial sum.
CMyPi::CMyPi(CMyPi& x, tbb::split) : my_step(x.my_step) { sum = 0.0; }

// Merge the partial sum of another body into this one.
void CMyPi::join(const CMyPi& y) { sum += y.sum; }

// step = 1.0/(double)num_steps;
// for (i=0; i < num_steps; i++)
// {
//     x = (i+0.5)*step;
//     sum = sum + 4.0/(1.0 + x*x);
// }
void CMyPi::operator()(const blocked_range<int>& r)
{
    double x = 0.0;
    for (int i = r.begin(); i != r.end(); ++i)
    {
        x = (i + 0.5) * (*my_step);
        sum += 4.0 / (1.0 + x * x);
    }
}

int main(int argc, char* argv[])
{
    clock_t start, stop;
    double pi;
    double width = 1. / (double)num_steps;
    CMyPi step(&width);
    task_scheduler_init init(task_scheduler_init::deferred);

    start = clock();
    init.initialize(Nthreads);
    // TBB
    parallel_reduce(blocked_range<int>(0, (int)num_steps, GrainSize), step);
    // parallel_reduce(blocked_range<int>(0, (int)num_steps), step, auto_partitioner());
    pi = step.sum * width;
    stop = clock();

    cout << "The value of PI is " << pi << endl;
    cout << "The time to calculate PI was " << (double)(stop - start) / CLOCKS_PER_SEC << " seconds\n";
    system("pause");
    return 0;
}

// Serial version for reference:
//#include <stdio.h>
//static long num_steps=100000;
//double step;
//void main()
//{   int i;
//    double x, pi, sum = 0.0;
//    step = 1.0/(double)num_steps;
//    for (i=0; i < num_steps; i++)
//    {
//        x = (i+0.5)*step;
//        sum = sum + 4.0/(1.0 + x*x);
//    }
//    pi = step * sum;
//    printf("Pi = %f\n",pi);
//}
[/cpp]
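One possible way to see why the two grain sizes behave alike is to count how many chunks each would produce. The following is only a rough sketch: it assumes the range is split by recursive halving until a piece is no larger than the grain size, which mirrors the divisibility test of `blocked_range` but ignores whatever the partitioner actually in effect may decide. `count_chunks` is a hypothetical helper, not part of TBB.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not a TBB function): counts the leaf chunks produced
// when a range of `size` iterations is halved repeatedly until each piece
// is no larger than `grain`. Assumption: every divisible piece is split.
std::size_t count_chunks(std::size_t size, std::size_t grain) {
    assert(grain > 0);
    if (size <= grain) return 1;   // not divisible any further: one leaf
    std::size_t left = size / 2;   // split roughly in the middle
    return count_chunks(left, grain) + count_chunks(size - left, grain);
}
```

Under that assumption, `count_chunks(100000000, 50000000)` is 2 and `count_chunks(100000000, 10000000)` is 16, so with two threads either setting leaves at least one chunk per thread. That would be consistent with the similar timings reported for both grain sizes.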
4 Replies
Quoting - hengyunabc
On my computer, with num_steps = 100000000, the run takes about 5.x s in most cases and about 2.x s in a few, no matter whether GrainSize is 50000000, 10000000, or automatic (auto_partitioner).
[cpp]
task_scheduler_init init(task_scheduler_init::deferred);
start = clock();
init.initialize(Nthreads);
// TBB
parallel_reduce(blocked_range<int>(0, num_steps, GrainSize), step);
// parallel_reduce(blocked_range<int>(0, num_steps), step, auto_partitioner());
pi = step.sum * width;
stop = clock();
[/cpp]
Is there a reason why you start the timing BEFORE the one-time creation of the TBB thread pool and associated data structures? What happens if you move the start clock after the Nthreads initialize?
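The timing placement asked about here can be sketched as follows. This is a minimal illustration, not the poster's code: it assumes a C++11 compiler (newer than the VS2008 mentioned in the thread), uses a serial loop as a stand-in workload, and the names `time_seconds` and `midpoint_pi` are hypothetical. With VS2008 the same placement applies to the existing `clock()` calls.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in seconds. Any one-time setup (for example, scheduler
// initialization) should happen before calling this so it is not counted.
template <typename Work>
double time_seconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Serial midpoint-rule estimate of pi, used here only as a stand-in workload.
double midpoint_pi(long long num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double sum = 0.0;
    for (long long i = 0; i < num_steps; ++i) {
        const double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * step;
}
```

A call might look like `double secs = time_seconds([&] { pi = midpoint_pi(1000000); });`, placed on the line after the scheduler has been initialized, so that only the computation falls inside the measured region.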
Quoting - Robert Reed (Intel)
Is there a reason why you start the timing BEFORE the one-time creation of the TBB thread pool and associated data structures? What happens if you move the start clock after the Nthreads initialize?
But it seems that I have found the reason.
With 8 threads, the run time is always 2.x s.
With 4 threads, it is a little longer than with 8.
With 2 threads, it is the longest.
I do not know why.
On XP, with 2 threads, the run time is always 5.x s.
But on Windows 7, it is sometimes 2.x s.
That is strange.
I heard that the process scheduling policy in Windows 7 is better than in XP.
Maybe the two are related.
I haven't looked at the problem (got to run now), but I did notice that you don't give TBB the chance to detect and use the actual level of parallelism in your machine, and I didn't see you mentioning it. Are you aware that using too many threads can decrease performance? Have you seen non-optimal behaviour if you don't provide an argument to task_scheduler_init::initialize()?
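As a rough illustration of the oversubscription point, a requested thread count can be capped at what the hardware reports. This sketch uses standard C++11 `std::thread::hardware_concurrency()` as a stand-in for TBB's own detection, and `pick_thread_count` is a hypothetical helper, not a TBB function.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: picks a worker count no larger than the number of
// hardware threads the standard library reports. hardware_concurrency() may
// return 0 when the value is unknown, so fall back to 1 in that case.
unsigned pick_thread_count(unsigned requested) {
    assert(requested > 0);
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;
    return requested < hw ? requested : hw;
}
```

On a dual-core machine, for instance, `pick_thread_count(8)` would be expected to return 2. With TBB itself, leaving out the argument to `task_scheduler_init::initialize()` lets the library make this choice.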
Quoting - hengyunabc
I am remiss.
But it seems that I have found the reason.
With 8 threads, the run time is always 2.x s.
With 4 threads, it is a little longer than with 8.
With 2 threads, it is the longest.
I do not know why.
On XP, with 2 threads, the run time is always 5.x s.
But on Windows 7, it is sometimes 2.x s.
That is strange.
I heard that the process scheduling policy in Windows 7 is better than in XP.
Maybe the two are related.
I think maybe Raf hit upon something. Do you know how many HW threads your machine(s) has/have? Are you running XP and Windows 7 on the same machine, or the same class of machine?
I'm confused by "when thread number is 2, longest" when the two examples of XP and Windows 7 are not "longest" compared to 4 threads.