Hello. I have started working with Intel TBB. Is it possible to set thread affinity in Intel TBB? I want each thread to run on its own core. I wrote some sample test code, but it doesn't work correctly.
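```cpp
#include <sched.h>                   // sched_getcpu() (GNU/Linux)
#include <iostream>
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <tbb/partitioner.h>
#include <tbb/task_scheduler_init.h>

using namespace std;

class Test {
public:
    // Print the CPU running this chunk, the address of the body copy, and the chunk bounds.
    void operator()(const tbb::blocked_range<int>& range) const {
        cerr << "CPU: " << sched_getcpu() << " this: " << this
             << ", begin: " << range.begin() << ", end: " << range.end() << " " << endl;
    }
};

int main() {
    cout << endl << endl;
    cout << "***** Intel TBB *****" << endl;
    tbb::affinity_partitioner ap;
    tbb::task_scheduler_init init(8);  // request 8 threads
    Test test;
    tbb::parallel_for(tbb::blocked_range<int>(0, 800, 100), test, ap);
    return 0;
}
```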
When I run it, I get the following output:
```
CPU: 16 this: 0x7f2ff6d1fd58, begin: 0, end: 100
CPU: 16 this: 0x7f2ff6d1f358, begin: 100, end: 200
CPU: 16 this: 0x7f2ff6d1f658, begin: 200, end: 300
CPU: 16 this: 0x7f2ff6d1f358, begin: 300, end: 400
CPU: 16 this: 0x7f2ff6d1f958, begin: 400, end: 500
CPU: 16 this: 0x7f2ff6d1f158, begin: 500, end: 600
CPU: 16 this: 0x7f2ff6d1f458, begin: 600, end: 700
CPU: 16 this: 0x7f2ff6d1f158, begin: 700, end: 800
```
What is the problem? Thanks a lot.
Don't try to create one chunk per hardware thread (by abusing grainsize); that's the OpenMP way. TBB likes parallel slack, so what you should probably do here is not specify a grainsize (use the default of 1) and not specify a partitioner (use the default, auto_partitioner).
affinity_partitioner is relevant only if you have multiple loops over the same Range, and you want it to be divided the same way each time, with corresponding chunks executed preferentially on the same hardware thread.
OK, thanks for the reply. So, if I understand correctly, I should do it this way:
```cpp
tbb::parallel_for(tbb::blocked_range<int>(0, 800), test, tbb::auto_partitioner());
```
What if I would like to unroll a loop? For example, I implemented an algorithm with two nested loops, and I would like to unroll (unroll(4)) the outer loop, which is the parallelized one. How should I do it correctly?
The default partitioner is auto_partitioner; you don't have to mention it explicitly.
Just to avoid any misunderstanding, affinity_partitioner is relevant for successive loops over the same Range (in case you can't do it in just one loop, I suppose).
That said, for nested loops you would typically make only the outer loop parallel. You might then decide to explicitly vectorise the inner loop, or at least give the compiler every chance to optimise it as well as it can (sometimes by unrolling, sometimes even by auto-vectorising) by "hoisting" Range::end() out of the loop: assign it to a variable at the start of the loop and compare against that variable. It's not clear to me, though, in which environments this matters most.
I don't know what you mean by unroll(4) or why you would want to do that to the outer loop.