How effective is task::wait_for_all()?

rride · 06-03-2010

Hi,

Imagine the following situation: I have a two-core machine, and I ask tbb::task_scheduler_init to create two threads for me. From the main thread I want to control the execution of the tasks dispatched to those worker threads, i.e., wait for them using task::wait_for_all() and then launch another group of tasks. So all tasks will be separated into a series of batches, with no data conflicts between tasks in the same batch.

So the questions are:

1) Will task::wait_for_all() execute efficiently? I mean, will it cause any considerable slowdown of the TBB worker threads?

2) Is my approach viable? All tasks can be separated into sequential groups, with no dependencies between tasks in the same group but with dependencies between every two neighbouring groups. All tasks must be executed repeatedly, with the possibility of adding new tasks on demand.

Any ideas and thoughts are welcome :)

ARCH_R_Intel · 06-03-2010

How many tasks are in a batch? If there are many tasks in a batch, you might be better off using parallel_for, because it creates a tree of tasks. Having a single wait_for_all wait on many tasks can introduce some contention problems. If there are only a few tasks in a batch, load balancing may be a problem. In general, it is better to overdecompose a problem so that the scheduler has flexibility to balance load.

If a task in one batch operates on data created by a specific task in the previous batch, consider using parallel_for with an affinity_partitioner. Though for only two cores, there is likely to be little benefit from affinity_partitioner.

rride · 06-03-2010

I expect there will be from 4 to 10 tasks in each batch, and each task will take from 0.01 to 5-10 ms to execute.

Each task in a batch may modify some global data, and the next batch may operate on the same data, so to avoid possible collisions I separated the tasks into different batches.

How does the wait_for_all() method operate internally? Does it use system synchronization primitives that put a thread to sleep, or does it use some sort of loop with thread::yield() inside?

Also, can you recommend anything in particular for implementing a similar system where the tasks' dependencies form not a straight line but a graph? I have seen the example in the TBB distribution, but maybe something was left out of it?

What do you think about using system semaphore objects (I'm programming under Windows) instead of wait_for_all()? If wait_for_all has a loop inside it, wouldn't it be more efficient to let the kernel scheduler wake up the waiting thread?
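The trade-off raised here can be sketched with a hypothetical CountdownEvent. It is not a TBB class and does not describe how TBB implements wait_for_all(). The waiter first spins with std::this_thread::yield() in case the batch finishes quickly, then falls back to a std::condition_variable so the kernel wakes it up; standard C++11 primitives stand in for Windows semaphores. One caveat worth checking against the TBB documentation: a thread blocked in the kernel cannot help execute the batch's tasks, whereas, as far as I know, a thread waiting inside TBB can pick up tasks itself, which matters on a two-core machine.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper, NOT a TBB class and NOT how TBB implements wait_for_all().
// The waiter spins briefly (yielding the CPU) and then blocks in the kernel.
class CountdownEvent {
public:
    explicit CountdownEvent(int count) : remaining_(count) {}

    // Called once by each task when it finishes.
    void count_down() {
        if (remaining_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            // Locking before notifying prevents a lost wake-up: the waiter
            // either sees zero or is already blocked inside cv_.wait().
            std::lock_guard<std::mutex> lock(mutex_);
            cv_.notify_all();
        }
    }

    // Returns true if the count reached zero during the spinning phase,
    // false if the thread had to block on the condition variable.
    bool wait(int spin_limit = 1000) {
        for (int i = 0; i < spin_limit; ++i) {
            if (remaining_.load(std::memory_order_acquire) == 0) return true;
            std::this_thread::yield();  // give the core to another runnable thread
        }
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return remaining_.load(std::memory_order_acquire) == 0; });
        return false;
    }

private:
    std::atomic<int> remaining_;
    std::mutex mutex_;
    std::condition_variable cv_;
};
```

Whether the spin phase pays off depends on task length: with tasks of 0.01 to 10 ms, a short spin covers the quick batches while longer ones end up sleeping. The default spin_limit is an arbitrary placeholder to be tuned by measurement.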