Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Problems with the Task group interfaces

Shankar1
Beginner
677 Views

I have the following questions/comments/problems with the task group interface. Please feel free to comment on these.

1. The task_group interface copy constructs the functor which is passed to the run method.
Also this is not inline with the parallel_for interface( strict_ppl::parallel_for interfaces) where the functor is held by reference( in fact a const reference, see internal::parallel_for_body class).

Is it not true if value semantics was followed, then the user has the choice of choosing the functor to be stored by value or by reference?

2. As I understand, it is not valid to call run method on a task_group instance after the wait method has been called on the instance. If a user wants this feature he needs to use structured_task_group class. Is there any another rationale as to why I should use structured_task_group?

3. This is again with the run method overload which takes task_handle as argument. User can create a task_handle object using make_task function and pass that to the run method. Here what happens is that the make_task function copyconstructs the users functor, however the task_handle object itself is held as a reference.

So this means that the following code is prone to crash

tbb::task_scheduler_init init( 1);
{
structured_task_group g;
task_handle h = make_task( MyFunctor());
g.run( h); // run stores a reference to h to execute
} // h goes out of scope and is destroyed
g.wait();

This again looks to me like a not so clean interface.

Please correct me if any of my understanding is wrong.

_______
Sankar





0 Kudos
8 Replies
RafSchietekat
Valued Contributor III
677 Views
1. tbb::parallel_for normally also makes a copy of the functor. It is only the variant for "PPL Compatibility" that uses a reference (to avoid breaking code that assumes it).

(Added)

2. Why would you assume that it is OK to call run() after wait() for structured_task_group? The stated rationale is possible optimisations in the future based on some limitations relative to task_group.

3. The interpretation seems correct (except that this code would not compile), the conclusion perhaps less so (although I don't know the reasoning behind the design). Apparently even task_group doesn't wait() anymore in its destructor, so it might be useful or at least interesting to have that explained.
0 Kudos
Alexey-Kukanov
Employee
677 Views
1. parallel_for() will not return until all the work associated with it is complete. It allows to use references to the same body object within all tasks associated with parallel_for, because the body object passed to the call is guaranteed to exist till its completion. On the contrary, task_group::run() returns as soon as the corresponding job was submitted to the task scheduler for execution, sovalidity of the function object cannot be guaranteed and it has to be copied.

2. Actually, the opposite holds: it is not valid to call run() after wait() on a structured_task_group, but it is valid to do so on a task_group. If this behavior is broken in some scenario, report a bug to us.
The primary goal of using a structured_task_group is to have slightly less overhead in case when the usage model fits the additional restrictions which structured_task_group has, comparing to task_group.

3. This is correct, and the reason for this behavior in TBBis consistency with PPL which requiresthat
If a task_handle object is passed as a parameter to run, the caller is responsible for managing the lifetime of the task_handle object.
See also Remarks on the MSDN page for task_handle. I would assume that for structured_task_group this behavior was chosen for performance, and for task_group it was chosen for consistency with structured_task_group. But, this is better to ask at Microsoft's forums.
0 Kudos
RafSchietekat
Valued Contributor III
677 Views
"It allows to use references to the same body object within all tasks associated with parallel_for, because the body object passed to the call is guaranteed to exist till its completion."
But only the "PPL Compatibility" variant, to be sure. Alexey, would you have any comment on which you think is "better" (value or reference), and why? A reference can always be carried in a value, of course, so a value would not be worse if any inconvenience is disregarded. But wouldn't the compiler be able to do more with a value whose... value it knows to remain constant (for lack of intractable aliases), or am I missing something here?
0 Kudos
Shankar1
Beginner
677 Views

1. Thanks for the explanation. It makes sense to use references for the parallel_for's body as you have reasoned out. But my scenario where I know the object passed to task_group is guaranteed to exist I have no other way than letting task_group copy it.

So the question is understand the user functors are taken by reference in all the constructors restricting the user to use references as templates. To explain it in a properly, let me take the function_task class in task_group.h.

template
class function_task : public task {
F my_func;
/*override*/ task* execute() {
my_func();
return NULL;
}
public:
function_task( const F& f ) : my_func(f) {}
};


Because the constructor takes the users functor by reference. Now I cannot create function_task as follows

MyHandle h();
function_task f( h); // This will say reference to reference is illegal.


Instead if it was implemented as follows( Value semantics),

template
class function_task : public task {
F my_func;
/*override*/ task* execute() {
my_func();
return NULL;
}
public:
function_task( const F f ) : my_func(f) {}
};


Now I can say

function_task f( h); // I know the Taskgroup will copy my functor
function_task f( h); // I know the Taskgroup wont copy my functor


I think this means the same thing to what Raf said "A reference can always be carried in a value".

2. Actually, the opposite holds: it is not valid to call run() after wait() on a structured_task_group, but it is valid to do so on a task_group.

I dont see this when I read the code. Here is task_group's wait method

task_group_status wait() {
__TBB_TRY {
my_root->wait_for_all();
} __TBB_CATCH( ... ) {
my_context.reset();
__TBB_RETHROW();
}
if ( my_context.is_group_execution_cancelled() ) {
my_context.reset();
return canceled;
}
return complete;
}

The code effectively calls wait_for_all method on the root which will wait for the reference count to become 1, and set the reference count to zero. So when the call to wait() method completes, then my_root's reference count becomes 0. So how is it valid to call run() on the task_group again.

However if I look into structured_task_group class's wait method

task_group_status wait() {
task_group_status res = task_group_base::wait();
my_root->set_ref_count(1);
return res;
}

This wait method after calling wait on the base which is the earlier implementation, sets the reference count to 1. Now this means we can further post tasks to the task_group.

So I'm still not understanding why you have mentioned that this is not the case and it is opposite.

3. If I create a task_handle object, the task_handle object will copyconstruct the functor which I pass( same problem as in point 1). I dont know why would some one want to manage the task_handle object that copy constructs my functor.

I wouldn't really want to post this to Microsoft forums. I might as well implement this my_own_taskgroup class which can do this for me( I dont think I have the influence to change signatures of standards :) ). And my questions are to see if my understanding is correct at this moment.

And like Raf has mentioned, the code wont compile( my bad, I wanted to put the task_group g; line on top of the scope, but dint, may be I was too engrossed in explaining the actual problem :)).

0 Kudos
Alexey-Kukanov
Employee
676 Views
"It allows to use references to the same body object within all tasks associated with parallel_for, because the body object passed to the call is guaranteed to exist till its completion."
But only the "PPL Compatibility" variant, to be sure. Alexey, would you have any comment on which you think is "better" (value or reference), and why? A reference can always be carried in a value, of course, so a value would not be worse if any inconvenience is disregarded. But wouldn't the compiler be able to do more with a value whose... value it knows to remain constant (for lack of intractable aliases), or am I missing something here?

Right, only PPL compatible parallel_for uses a single body shared via references while the original tbb::parallel_for creates copies.

A while back, I did an experiment. I thought that sharing a single body instance could be faster than copying. To my surprise, for a very simple body class the variant with body sharing by reference was slower. I do not remember now whether the body object captured anything; possibly the compiler was able to eliminate copying. It's also possible that when the body object is local to a task compilers can enable more optimizations, while being more conservative when the body is shared - Raf are you asking about that? Since I am not a compiler writer, I can't say for sure, but it sounds plausible to me.
Possible positive sides of sharing I see: absence of the requirement for copy construction (though it's rarely a problem), and potential performance win in caseenough external state is captured in the body to make copying an extensive operation.

0 Kudos
RafSchietekat
Valued Contributor III
676 Views
"It's also possible that when the body object is local to a task compilers can enable more optimizations, while being more conservative when the body is shared - Raf are you asking about that?"
Yes. I suppose that the relative advantages depend on copying cost and reuse opportunities, with the latter in turn depending on grainsize and partitioner. The difference may not be as spectacular as with hoisting range.end() out of a loop, but I would have been surprised if it had been insignificant. I would hope that the original idea was based on more solid ground, though...
0 Kudos
Alexey-Kukanov
Employee
677 Views
Thanks for the discussion Shankar.

1. I see the difference, however function_task<> is an implementation detail and TBB users aren't supposed to use it directly. Do you actually mean something like this?
tbb::task_group my_task_group; MyFunctor g;
my_task_group.run( g ); // copies
my_task_group.run( g ); // does not copy

Well, I think this adds more complexity to the interface than the majority of TBB and PPL users need. However if it can be done in backward compatible and pay-as-you-go manner, we can consider it.

2. See how task_group and task_group_base constructors set task_group_context::concurrent_wait trait which disables resetting the reference count to 0 (as Andrey explained in another thread you initiated). Why structured_task_group::wait() sets the counter to 1 is unclear; most likely, it just an artifact of some initial prototypes done after preliminary versions of the PPL API specification. Currently, both PPL and TBB documentation state that wait() is only allowed to be called once for a structured_task_group, and even though our implementation may allow its reuse, I would not rely on it - for sake of compatibility with PPL if nothing else.

3. As I said, I think your understanding of the behavior is correct. I can explain probably why task_handle makes a copy of the argument: basically for the same reason as run() copies its argument. The constructor of task_handle or the make_task function can have a temporary lambda as one of its arguments that is destroyed as soon as the call to the function is completed, thus task_handle has to save a copy that it will pass to subsequent call of run().
ButI do not know why would someone want to use task_handle. We discussed it with a colleague of mine, and concluded that we don't know why this class exists at all and what problems it solves that would be impossible to solve with either explicit instances of functor classes in C++03 or lambdas and auto keyword in C++11.
0 Kudos
Shankar1
Beginner
677 Views

Thanks for the reply Alexey.

1. What I meant was exactly this. I'm not sure of the complexity it adds to the interface, but to me as a User the code that I have written just tells me whether TBB library would copy my functor or not. It would really be good if this functionality is considered and added to the upcoming release. Every ones code will still work as it was. People who understand this can go the extra mile :)

And I understand function_task is an implementation detail. I took that case because if that class was modified as I had described then your code sample using task_group would just work as wanted.

2. Oh now I understand what the concurrent_wait trait does extra. Sorry I was miss guided by the general documentation of the task::wait() method( which says it will wait for ref count to be 1 and reduces it to 0).

3. I'm not using lambdas at the moment. Well then I would just ignore task_handle class for now.
0 Kudos
Reply