- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello, colleagues! I'm testing TBB tasks and do not get stable results though there is nothing random in the code. [cpp] ... typedef enumerable_thread_specific< int > PerThread; ... class Logger { ... atomic counter; atomic taskCounter; PerThread locInt; ... } ... Logger::Logger () { counter = 200; taskCounter = 0; } ... task* DispTask::execute () { Logger* lg = Logger::Instance(); // lg->Message ("DispTask::execute\t\tcounter=", lg->counter); lg->counter--; int numOfChildren = lg->counter/2; // So here (somehow) we got the number of tasks to be executed (numOfChildren) if (numOfChildren > 0) { task_list tlist; for (int i=0; iMessage ("PropTask::execute"); PerThread::reference tls = lg->locInt.local(); tls++; lg->taskCounter++; // Here must be complex computations DispTask& t = *new(task::allocate_child()) DispTask(); set_ref_count(2); spawn_and_wait_for_all (t); return NULL; } ... int main (int argc, char** argv) { Logger* lg = Logger::Instance(); DispTask& a = *new (task::allocate_root()) DispTask(); task::spawn_root_and_wait (a); // Here I would like to have all the tasks finished! Kind of a barrier. int total_counter=0; for (PerThread::iterator it = lg->locInt.begin(); it != lg->locInt.end(); it++) { total_counter += *it; } lg->Message ("Sum=", total_counter); lg->Message ("Atomic counter=", lg->taskCounter); lg->Flush(); } ... [/cpp] There are 2 types of tasks: PropTask and DispTask. PropTask always leads to DispTask so in the end of PropTask::execute() I need to somehow call (spawn , return) DispTask::execute. DispTask may lead to some non-zero number of resulting objects which means that some number of PropTask need to be executed. Other situation is that DispTask results needs no further processing - terminal point in the task branch. I count the number of PropTasks executed per thread using Thread-Local-Storage. Its sum should be constant (in the test code) because there is nothing random, just dummy decrement. But for all the different variants of the "task-branching" I sometimes get different sums. 1) allocate_child + spawn_and_wait_for_all 2) using empty continuation task: allocate_continuation + cont.allocate_child() + return &t 3) using non-empty continuation task and empty child... I also use a global atomic counter for additional check. It gives the same result. It happens so that among approx. 29 correct results (sum=9900) I get once (sum=9899 or less). Where am I wrong? Also Intel Inspector detect some "Memory leaks" and "Invalid memory access". However many people say that this could be false positive. Thank you! Any ideas?
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry for the bad code! Try to fix it here:
[cpp]
...
typedef enumerable_thread_specific< int > PerThread;
...
class Logger {
atomic<int> counter;
atomic<int> taskCounter;
PerThread locInt;
}
...
Logger::Logger ()
{
counter = 200;
taskCounter = 0;
}
...
task* DispTask::execute () {
Logger* lg = Logger::Instance();
// lg->Message ("DispTask::execute\t\tcounter=", lg->counter);
lg->counter--;
int numOfChildren = lg->counter/2;
// So here (somehow) we got the number of tasks to be executed (numOfChildren)
if (numOfChildren > 0) {
task_list tlist;
for (int i=0; i<numOfChildren-1; i++) {
tlist.push_back ( *new(task::allocate_child()) PropTask() );
}
set_ref_count(numOfChildren+1);
PropTask& lastTask = *new(task::allocate_child()) PropTask();
spawn(tlist);
spawn_and_wait_for_all(lastTask);
}
return NULL;
}
...
task* PropTask::execute () {
Logger* lg = Logger::Instance();
// lg->Message ("PropTask::execute");
PerThread::reference tls = lg->locInt.local();
tls++;
lg->taskCounter++;
// Here must be complex computations
DispTask& t = *new(task::allocate_child()) DispTask();
set_ref_count(2);
spawn_and_wait_for_all (t);
return NULL;
}
...
int main (int argc, char** argv) {
Logger* lg = Logger::Instance();
DispTask& a = *new (task::allocate_root()) DispTask();
task::spawn_root_and_wait (a);
// Here I would like to have all the tasks finished! Kind of a barrier.
int total_counter=0;
for (PerThread::iterator it = lg->locInt.begin(); it != lg->locInt.end(); it++) {
total_counter += *it;
}
lg->Message ("Sum=", total_counter);
lg->Message ("Atomic counter=", lg->taskCounter);
lg->Flush();
}
[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ok. Spent so much time alone but found my mistake. I hope.
The problem was not in task scheduling but in the number of tasks to be scheduled. Be careful with atomics!
Lines 19-20 should become
[cpp]
int numOfChildren = lg->counter.fetch_and_add(-1)/2;
[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
operator --(int) is provided.
int numOfChildren = lg->counter-- / 2;
Should be valid too.
By the way. I suggest you leave the old code in, conditionalize it out with a strong comment, then add the new statement with a strong comment. This way, the next person to handle your code will not make similar error.
Alternatively you could stongly comment the new statement such that it is clear why it is coded this way.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Shure.
Two or more threads may run DispTask at the same time.
Yes, lg->counter-- is an atomic decrement (say, op1) and
int numOfChildren = lg->counter/2; is in fact an atomic fetch (say, op2)
So these are two separate atomic operatons. Classic "ABA" situation may happen. (Or however it is called)
Thread 1 puts some number (say, 14) after op1 so 14/2 child tasks should be executed. But then another thread 2 interacts and decrements the variable to 13 by its op1. So when thread 1 fetches the value in op2 it gets not 14 but 13 and executes 13/2 child tasks instead of 14/2.
So, topic name has nothing to do with the problem. Should better call it "typical mistake using atomic variable".
However, still, I get error notifications from Inspector.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
In the case where lg->counter is of type atomic<int>
The post-fix operator -- is equivilent to fetch_and_add(-1)
Returns prior value of lg-counter in atomic xchgad (lock;xadd [loc],-1).
Your comments do apply to the original two statement form of your original lines 19:20
An alternate correct code would be:
[cpp]
int saveAtomicCounter = lg->counter--;
int numOfChildren = saveAtomicCounter/2;
[/cpp]
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
*** I must caution that you must assess your code to see if there is an issue in the code of counter changing Logger::Instance() by one thread while a different thread is inside DispTask::execute(). If there is an issue then you may need to protect the appropriate objects with lock(s). The use of atomic for counter would seem to indicate that you may require some protection inside DispTask::execute() in the event that critical objects are required not to change during the duration of execute().
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Back to the original problem.
As I filled my backbone prototype with computations I sometimes get crashes like this:
[plain]#3762 0x00007fdfd557317b in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7fdfcf9c3e00, parent=...,
child=<value optimized out>) at ../../src/tbb/custom_scheduler.h:448
#3763 0x00007fdfcfc6987f in CollDispTask::execute (this=0x7fdfbfd8ea40)
at src/CollDispTask.cxx:115[/plain]
Most probably there are some limitations to the size of the stack of functions. Could it be that I exceed it because of deep deep tree of tasks? So, to clarify: PropTask calls spawn_and_wait_for_all(DispTask), DispTask calls spawn_and_wait_for_all(PropTask), ....
PropTask->DispTask->PropTask->DispTask->PropTask->DispTask->PropTask->DispTask->.... Up to some moment on condition, of course.
thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
That would be possible.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you, Raf.
Is there any way to check it? Any hints in the output? Is such a number a realistic one? #3763
Does this mean that in general there exists a limit for the size of the tree of tasks? Does not TBB control this somehow?
And of course, next question - how should I try to avoid huge trees but still wait for all tasks?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Try to increase stack size and see what the new level is. Divide current stack size by the level and see if it is a realistic amount of stuff for a frame.
TBB cannot prevent things that would also let a sequential program crash. It will however refrain from stealing beyond a certain stack level (a measure that had to be introduced when going from depth-aware scheduling to deque-based scheduling).
One method to prevent excessive stack buildup is to replace blocking style (wait_for_all()) with continuation passing. See the documentation about that.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page