I've managed to condense down the code into a smaller repro case which I've attached.
In summary: I have a task system which is designed to work with or without TBB. It has two types of task, scheduled and immediate. Immediate tasks can be executed either in blocking mode (on the main thread) or in async mode (where they use task::enqueue, designed for long-running tasks). The Task class in the example I've given is what the immediate task would do in async mode when built against TBB.
If the library is built against TBB, then the tasks in the task system are updated using a parallel_for loop; however, this seems to cause a lock-up when combined with task::enqueue in a debug build. If I remove the parallel_for and replace it with a normal for loop, the problem goes away (see TaskManager::update).
Is this an issue with my code, or an issue with TBB?
I've done some more digging into this, and it seems to be related to the "Debug Information Format" setting (C++ -> General). I've switched to a different build generator which by default uses "Program Database for Edit & Continue (/ZI)" in a debug build rather than "Program Database (/Zi)" like my old build generator.
Changing the project settings to use "Program Database (/Zi)" prevents the hang from happening so I've updated the project generation scripts to use this instead.
Your seeing the hang in debug mode only just means that the corruption effects (as it often happens) depend on the memory layout of your process, which naturally depends on the compilation mode.
Is there any way to force the TBB scheduler to run so I can ensure that all enqueued tasks are completed before the manager is destroyed?
One possible solution is what we use to solve a similar problem inside the TBB scheduler. Heap-allocate the TaskManager object, and make the last task delete it. To prevent premature deletion, initialize m_noofTaskInFlight to 1, not 0. Then when the client is done with the TaskManager object, have it decrement the counter, "as if" it were another task completing (and if the count becomes zero, delete the TaskManager object).
For syntactic convenience, it may be useful to split TaskManager into two parts: a stack-allocated part that the client sees and a heap-allocated part. The destructor for the stack-allocated part can do the "as if" decrement of the counter in the heap-allocated part.
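The scheme above can be sketched roughly as follows. This is only an illustration under assumptions, not the original code: TaskManagerImpl, taskStarted, release and the destroyedFlag hook are hypothetical names, and std::thread stands in for task::enqueue so that the sketch does not depend on TBB. The threads are returned to the caller purely so the demo can join them; an enqueued TBB task would be fire-and-forget.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical heap-allocated half. The counter starts at 1, not 0:
// the extra count belongs to the client-facing TaskManager handle.
class TaskManagerImpl {
public:
    explicit TaskManagerImpl(std::atomic<bool>* destroyedFlag)
        : m_noofTaskInFlight(1), m_destroyedFlag(destroyedFlag) {}

    // Call before a task is handed to the scheduler.
    void taskStarted() { m_noofTaskInFlight.fetch_add(1); }

    // Call when a task finishes, and once more when the client is done.
    // Whoever takes the count to zero deletes the object.
    void release() {
        if (m_noofTaskInFlight.fetch_sub(1) == 1) {
            delete this;
        }
    }

private:
    // Private so the object can only die through release().
    ~TaskManagerImpl() {
        if (m_destroyedFlag) m_destroyedFlag->store(true);
    }

    std::atomic<int> m_noofTaskInFlight;
    std::atomic<bool>* m_destroyedFlag;  // test hook only (assumption)
};

// Stack-allocated half that the client sees.
class TaskManager {
public:
    explicit TaskManager(std::atomic<bool>* destroyedFlag = nullptr)
        : m_impl(new TaskManagerImpl(destroyedFlag)) {}
    // The "as if another task completed" decrement.
    ~TaskManager() { m_impl->release(); }

    TaskManager(const TaskManager&) = delete;
    TaskManager& operator=(const TaskManager&) = delete;

    // Stand-in for task::enqueue: runs the work on a std::thread.
    template <typename Work>
    std::thread enqueue(Work work) {
        TaskManagerImpl* impl = m_impl;
        impl->taskStarted();
        return std::thread([impl, work]() {
            work();
            impl->release();  // the last task to finish deletes impl
        });
    }

private:
    TaskManagerImpl* m_impl;
};

// Destroys the handle while four tasks are still blocked, then lets them run.
bool demoLastReleaseDeletes() {
    std::atomic<bool> destroyed(false);
    std::atomic<bool> go(false);
    std::atomic<int> completed(0);
    std::vector<std::thread> threads;
    {
        TaskManager manager(&destroyed);
        for (int i = 0; i < 4; ++i) {
            threads.push_back(manager.enqueue([&go, &completed]() {
                while (!go.load()) std::this_thread::yield();
                completed.fetch_add(1);
            }));
        }
    }  // handle is gone here, but the tasks still hold the impl alive
    const bool aliveAfterHandleGone = !destroyed.load();
    go.store(true);
    for (std::thread& t : threads) t.join();
    return aliveAfterHandleGone && destroyed.load() && completed.load() == 4;
}
```

Calling demoLastReleaseDeletes() from a test should return true: the heap-allocated part outlives the handle and is deleted only once the last task has released it.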
I have a thread-safe smart pointer implementation, so I put all the instances of the task manager and its tasks into smart pointers to ensure they can't be deleted before the enqueued TBB task is done with them. However, this hasn't fixed the issue and in fact seems to have made it worse.
The case I'm using this in is for my unit tests, which aren't really a typical use case for such a system as they just launch the task (which does no real work) and wait indefinitely for it to finish (which should be almost instantaneous).
Because there's no other work to do, and because these unit tests don't run over multiple frames, the unit test code just enters a tight loop that waits for the task to finish so it can mark the test as passed.
Is it possible this tight loop is starving the TBB scheduler and preventing it from processing its enqueued tasks? If so, then it takes me back to my original question: is there a way to force the TBB scheduler to run?
Some pseudo code for the test execution flow is shown below.
// An async task should eventually complete
// Just wait for it to do so
I'm using Windows, and the best advice I could find for yielding there was to call Sleep(0); however, that didn't help. I also tried using tbb::this_tbb_thread::yield(), but that also did nothing to help.
The threads window in Visual Studio just shows this forever for the TBB worker thread:
ID | Category | Name | Location | Priority | Suspended
4476 | Worker Thread | _threadstartex | 7711f8c1 | Normal | 0
I've put a breakpoint in the execute function of my specialised TBB task and it never gets hit. I can also leave the program running indefinitely and the TBB worker thread will never progress past the call to _threadstartex.
In addition, if I remove the tight loop which waits for the test to complete, then everything works fine; however, my unit test fails.
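As a stopgap, the wait loop can at least be bounded so that a task which never runs shows up as a failed test rather than a hang. The sketch below is an assumption-laden illustration, not the code from this thread: waitForCompletion and the two demo functions are hypothetical names, and a plain std::thread stands in for the enqueued TBB task. It does not address why the worker never starts.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Polls `done` until it is set or `timeout` elapses.
// Returns true if the flag was observed set, false on timeout.
bool waitForCompletion(const std::atomic<bool>& done,
                       std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done.load(std::memory_order_acquire)) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // give up instead of spinning forever
        }
        // Sleep briefly rather than spin, so other threads get CPU time.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return true;
}

// A task that does run: the wait succeeds well within the timeout.
bool demoTaskCompletes() {
    std::atomic<bool> done(false);
    std::thread worker([&done]() {
        done.store(true, std::memory_order_release);
    });
    const bool ok = waitForCompletion(done, std::chrono::seconds(5));
    worker.join();
    return ok;
}

// A task that never runs: the wait times out and reports failure.
bool demoTaskNeverRuns() {
    std::atomic<bool> done(false);
    return waitForCompletion(done, std::chrono::milliseconds(20));
}
```

In a unit test, the return value of waitForCompletion would be what gets checked, so the "never runs" case fails the test after the timeout instead of blocking the whole run.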
I actually have two sets of unit tests. I have my utility tests (which the task manager test is part of) which run in the standard UnitTest++ way where a test runs to completion. But I also have a much more specialised test runner that I use for my engine tests which allows me to test features over multiple frames using my normal engine main loop. I think updating my utility tests so that they can run over multiple frames would be the best solution here, as it would free up the main thread and allow other work to be done; it would also present a much more realistic use of the task manager.