Intel® oneAPI Threading Building Blocks

Concurrent long running tasks

Hi there,
I've been working on a threaded game engine for a few weeks now using TBB and I've hit a bit of a snag.
Sorry the explanation is long, but I want to explain the reasoning behind why I want it the way I do.
Basically my entire engine is arranged as a flow_graph, with many of the nodes spawning new tasks using parallel_for to do the massively parallel operations of the particular module. Similar to how the Intel multi-threaded animation example does it, I have a generic task which knows how to run my function prototype, and a task set class which spawns n generic tasks using a parallel_for.
My problem is that I want to create a task (almost a thread) for loading all of the game assets. I'd planned to run a non-blocking task which would load the files while the main game tree keeps looping and animating some sort of loading screen. Load completion would be signalled using an atomic reference counter, at which point the main game tree would begin running the full application.
However whenever I try to spawn my task I get various forms of error - some of the error messages also seem to be wrong - they reference tbb 1.0 on a z: drive which my machine doesn't have.
I've done a little more reading and found that asynchronous tasks don't seem to be supported yet.
I was wondering if there is a way to achieve this result without having to create full-blown threads.
The majority of loading will take place at the start of the application, with a few sporadic loads while it's running as users add or remove items in the world, so I feel a full-blown thread is probably overkill. I'm also concerned that an idling thread would waste system resources, keep TBB from using them, and interrupt my application flow every now and then to check for more data.
I see asset loading as a non-constant task: something that only happens when it needs to, unlike physics etc. which would be run each frame. Loading may happen or it may not.
Putting it in the main tree would block the frame until all loads are complete which defies the point of what I want to do.
I've tried the example in the documentation, "Letting main thread work while child tasks run", but it doesn't work in my case as these child tasks are being allocated from within a task tree. My errors seem to occur as execution leaves the tasks that launched these loading tasks, provided it makes it that far. Sometimes exceptions are thrown, and occasionally it works (presumably the load task finished before it became a problem for the allocator and reference counter).
Running the task and waiting for it to complete however works without error.
Thanks very much!
Black Belt
There's no such thing as file I/O without blocking, and thread creation overhead should be negligible relative to those delays.

Long-running tasks that look like threads can be problematic in their own right, as they require concurrency.

Don't be afraid to use a thread for the right purpose.
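
For illustration, here is a minimal sketch of the dedicated-thread approach combined with the atomic completion flag described in the question. It uses only the C++ standard library rather than TBB, and every name in it (Assets, load_assets, run_until_loaded) is hypothetical; the "loading" is a stand-in for real blocking file I/O.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical asset store; names are illustrative, not from the thread.
struct Assets {
    std::vector<std::string> loaded;   // written by the loader thread only
    std::atomic<bool> ready{false};    // set once all loading has finished
};

// Stand-in for real blocking file I/O.
void load_assets(Assets& assets, const std::vector<std::string>& paths) {
    for (const auto& p : paths) assets.loaded.push_back("data:" + p);
    // Release pairs with the acquire load in the frame loop, so the loop
    // sees the fully populated vector once it observes ready == true.
    assets.ready.store(true, std::memory_order_release);
}

// Runs "loading screen" frames until the loader reports completion and
// returns the number of assets available at that point.
std::size_t run_until_loaded(const std::vector<std::string>& paths) {
    Assets assets;
    std::thread loader(load_assets, std::ref(assets), std::cref(paths));
    while (!assets.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();     // a real loop would render a frame here
    }
    loader.join();                     // the loader has already finished its work
    assert(assets.loaded.size() == paths.size());
    return assets.loaded.size();
}
```

Calling run_until_loaded({"a.mesh", "b.png"}) would spin the placeholder frame loop until both entries are in place. The frame loop never takes a lock; it only reads one atomic flag per iteration.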
New Contributor I
From what I understand, you need something like a 'switch' node - it will keep forwarding messages to a subset of its children until it receives a certain message, at which point it will start forwarding messages to a different subset of its children. In this case, the switch node will keep forwarding messages back to the nodes which update the loading screen, which will in turn loop back to the switch node. Once the asset loader node finishes, it sends a message to the switch node, which causes it to change states and forward the message to the nodes that do whatever comes after the loading screen.

This is pretty simple to hack together - use a multifunction_node with serial concurrency limiting and two output ports, one looping back to the loading screen nodes and the other forwarding to the nodes which execute after the loading screen finishes. Internal to this node is a state variable, probably just a boolean. The input message to the multifunction_node is, perhaps, an integer. When this integer is 0, it forwards the message to the loading screen nodes. When this integer is 1, it forwards to the post-loading-screen nodes and switches its state so that it drops any further messages of value 0. Since there's only one message of value 0 looping around, receiving 0 might also cause it to reverse its state again, effectively resetting the mechanism for future use. If you want to drop the requirement to be forwarding 1s and 0s around, you can stick an or_node right before the multifunction_node. The or_node will wrap messages it receives with an input port identifier, so binding the asset loader node to a different input port on the or_node than the loading screen nodes might be a good design decision.
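
The routing decision described above might be sketched as follows. This is plain C++ rather than actual TBB code: SwitchState and Port are hypothetical names, and in a real graph this logic would presumably sit inside the body of the serial multifunction_node, sending to one of its two output ports instead of returning an enum.

```cpp
#include <cassert>

// Where a message goes: back to the loading screen nodes, on to the
// post-loading nodes, or nowhere at all.
enum class Port { LoadingScreen, PostLoading, Dropped };

// Hypothetical stand-in for the body of the serial switch node.
// Messages follow the convention above: 0 = loading-screen tick,
// 1 = asset loader finished.
class SwitchState {
public:
    Port route(int msg) {
        assert(msg == 0 || msg == 1);
        if (msg == 1) {              // loader finished: flip state, forward on
            loaded_ = true;
            return Port::PostLoading;
        }
        if (loaded_) {               // the single 0 still circulating
            loaded_ = false;         // optional reset for future use
            return Port::Dropped;
        }
        return Port::LoadingScreen;  // keep the loading screen looping
    }

private:
    // Unsynchronised on purpose: this is only safe if calls are serialised,
    // which is what the serial concurrency limit is meant to guarantee.
    bool loaded_ = false;
};
```

With this sketch, a sequence of messages 0, 0, 1, 0, 0 would route to LoadingScreen, LoadingScreen, PostLoading, Dropped, LoadingScreen: the stale 0 after the switch is swallowed and resets the mechanism.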

On the other hand, it might be best to just do asset loading in a separate thread, which then sticks the data in a queue node in your graph to be loaded by your game graph. Or something. As Raf said, don't be afraid to use threads for the right purpose. TBB is a great library to be sure, but remember - when you're a hammer, everything looks like a nail. If you go this route, be sure to take a look at the graph.increment_ref_count() and graph.decrement_ref_count() methods.
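
A rough sketch of the "separate thread feeding a queue" idea, using a hand-rolled mutex-guarded queue from the standard library in place of a TBB queue node. AssetQueue and load_and_collect are hypothetical names, and the graph ref-count calls mentioned above are not modelled here.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical hand-off queue between the loader thread and the game loop.
class AssetQueue {
public:
    void push(std::string asset) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(asset));
    }
    // Non-blocking: returns false immediately when nothing is waiting.
    bool try_pop(std::string& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }

private:
    std::mutex m_;
    std::queue<std::string> q_;
};

// The loader thread pushes finished assets; the loop drains whatever has
// arrived each "frame" without ever blocking on the loader.
std::vector<std::string> load_and_collect(const std::vector<std::string>& paths) {
    AssetQueue queue;
    std::thread loader([&] {
        for (const auto& p : paths) queue.push("data:" + p);  // stand-in for file I/O
    });
    std::vector<std::string> received;
    std::string asset;
    while (received.size() < paths.size()) {       // one iteration ~ one frame
        while (queue.try_pop(asset)) received.push_back(std::move(asset));
        std::this_thread::yield();                 // a real loop would render here
    }
    loader.join();
    assert(received.size() == paths.size());
    return received;
}
```

Because there is a single producer and the queue is FIFO, assets arrive in the order they were requested. The only synchronisation the game loop sees is a short lock around try_pop.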
Black Belt
I think the easiest way using TBB is to oversubscribe by 1 thread and enqueue your file load task.

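parallel_invoke(FileLoadTask, wasYourMainCode);
void FileLoadTask() {
  while (!Done) {  // Done, newLoadRequest, filesAreNowLoaded: shared flags
    if (!newLoadRequest) { Sleep(1); continue; }  // nothing to load: sleep
    newLoadRequest = false;
    // ... load the requested files here ...
    filesAreNowLoaded = true;
  } // while(!Done)
}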

Essentially, FileLoadTask starts once and then loops, picking up new file load requests and sleeping when there are none. You could use a condition variable wait, or WaitForSingleObject on an event, in place of Sleep(1).
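
As a sketch of the condition variable variant, assuming standard C++ threads rather than parallel_invoke: LoaderState, request_load and run_two_loads are hypothetical names, and loadsCompleted stands in for the filesAreNowLoaded flag so that repeated requests can be told apart.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical shared state mirroring the flags in the loop above.
struct LoaderState {
    std::mutex m;
    std::condition_variable cv;
    bool newLoadRequest = false;
    bool done = false;
    int loadsCompleted = 0;  // stands in for filesAreNowLoaded
};

void FileLoadTask(LoaderState& s) {
    std::unique_lock<std::mutex> lock(s.m);
    for (;;) {
        // Blocks without polling until a request arrives or shutdown is asked.
        s.cv.wait(lock, [&] { return s.newLoadRequest || s.done; });
        if (!s.newLoadRequest) return;   // done, and nothing left to load
        s.newLoadRequest = false;
        lock.unlock();
        // ... blocking file I/O would happen here, outside the lock ...
        lock.lock();
        ++s.loadsCompleted;
        s.cv.notify_all();               // wake anyone waiting for completion
    }
}

void request_load(LoaderState& s) {
    { std::lock_guard<std::mutex> lock(s.m); s.newLoadRequest = true; }
    s.cv.notify_all();
}

// Issues two load requests in turn, then shuts the loader down.
int run_two_loads() {
    LoaderState s;
    std::thread loader(FileLoadTask, std::ref(s));
    for (int i = 1; i <= 2; ++i) {
        request_load(s);
        std::unique_lock<std::mutex> lock(s.m);
        s.cv.wait(lock, [&] { return s.loadsCompleted == i; });
    }
    { std::lock_guard<std::mutex> lock(s.m); s.done = true; }
    s.cv.notify_all();
    loader.join();
    assert(s.loadsCompleted == 2);
    return s.loadsCompleted;
}
```

The demo driver waits for each load to finish only to keep the example deterministic; a frame loop would instead check loadsCompleted under the lock once per frame and carry on rendering.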

Jim Dempsey
Haha, ok by non-blocking I meant that my loading task doesn't need to do any synchronisation to get its data description to load or to then put its loaded data.
I'd set it up to be in prime position to launch and forget about loading tasks, so that my main graph just had to keep an eye on that atomic flag.
I wanted to stay away from having to do synchronisation between my main graph and the loading thread whenever I wanted to launch new load operations.
Thanks very much for the suggestions! Unfortunately due to my engine structure it seems I'm not able to do any of them apart from a full blown thread.
I'm using DXUT and DirectX and my engine relies on my "main graph" returning to do the final presentation of the command buffer for the frame.
I've also implemented what I'd call a blind main loop: it has no idea what will happen apart from calling a set of graphs it's been given and then calling present on DirectX. It's then up to the modules to set up their dependencies and ensure they're in the graph if they need to do any processing. Using a parallel_invoke means my main thread host would need to know about the loading task, which is something I'm desperate to stay away from.
So it looks like my best and "simplest" option is some form of background thread that my assets module can look after...
Is there a feature planned to allow users to launch a task in isolation, return immediately and forget about it? It seems it would be a useful feature for things like file handling and network communication. I can see quite a few uses for it, especially in games: for instance creating a save file or some sort of dump file, firing UDP packets to a game server, and other cases where you just want something to run (possibly in parallel) that doesn't necessarily produce any sort of data or input used by the application.
Thanks very much again for your quick responses - you've definitely got me thinking about this a bit more!!
New Contributor I
Well, the atomic counter method uses shared memory to communicate between your graph and the thread, as your graph polls the counter every iteration or whatever. Usually asynchronous message passing is better for that sort of thing, which is what TBB uses.