Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

TBB beginner question

Ernesto_Rojo_Jr
Beginner
981 Views
OK, so I've been interested in learning TBB for a long time. I understand the basic concepts behind it, but practical application still eludes me. I'm so accustomed to serial execution that it's hard for me to think differently, haha. Anyway, back to my actual question.

In a single-threaded game loop you just have a while loop that updates the different components. You update input, timer, scripting, rendering, whatever. How would this get translated to TBB? I just want a very simple port. At first I thought you start a task that represents the "while(true)". This task spawns child tasks, which are the different components like input and rendering. Now, let's say that's the correct way to do it: don't the subtasks get deleted once they've finished executing? Does that mean I have to create the task tree every single frame? All those allocations seem extremely expensive if done every frame.

My other thought was, "maybe I'm supposed to have a task list and use a parallel_for to run them." But then does that mean I'm supposed to give parallel_for a grain size of 1 so that it runs each component separately?

So yeah, I understand how to use each parallel algorithm. I just don't see how they apply to a simple game loop.
4 Replies
Anton_Pegushin
New Contributor II
981 Views
Hello,

Engineers here at Intel have already used TBB twice to multi-thread game demos (they are just like real games, with the same interacting components, but the code is smaller and easier to read and understand). The first one is "Destroy the Castle" (details and source code can be found here: http://software.intel.com/en-us/articles/code-demo-destroy-the-castle/). I think it's 3.5, maybe 4 years old. It uses TBB to create top-level tasks (root-AI, root-Particles, etc.), which in turn spawn individual tasks for each particular object.
The other one is the SMOKE game demo (it can be obtained from this web page: http://software.intel.com/en-us/articles/smoke-game-technology-demo/). It's maybe 1.5-2 years old. It uses parallel_for on several levels: first to spawn top-level tasks, and then to sub-divide them so that the TBB scheduler can balance the load across all the worker threads. The easiest way to see how the developers used TBB here is to search the SMOKE code for parallel_for (I think the wrapper name for it inside SMOKE is "ParallelFor").

Re: task re-allocation. TBB takes care of re-using memory whenever possible. So I would suggest following the templates that Destroy the Castle or SMOKE propose (or an alternative you design yourself); once you have a prototype, benchmark it against the serial version and check the scalability. It might run just fine (in which case, congrats!), or you might run into issues caused by over-using synchronization primitives (in your collision computations, for instance) or by sticking with a data structure that is not efficient in a parallel world. What I'm saying is that memory (re)allocation may never become the biggest problem in a multi-threaded application, so I don't think you should concentrate on it during the design stage.

Also, I'd be very interested to learn about the results of your work: how hard it really was to parallelize a game using TBB, and how well the game ran afterwards. We have several customers that use TBB in their commercial games, and one of them, "Deep Shadows", even ported TBB to Xbox 360 and contributed the port to Intel.
Ernesto_Rojo_Jr
Beginner
981 Views
Thank you for taking the time to answer my question. I'm pretty new to multithreading and I thought TBB would be a good way to learn. I actually tried looking over SMOKE, but most of it went way over my head. I guess sometimes the best way to learn is to jump in at the deep end, right? I guess I'll just experiment with the different styles and see how things go.
Anton_Pegushin
New Contributor II
981 Views
Yeah, I'd vote for experimenting, sure :).

The SMOKE code looks more complex than it really is (the part that uses TBB, I mean) because one of the requirements was to encapsulate TBB usage in one particular module (core/Framework/TaskManagerTBB, I think). To make that happen, TBB had to be wrapped so that it exports only a generic API that the alternative implementation (based on native threads) could also support. But if you put those wrapper details aside, SMOKE is easier to understand than "Destroy the Castle" (which uses the Task Scheduler API and is therefore harder to read and understand).

In SMOKE, parallel_for is used on three different levels: to start the initial (large) tasks, then to sub-divide the computations within those large initial tasks, and finally to parallelize the processing of change notifications at the end of each iteration. You could actually try something like this in your game: try using TBB at different levels. It does not have to be just one level, and it does not necessarily have to be the top one either. Good luck.
Ernesto_Rojo_Jr
Beginner
982 Views
Oh, I see! So it's kind of like the second option I originally posted:
My other thought was, "maybe I'm supposed to have a task list and use a parallel_for to run them." But then does that mean I'm supposed to give parallel_for a grain size of 1 so that it runs each component separately?
except on several levels instead of just one.