Intel® oneAPI Threading Building Blocks
Ask questions and share information about adding parallelism to your applications when using this threading library.

Per-worker-thread data.

Łukasz_Krawczyk
Beginner
453 Views
Hello everyone!

I work on an application which uses the D3D11 parallel rendering features. Concurrent rendering in this API is based on a system of device contexts, which consume resource objects and generate GPU command lists. Is there any way, using the TBB task scheduler, to ensure that each worker thread has its own set of variables needed for rendering? For example, on a 4-core system TBB spawns 4 worker threads; I create 4 distinct rendering contexts (device contexts with their resources) and then somehow bind those rendering contexts to the worker threads, so that when the scene is rendered I can be sure no context is used by two worker threads at once (they are thread-unsafe by definition).

Thanks in advance,
Łukasz Krawczyk
0 Kudos
8 Replies
RafSchietekat
Valued Contributor III
453 Views

TBB has provisions for thread-local storage and for observing threads' lifecycle events, so you should be able to do what you want, I guess, especially if you have no special requirements about where tasks are executed; otherwise you may additionally need to look into affinity support.
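
Not Raf's code, just a minimal sketch of the thread-local-storage part of that suggestion, assuming a hypothetical RenderContext wrapper around a deferred device context and its resources; tbb::enumerable_thread_specific lazily creates one instance per thread that touches it, so no two workers ever share one:

#include "tbb/enumerable_thread_specific.h"
#include "tbb/parallel_for.h"

// Hypothetical wrapper around a deferred device context and its resources.
struct RenderContext {
    RenderContext() { /* create deferred context + per-thread resources */ }
    void drawJob(int job) { /* record rendering commands for one job */ }
};

// One RenderContext per thread that ever calls contexts.local().
tbb::enumerable_thread_specific<RenderContext> contexts;

void renderScene(int jobCount) {
    tbb::parallel_for(0, jobCount, [](int job) {
        // local() returns the calling thread's own instance,
        // creating it on first use.
        contexts.local().drawJob(job);
    });
}

After the parallel phase, enumerable_thread_specific can also be iterated like a container, which is handy for collecting whatever each per-thread context recorded.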

0 Kudos
jimdempseyatthecove
Honored Contributor III
453 Views

Consider creating your contexts from allocated memory (use the scalable allocator as you refine the application). You can place a pointer/reference into a vector if you wish. Then structure your TBB application such that no two TBB threads work on the same context at the same time.

One TBB technique to explore is use of the parallel_pipeline where each context is a token/buffer passed through the pipeline.
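
Not Jim's words, just a rough illustration of that pipeline idea, assuming classic TBB's parallel_pipeline interface (oneTBB spells the modes tbb::filter_mode) and placeholder Context/record() types: a serial input stage pairs the next job with a free context from a pool, the parallel middle stage records into it, and a serial output stage returns it, so a context is never held by two threads at once.

#include "tbb/pipeline.h"
#include "tbb/concurrent_queue.h"
#include <vector>

struct Context {            // placeholder for a deferred device context + resources
    int nextJob = 0;
    void record(int job) { /* record commands for one job */ }
};

void renderWithPipeline(std::vector<Context*>& contexts, int jobCount) {
    tbb::concurrent_bounded_queue<Context*> pool;
    for (Context* c : contexts) pool.push(c);

    int job = 0;
    tbb::parallel_pipeline(
        contexts.size(),    // at most one in-flight token per context
        // Serial input stage: pair the next job with a free context.
        tbb::make_filter<void, Context*>(tbb::filter::serial_in_order,
            [&](tbb::flow_control& fc) -> Context* {
                if (job >= jobCount) { fc.stop(); return nullptr; }
                Context* c;
                pool.pop(c);                 // blocks until a context is free
                c->nextJob = job++;
                return c;
            }) &
        // Parallel stage: each token (a context) is used by exactly one thread.
        tbb::make_filter<Context*, Context*>(tbb::filter::parallel,
            [](Context* c) { c->record(c->nextJob); return c; }) &
        // Serial output stage: return the context to the pool.
        tbb::make_filter<Context*, void>(tbb::filter::serial_in_order,
            [&](Context* c) { pool.push(c); }));
}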

a) Are your device contexts physical devices (e.g. one for each GPU)?
b) Do your GPU(s) support multiple virtual devices (device contexts)?
c) Must a context be presented to the same GPU all the time?
d) Must the dialog between the app and the GPU take place on the same thread?
(note: older ATI Brook+ had problems if the same app thread did not communicate with the GPU)
e) Do you want your TBB threads to stall waiting for completion of a GPU request, or to enter task-stealing mode?

Jim Dempsey

0 Kudos
Łukasz_Krawczyk
Beginner
453 Views

Quoting - jimdempseyatthecove

Consider creating your contexts from allocated memory (use the scalable allocator as you refine the application). You can place a pointer/reference into a vector if you wish. Then structure your TBB application such that no two TBB threads work on the same context at the same time.

One TBB technique to explore is use of the parallel_pipeline where each context is a token/buffer passed through the pipeline.

a) Are your device contexts physical devices (e.g. one for each GPU)?
b) Do your GPU(s) support multiple virtual devices (device contexts)?
c) Must a context be presented to the same GPU all the time?
d) Must the dialog between the app and the GPU take place on the same thread?
(note: older ATI Brook+ had problems if the same app thread did not communicate with the GPU)
e) Do you want your TBB threads to stall waiting for completion of a GPU request, or to enter task-stealing mode?

Jim Dempsey


a) No, each per-worker-thread context is a deferred device context - it is only able to record command buffers.
b) D3D11 allows creating multiple device contexts on downlevel (D3D10 feature level) hardware, so as far as I know yes; a simple quick-and-dirty test works just fine.
c) Since the rendering context uses a deferred (virtual) device context, it is not tied to any physical GPU; only the immediate context is.
d) Scene rendering is divided into a set of layers: first the geometry is drawn, then the lights, and so on. Rendering jobs will be dispatched from the main thread (each piece of geometry is a job), and after all scene elements are drawn, rendering is finalized in the main thread (that is a GPU-bound phase). Each rendering job generates a command buffer on a deferred context, and during the final phase those command buffers are sorted and replayed on the immediate context. So the actual communication with the GPU happens only in the last phase of rendering, and commands are issued from the main thread (a rough sketch of this flow follows below).
e) If it is possible to absolutely guarantee that two worker threads won't use the same context, then task stealing will be the best option.
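
Not from the original post, just a rough sketch of the flow described in (d) under those assumptions: per-worker deferred contexts held in thread-local storage, parallel recording of command lists, then a serial replay on the immediate context from the main thread. The D3D11 calls are the standard ones; the per-job recording itself is left as a placeholder:

#include "tbb/enumerable_thread_specific.h"
#include "tbb/parallel_for.h"
#include <d3d11.h>
#include <vector>

void renderLayer(ID3D11Device* device,
                 ID3D11DeviceContext* immediate,
                 int jobCount)
{
    // Each worker thread that touches tls.local() gets its own deferred context.
    tbb::enumerable_thread_specific<ID3D11DeviceContext*> tls([&]() {
        ID3D11DeviceContext* deferred = nullptr;
        device->CreateDeferredContext(0, &deferred);
        return deferred;
    });

    std::vector<ID3D11CommandList*> commandLists(jobCount, nullptr);

    // Phase 1: record in parallel; no deferred context is shared between threads.
    tbb::parallel_for(0, jobCount, [&](int job) {
        ID3D11DeviceContext* ctx = tls.local();
        // ... record this job's draw calls on ctx (placeholder) ...
        ctx->FinishCommandList(FALSE, &commandLists[job]);
    });

    // Phase 2: replay serially on the immediate context from the main thread.
    for (ID3D11CommandList* cl : commandLists) {
        immediate->ExecuteCommandList(cl, FALSE);
        cl->Release();
    }
    // The deferred contexts in tls would be released when rendering shuts down.
}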

I will look into parallel_pipeline documentation.

Thank you for your help.
Łukasz
0 Kudos
Łukasz_Krawczyk
Beginner
453 Views
Quoting - Raf Schietekat

TBB has provisions for thread-local storage and for observing threads' lifecycle events, so you should be able to do what you want, I guess, especially if you have no special requirements about where tasks are executed; otherwise you may additionally need to look into affinity support.


Do you mean a scenario where task_scheduler_observer is used to indicate when the next task is executed, and then "injecting" data into TLS?

Regards,
Łukasz

0 Kudos
RafSchietekat
Valued Contributor III
453 Views
"Do you mean scenario, where task_scheduler_observer is used to indicate if next task is executed, and then "injecting" data to TLS?"
task_scheduler_observer is an observer for the task scheduler, and it observes threads, not task execution. Based on your initial question, these elements of TBB seemed relevant for you to evaluate. If you have done that, perhaps I could answer a more specific question, but I don't like the sound of "injecting data to TLS".
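
For reference (not Raf's code), a minimal sketch of what observing thread lifecycle events looks like: on_scheduler_entry/on_scheduler_exit run once per thread as it joins or leaves the scheduler, not per task, which is where per-thread rendering state could be created and released. CreatePerThreadContext/DestroyPerThreadContext are hypothetical hooks.

#include "tbb/task_scheduler_observer.h"

class ContextBinder : public tbb::task_scheduler_observer {
public:
    ContextBinder()  { observe(true);  }   // start receiving callbacks
    ~ContextBinder() { observe(false); }

    // Called once on each thread as it joins the scheduler, not per task.
    void on_scheduler_entry(bool /*is_worker*/) override {
        // CreatePerThreadContext();   // hypothetical: set up this thread's context
    }

    // Called once on each thread as it leaves the scheduler.
    void on_scheduler_exit(bool /*is_worker*/) override {
        // DestroyPerThreadContext();  // hypothetical: release this thread's context
    }
};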
0 Kudos
Alexey-Kukanov
Employee
453 Views
Quoting - Łukasz Krawczyk
I will look into parallel_pipeline documentation.

Thank you for your help.
Łukasz

You can also look into Section 5 "Thread Local Storage" of the TBB Reference Manual at http://www.threadingbuildingblocks.org/documentation.php.
0 Kudos
Łukasz_Krawczyk
Beginner
453 Views
Everything works flawlessly; task_scheduler_observer did the trick. Thank you everyone for the help!

Best regards,
Łukasz Krawczyk

0 Kudos
RafSchietekat
Valued Contributor III
453 Views
"Everything works flawlessly, task_scheduler_observer did the trick, thank everyone for help!"
It would be nice to hear what trick it did...
0 Kudos