Solved: Re: Need help understanding thread pool architecture - Page 2

Steve_Nuchia · ‎06-22-2009

I'm willing to bet this has been answered many times in many forms, but I couldn't find anything that helped me, either in the documentation or by searching this forum.

The TBB docs are written from the perspective of a single-threaded program entering parallelizable sections (possibly nested) and emerging from them again. There's language about the requirement that each thread entering a TBB parallel construct initialize a task_scheduler_init object, but nothing about what effect that has.

I've got a couple of situations that don't exactly fit the paradigm. Take the more general one: a library that may be called from a multithreaded program and wants to use TBB internally. We may be called from a thread with an existing task scheduler but from outside any TBB task, we may be called from inside a TBB task, and we may be called on a thread that's never heard of TBB before.

Further complicating matters, I'm working in Windows, where not all threads are created equal. There's a fairly hideous matrix of things that have per-thread initialization and periodic maintenance obligations.

I know, use the source, Luke. What I'm hoping for here isn't so much an insight into the TBB mechanism as the phrase that whacks my head into alignment with the authors' heads.

Specific issues:

If two independent user threads call into a module that uses TBB internally, will the tasks created by the called entry points be scheduled against each other? If so, is there any direct way to influence how they are scheduled?

If there's any notion of worker thread initialization hooks, I didn't see it. Should there be? Is there an idiom for it?

We're considering implementing a structure where we wrap the tbb::parallel_foo templates with versions that pass their parameters from whatever user thread they were invoked on into a TBB thread pool. The task trees so created are meant to have arbitrarily overlapping lifetimes and no direct interaction with one another. What gotchas, if any, do I need to look out for?

thank you,
-swn

Alexey-Kukanov · ‎06-23-2009

Quoting - Steve Nuchia

Specific issues:
...

Some information related to your questions:

- I think I explained a few times in the forum how task_scheduler_init works, and that initializing TBB for a second time in a thread has low overhead. Thus the solution Peter suggested is what we recommend.

- in the next version of TBB, there will be support for automatic initialization. So you will not need to create a task_scheduler_init on each call for the sake of threads that have not yet initialized TBB explicitly. Still, I would recommend keeping a global init object that covers the DLL lifetime, to ensure the TBB worker threads remain alive.

- if two independent user threads (we call them "masters") use TBB concurrently, they will share the TBB workers. Whichever master publishes its tasks first will get the workers; but once a worker has completed the piece of work it stole earlier, it will look for another piece to steal, and the second master will be considered. The masters will work on their own tasks most of the time; but if a master's task pool becomes empty while stolen pieces of its job are not yet completed, that master will also go and steal, possibly from another master. There is no direct way to influence stealing.

- for hooks, have a look at task_scheduler_observer.

- I am not sure what you want to achieve with the above-mentioned wrappers over TBB parallel algorithms. Could you elaborate a little?
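
To make the points about shared workers and worker hooks more concrete, here is a minimal toy model in standard C++ only. It is not TBB code and does not reproduce TBB's scheduler: there is a single shared queue instead of per-thread task pools, and no real stealing policy. All names (ToyPool, run_master, run_demo, on_entry) are hypothetical and exist only for illustration. It shows two things under those assumptions: each worker runs an entry hook once on its own thread, which is roughly the kind of per-thread setup an observer callback could be used for, and two "masters" publish tasks to the same workers while also helping to run whatever is queued until their own tasks are done.

```cpp
// Toy model in standard C++; NOT TBB and not TBB's scheduling algorithm.
#include <atomic>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

class ToyPool {  // hypothetical name, for illustration only
public:
    // on_entry runs once on each worker thread before it takes any task.
    ToyPool(unsigned n, std::function<void()> on_entry) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this, on_entry] {
                on_entry();
                work_loop();
            });
    }
    ~ToyPool() {
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(task)); }
        cv_.notify_one();
    }
    // Runs one queued task on the calling thread; returns false if none.
    bool try_run_one() {
        std::function<void()> task;
        {
            std::lock_guard<std::mutex> lk(m_);
            if (q_.empty()) return false;
            task = std::move(q_.front());
            q_.pop_front();
        }
        task();
        return true;
    }
private:
    void work_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return stop_ || !q_.empty(); });
                if (q_.empty()) return;  // stop requested and queue drained
                task = std::move(q_.front());
                q_.pop_front();
            }
            task();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> q_;
    bool stop_ = false;
    std::vector<std::thread> workers_;
};

// A "master": publishes `count` tasks, then helps run queued tasks (its own
// or the other master's) until all of its own tasks have completed.
long run_master(ToyPool& pool, int count) {
    std::atomic<long> sum{0};
    std::atomic<int> pending{count};
    for (int i = 1; i <= count; ++i)
        pool.submit([&sum, &pending, i] { sum += i; --pending; });
    while (pending.load() > 0)
        if (!pool.try_run_one()) std::this_thread::yield();
    return sum.load();
}

struct DemoResult { long a; long b; int hook_calls; };

// Two masters share three workers; the hook counts worker-thread entries.
DemoResult run_demo() {
    std::atomic<int> hook_calls{0};
    long a = 0, b = 0;
    {
        ToyPool pool(3, [&hook_calls] { ++hook_calls; });
        std::thread m1([&] { a = run_master(pool, 100); });
        std::thread m2([&] { b = run_master(pool, 200); });
        m1.join();
        m2.join();
    }  // the pool destructor joins the workers, so every hook has run by here
    return {a, b, hook_calls.load()};
}
```

Calling run_demo() from main gives each master the sum of its own task indices regardless of which thread executed each task, and the hook count equals the number of workers. In real TBB the equivalent per-thread setup would go into a class derived from task_scheduler_observer rather than a constructor argument.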


Steve_Nuchia · ‎06-25-2009

Quoting - Alexey Kukanov (Intel)

As far as I understand, the read-only sections (e.g. code) in the DLL may be loaded into real memory just once and mapped into an arbitrary number of processes. But writeable sections are mapped separately into each process that uses a DLL. Thus any static context in a DLL is never shared between different processes (applications) using that DLL.

In Windows, the DLL has no mutable memory independent of the processes in which it is loaded. There is a design pattern for setting up a shared memory segment and using it to keep some common state among all live clients of the DLL, but that is rare, and you have to use something outside the host process address space to establish the meeting point: typically the registry or the filesystem.

There may be some confusion over this in the literature, because much of that literature addresses COM server construction, and a local but out-of-process COM server looks a whole lot like the kind of DLL architecture Jim is imagining. The server is packaged in an EXE rather than a DLL, so it has an independent address space and lifetime. Clients communicate via RPC, but because it is local the marshalling is fast: Windows messages and shared memory with no serialization. The high-level API for instantiating and using COM objects is independent of whether the server is in-process or local out-of-process, as is almost all of the server-side code. So it's easy for the casual observer to conflate the situations.