You don't have to declare the number of participating threads in advance, if that is what you mean, but each thread that uses the scheduler must have a task_scheduler_init, just to register itself.
(Added) For clarity, you can have any number of non-TBB threads (no task_scheduler_init required), any number of TBB threads (task_scheduler_init required to call, e.g., parallel_for), and then there are some worker threads that TBB creates for you to exploit any available parallelism (the parallelism that TBB thinks there is, unless you told it otherwise through the first task_scheduler_init). Only the first task_scheduler_init creates a scheduler, the others just register the thread (if it is not already registered). So far, you don't need to register a thread to be able to use atomics and things like concurrent_hash_map, but that's your responsibility (who knows, maybe sometime some data type will do some background maintenance).
Have you measured the performance impact of creating a task_scheduler_init object on the stack just before calling parallel_for()? You might be pleasantly surprised. Then tell us. :-)
(Added) I'm assuming that there is also a long-lived initial task_scheduler_init instance somewhere else.
Well, here's a quote from the TBB reference manual:
A task_scheduler_init is either "active" or "inactive". Each thread that uses a task should have one active task_scheduler_init object that stays active over the duration that the thread uses task objects. A thread may have more than one active task_scheduler_init at any given moment.
The default constructor for a task_scheduler_init activates it, and the destructor uninitializes it. To defer initialization, pass the value task_scheduler_init::deferred to the constructor. Such a task_scheduler_init may be initialized later by calling method initialize. Destruction of an initialized task_scheduler_init implicitly deactivates it. To deactivate it earlier, call method terminate.
An optional parameter to the constructor and method initialize allow you to specify the number of threads to be used for task execution. This parameter is useful for scaling studies during development, but should not be set for production use. The Tutorial document says more about this topic.
To minimize time overhead, it is best to have a thread create a single task_scheduler_init object whose activation spans all uses of the library's task scheduler. A task_scheduler_init is not assignable or copy-constructible.
So it doesn't talk about the reference counting per se, but the last paragraph suggests having a task_scheduler_init object whose lifetime spans the parallel activities, for performance reasons.