From this very high-level description, it seems to me that TBB would fit your problem well if you know how to apply it. However, there is not enough detail to tell which TBB pattern is the best fit; it might be parallel_for/parallel_reduce, parallel_do, or pipeline. A fixed number of root tasks that repeatedly go to a central work store (or generator) for the next piece of work is the last thing I would consider, because it effectively means using threads, not tasks.
Tasks are lightweight objects and should be created on demand, not in advance. TBB maintains a pool of worker threads; a free thread either takes another available task from its own task pool or steals one from another thread that still has work to do. With TBB, it's better to think of the whole job as a field where multiple workers can take pieces as they wish, rather than as a central job store with a single door to enter.
For example, if your work space is known a priori and can be recursively divided into smaller pieces, tbb::parallel_for works well: it applies a recursive divide-and-conquer strategy in which your main thread starts with the whole work space and repeatedly divides it in half, at each step allowing one of the halves to be stolen by a worker thread, which then uses the same strategy. Thus the whole work space is efficiently decentralized, and at any time the number of tasks is O(P*log N), where P is the number of threads and N is the size of the work space. When a portion of work is not worth further splitting, the thread that owns it starts the real processing.
If the initial work can grow during processing, then tbb::parallel_do is the best fit. If the work arrives irregularly from an external source, or if the processing consists of several stages, some of which require ordered execution, then I would consider tbb::pipeline.
If you need more help, could you please describe your problem in more detail, so its nature is easier to understand: where the parallelism is, what the possible dependencies are, and so on?