I understand that TBB is designed to be a composable threading library, which enables parallelism in nested or recursive algorithms.
Taking advantage of this, I would like to parallelize a tree traversal algorithm for ray tracing or intersection tests. (Many objects need to be queried, so the queries themselves are also run in parallel.)
To the best of my knowledge, the easiest way to do this is to nest tbb::parallel_for. However, I am afraid that the large number of tasks created by calling tbb::parallel_for recursively will degrade performance.
I would appreciate it if you could point me to best practices or sample code for doing this without performance loss.
I noticed that a cutoff, which limits the number of tasks created, is used to avoid this overhead in the quicksort example in "Pro TBB", Chapter 2, p. 41.
I am also considering such a limit in my tree traversal algorithm.