In my application I find that I am sometimes running two nested parallel_for loops. The inner loop produces data that some of the outer loop iterations depend on.
Currently, when this happens, I call this_tbb_thread::yield() in the outer iteration while it waits for more data to become available. What I would actually like is that, instead of calling yield, the idle thread can "assist" the inner loop (i.e., take some of the range of the inner loop).
Is this possible in TBB? Perhaps in the form of an explicit steal of work, or registering a thread as a temporary additional worker thread?
I'm somewhat suspicious about what you're trying to do, because one parallel_for() body execution should not depend on the result of another (in the outer loop, I mean, the nested parallel_for() does not seem highly relevant in this story). Can you explain it more clearly?
The outer parallel_for is meant to introduce parallelism; each of the bodies that executes in this parallel_for is itself a (normal) loop. During an iteration of this loop, each body takes some input data and generates output data.
At some point, the collected output data needs to be processed and converted into a new batch of input data. However, the conversion cannot be parallelized (or at least, not all of it), so whenever one of the bodies has finished processing its input data, it attempts to take a lock and starts converting the data collected up to that point (thus generating more work for the normal loops).
Obviously, not all bodies can take the lock, so I have them yield until data becomes available (i.e., the conversion has finished). However, a significant part of the conversion process is parallelizable, so I put a parallel_for in it (the inner parallel_for).
At that point, there may be one or more threads (bodies from the outer parallel_for) idling, because they don't have the lock, while they could be a worker thread of the inner parallel_for. So, instead of the yield, I'd like those threads to steal work from the inner parallel_for while they are idle.
Generally, try to avoid blocking for a significant amount of time in a worker thread. I would suggest using a pipeline with a parallel stage and a serial stage instead. Take a decent number of elements per unit of work, and then just let the pipeline figure it all out. Would that suit the problem?
I agree with Raf; what is described definitely looks appropriate for pipeline. The serial pipeline stage can use parallel_for inside, and it will have the desired property: if other threads cannot proceed because of the serial stage being busy, they will try helping there.
Please let us know whether it works for you, or if you need further help.