My goal is to run each of the experiments in parallel and when the first one completes and has valid data to kill off the other ones and stop wasting processor time since we already have the answer. This is not really a parallel_for type of problem, more like a task based problem. Ideally I would like to take each experiment and in turn make them run in parallel for better performance.
Is there any good way of a task telling the scheduler to stop the other sibling tasks and any additional tasks that they generated? or am I stuck writing this myself using pthreads?
We've talked previously in this forum of using a shared atomic to abort a set of tasks executing a split loop, but I have trouble understandingthe problem statement here. On the one hand, the goal is to race a discrete set of tests in parallel; on the other, there's arequirement to abort "generated ... additional tasks." The former suggests a set of single threaded experiments, no more alternatives than there are HW threads to use as vehicles: each proceeds independently and the first one finishing aborts the others. (Sounds a lot like processes affinitized to separate processors.) The latter suggests that the experiments themselves are threaded and need to be cleared from the task tree. But tasks arescheduled unfairly in TBB--not a good environment for a race.
Suppose we encapsulatea set of experiments as sibling tasks and call the scheduler. The available HW threads will each take one of the available siblings, do whatever task splitting is available to it, then proceed with the work. Experiments entrusted to other sibling tasks not in that firstsubsetwill langish in the treeas greedy scheduling has each thread working its own subtree of tasks. Marking completion of one of the experiments in the test set could be done by the method mentioned above, but because of greedy scheduling,some experiments in the cohort may not have even been started.
Specifically I'm looking forward to the day when there are 32 cores on the box we are running on and we have three routines to try on a first pass. We can control TBB to the point of limiting each routine to just use 8 cores apiece resulting in just 27 cores in use. The question that I'm looking to solve and thus want ideas on how to try it is:
Which is better? To run three routines in linear order but let each rountine use 32 cores, or to run three routines in parallel while limiting each one to 8 cores and kill the others as soon as I have a solution? I'm thinking only experimentation will yield the answer.
I suppose an easy solution would be to use some shared variables so that when one finishes we short circuit the other tasks and have them exit. Then we can just look at the results of each one and take the one with the solution. I guess I was just looking for a more elegant solution. =P