Re: is there a way to "break" a loop?

pippolomeo · ‎07-27-2007

I'm probably making a stupid question here, so excuse me my newbiesness..

I've read the tutorial but apparently there is no easy way to do this:

is there a way to break a parallel loop?

in normal C++ code, we can make something like:

for(...) {
if(i>j)
break; // <===
}

could this be done in TBB?

thanks in advance.

Michael_V_Intel · ‎07-27-2007

In the general case of a "break", an iteration is only executed if all of the previous iterations of the loop complete without break-ing.This is inherently sequential. But there may be some break conditions that do allow some concurrency, and in those case you may be able to cast the solution as a parallel_while, or even a parallel_for.

The parallel_while template is designed for loops where the end of the iteration space is not known in advance. However, using aparallel_whiledoes not always result in a scalable solution, so you should read the section on parallel_while in the product documentation's Tutorial before you use it.

Alexey-Kukanov · ‎07-27-2007

Shortly, there is no way to achieve the same effect of cancelling all consequent iterations and getting immediately out of the loop if itsiterations are executed in parallel by different threads. Just imagine that a thread gets the part of loop iteration space starting from an index number that in serial loop would not be executed at all because of the break. So such loop could only be correctly processed in parallel if the break condition is known apriori; but in that case it would be better to just reduce the iteration space.

Also be aware that in principle not every loop can be correctly executed in parallel. For more details, you can read a series of "What's not parallel" posts in Clay Breshears's blog. In your example, the break condition makes iteration dependence, it's obviously a dependence that prevents parallel execution. You might be able to achieve correct result with some algortihm modifications, though. If you have a specific loop example in mind, you may post it here.

robert-reed · ‎07-27-2007

If I may add one more bit of advice here...

As Michael points out, activities that might be typical in a single threaded algorithm may have to be rethought before applyingin a multi-threaded environment. Handling such indefinite loop termination conditions as "if (i < j ) break;" might be handled by coalescing it into a single loop completion expression which is placed in a parallel_while, but because of the iterative nature of parallel_while test condition, it wouldn't scale well unless more work can be foundwhile processing and included via the ::add() function.

Is there evera case where interrupting these loops makes sense? Sure. Consider implementing an escape event in an interactive application that allows the user to interrupt a long operation, "stop this NOW!" The typical way to handle this in a multi-threaded environment is to equip all long loops with an escape check. I might imagine such a loop in TBB to look something like this, using parallel_for:

for (int i = r.begin(); !UserInterrupt && i != r.end(); ++i) {

// do work

}

So here we have a loop that can handle any of the range splits supplied by a partitioner to work over a particular subrange, yet will stop in every instanceif requested to do so.

But don't I need to protect access to UserInterrupt?Well, technically, yes, but the consequences of not doing so here are minor. After all, during normal operations, all threads will only read the variable and can cache it locally. If a write of the variable should occur, it happens as a one time event propagated to all the cores through cache invalidation. And if a race means a particular core misses the new value on one iteration, it should catch it on the next with no loss of the intended function.

michaelsuess · ‎07-29-2007

I have written an article about how to achieve what you want in OpenMP a little while ago in my blog here:
Breaking Out of Loops in OpenMP

I believe the solution described there is applicable to TBB as well (although I am not entirely sure, because I have not had the time to really get into it).

robert-reed · ‎07-29-2007

Thanks, Michael. I took a look at your post about breaking out of OpenMP loops and it looks equivalent to the one I suggested above, essentially conditionalizing the loop body so that it can be selectively turned off to produce the effect of an abort. Therefore, it should work in TBB so long as you abide by certain restrictions, inherent in its use for OpenMP as well.

The extra syntactictwist possible with TBB that you cannot do in OpenMP is merging the abort test with the regular loop conditional test. Under OpenMP, in addition to the loop early termination restrictions,the canonical form of the for/do loop is very restricted in order to guarantee that the compiler can decipher and rewrite the loop to divide it among threads. With TBB, the loop in the parallel_for functor body is used as is, with no requirement totranslate into a parallel form and so no restriction on conditional complexity. As far as I can tell, there's no restriction on early termination of the loop--the thread will just arrive early at the post-loop join.

One issue with the technique Michael suggests that I noticed concerns the exception throw that occurs once the loop is terminated. It should be fine if there is only a single level of parallelism. But if we have multiple levels of parallelism in play, it would be better to hold off on the throw and justpreserve status about the abort until all the parallel regions have been unwound, using a loop termination technique similar to the example.

Kevin_Farnham · ‎07-29-2007

What if you eliminated the "break" and replaced the code with something like this:

for (...) {
   if (i<=j) {
      run_my_iterative_code();
   }
   // if (i > j) just iterate
}

In this case, there would be (approximately) no work to do for all iterations where i > j. So, the processors/cores that had been given those sections of the iteration range would quickly complete their work, making them free to steal tasks from other processors/cores. Which would mean TBB's scheduler would be able to scale the processing.

Is this correct? Or am I missing something?

jimdempseyatthecove · ‎07-30-2007

One of the pitfalls in breaking the loop in conversion from single threaded to multi-threaded is to perform the break by setting the for(index variable to the termination condition. This works in the single threaded case because there is one instance of the index variable. In multi-threaded programming there are multiple instances of the indexing variable therefor potentially only one thread will terminate.

The solution to this is to declare a volatile termination variable

void foo()
{
boolAbortNow = FALSE;
int i;
#pragma omp...
for(i=1,...)
{
if(AbortNow) break;
...
if(ConditionForAbort) { AbortNow = TRUE; break;};
// do not use AbortNow = ConditionForAbort
// as other thread may unset the AbortNow flag
...
}
}

Jim Dempsey

Michael_V_Intel · ‎07-30-2007

A few comments on this code:

for (...) {
if (i<=j) {
run_my_iterative_code();
}
// if (i > j) just iterate
}

The difference between this code and the code suggested by Robert and Michael is that you do not use ashared variable to communicate the abort across the threads. Instead, each iteration computes the i<=j. The difference is subtle but perhaps important.

for (...) {
   if (!aborted) {

      if (i>j) aborted == true;
      else run_my_iterative_code();

}
}

When using the shared variable, once any thread sets the abort conditionto true, all other iterations will see the change to the abort variable.

In your code on the otherhand, each iteration calculates the abort condition and so its possible that some iterations may decide the condition is true, while others may decide it is false.

In your example code, you don't provide the definitions of i and j, so this allows us to speculate about how they might be set. If for example, i and j are just randomly assigned values in each iteration, then the behavior will not at all look like that of the original sequential code that used the "break" -- randomly some iterations will run_my_iterative_code() while others will not.

If you use the shared variable approach, the first iteration to decide that i>j will set the shared aborted variable. All other iterations that have not yet started will then skip run_my_iterative_code().

And Robert also noted that with TBB,the !aborted expression can be moved into the outerfor loop condition.

Michael Voss
TBB Developer
Performance Analysis and Threading Lab, Intel

robert-reed · ‎07-30-2007

There's one other thing I'd like to point out in Jim Dempsey's suggestion:

#pragma omp...
for(i=1,...)
{
if(AbortNow) break;

For OpenMP, this is illegal:

The for-loop must be a structured block, and in addition, its execution must not be terminated by a break statement.

That leads me to a question for Michael Voss. What would happen if this break was in the for loop of a parallel_for functor? I'd assume that the task would finish early and go wait at the post-loop join, freeing the thread to go work on some other task. In other words, it seems like it should do the right thing.

pippolomeo · ‎07-31-2007

hey guys I would like to thank you all for the big amount of competent replies. I had the need to stop all branches when the search for a particular value in a function would have finished.

I've applied a method based on what I've read on the blog regarding openMP and it works flawlessly.

Alexey-Kukanov · ‎07-31-2007

mad reed:

#pragma omp...
for(i=1,...)
{
if(AbortNow) break;

...

What would happen if this break was in the for loop of a parallel_for functor? I'd assume that the task would finish early and go wait at the post-loop join, freeing the thread to go work on some other task. In other words, it seems like it should do the right thing.

Yes, this is what will happen, except that there is no such notion as "wait at the post-loop join" for a TBB task- it will just complete, and the task object will be destroyed then. Task synchronization in TBB is somewhat different from OpenMP; it is done using parent-child task relations, reference counting, and empty tasks. But this is a topic for another thread :)

Michael_V_Intel · ‎07-31-2007

In reply to Roberts question about what would happen if a "break" was in the for loop of a parallel_for functor.

I agree with Alexey's answer, that the task would return early, skipping the iterations left in it's subrange after the "break".

And I'd like to reiterate Robert's point that "Under OpenMP, in addition to the loop early termination restrictions,the canonical form of the for/do loop is very restricted in order to guarantee that the compiler can decipher and rewrite the loop to divide it among threads."

TBB on the otherhand uses generic programming and is therefore much less restrictive.The parallel_for body object must define an operator().But this operator is not even required to contain a loop. The TBB runtime will invoke the operator() on sub-ranges, but what the operaror does witha sub-range is completely up to the developer.