Solved: Structuring bug in parallel for?

Nav · 01-11-2010

What would happen in a parallel_for loop if you placed a goto that, in mid-iteration, jumps somewhere outside the loop?
OpenMP reports an error at compile time: you can't enter a parallel for loop from outside with a goto, and you can't exit one with a goto either.
But TBB appears to malfunction. Or was it designed to behave this way on purpose?
My code:

    #include <iostream>
    #include <cstdio>
    #include "tbb/tbb.h"
    #define ITERATIONS 500
    using namespace std;
    using namespace tbb;

    void Foo(float& a) {cout<<a<<" ";}//Foo

    class ApplyFoo
    {
        int *const my_a;
    public:
        void operator()(const blocked_range<size_t>& r) const
        {
            int *a = my_a;
            for(size_t i=r.begin();i!=r.end();++i)
            {
                if (i==10) {cout<<"[[[[]]]]"<<endl; goto end;}
                if (i<10) printf(" >>i=%d ", a[i]); else printf(" i=%d ", a[i]);
            }
        end:printf(" Reached end label ");
        }
        ApplyFoo(int a[]) : my_a(a) {}

    };//ApplyFoo

    void ParallelApplyFoo(...

RafSchietekat · 01-11-2010

Perhaps you might be interested in task_group::cancel() instead? Not for this example, though.

In your program, all ApplyFoo::operator() invocations will inevitably print "Reached end label" because that's just the last statement in the body, and there's no return statement prior to it, but that's just a red herring. The real problem is that sometimes the range will start after 10, so testing for i==10 won't do what you want, unlike in the serial case where the loop always goes from 0 upwards to the end; try i<10 instead.
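To make the point about subranges concrete, here is a small std-only C++ model (no TBB involved). `split_range` is a hypothetical helper that halves a range until each piece holds at most a grain size of elements, which is roughly how a range gets divided among workers; the actual split points in TBB depend on the partitioner and grainsize, so treat the numbers as illustrative. The two body functions (also hypothetical) compare the original `i == 10` test with the suggested `i < 10` test.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model, not TBB code: a subrange [first, second).
using Range = std::pair<std::size_t, std::size_t>;

// Hypothetical helper: halves [begin, end) until every piece holds at most
// `grain` elements, roughly the way a range is divided among workers.
// Real split points depend on the library's partitioner and grainsize.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<Range>& out) {
    if (grain == 0) grain = 1;              // avoid endless splitting
    if (end - begin <= grain) {
        out.push_back(Range{begin, end});
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    split_range(begin, mid, grain, out);
    split_range(mid, end, grain, out);
}

// Body modelled on the original post: leaves its own subrange when it meets
// i == 10 ("break" here plays the role of "goto end"). A subrange that starts
// above 10 never sees that value and visits every element.
void body_stop_at_10(const Range& r, std::vector<std::size_t>& visited) {
    for (std::size_t i = r.first; i != r.second; ++i) {
        if (i == 10) break;
        visited.push_back(i);
    }
}

// Suggested fix: a test that every subrange can evaluate on its own.
void body_below_10(const Range& r, std::vector<std::size_t>& visited) {
    for (std::size_t i = r.first; i != r.second; ++i) {
        if (i < 10) visited.push_back(i);
    }
}
```

In this model, `split_range(0, 100, 16, ...)` yields eight subranges. Run serially over the whole range [0, 100), `body_stop_at_10` visits only 0 through 9, but run over the eight subranges it visits 98 of the 100 values (only 10 and 11, the tail of the subrange [0, 12), are skipped). `body_below_10` visits exactly 0 through 9 however the range is split.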

With that in mind, do any of your goto-related questions still apply?

(Added) Well, where the program now has i==10, it would need the negation of i<10 (that is, i>=10); but surely the real problem behind this contrived example is more interesting than fixing such details?

Nav · 01-12-2010

When posting this I had marked it as a bug report. I only wanted to bring this problem to Intel's attention and ask whether it was intended to work this way, and whether a future parallel_for could take such anomalies into account.

This is because I feel a library's syntax should prevent the programmer from making mistakes. In C and OpenMP there are plenty of places where the syntax allows the programmer to make mistakes, which results in longer debugging and development time.
C++ introduced stronger syntax, and I was under the impression that TBB, like C++, would have its syntax designed so that the programmer could not make mistakes. Now I feel I may have been wrong to assume so.

Thanks for the explanation of "Reached end label"; I had wanted to know why it happened that way. Apart from that, I just wanted to raise the issue of gotos placed in a parallel loop, because to me the above program is malfunctioning.

Alexey-Kukanov · 01-12-2010

Quoting - Nav

What would happen in a parallel_for loop if you placed a goto that, in mid-iteration, jumps somewhere outside the loop?
OpenMP reports an error at compile time: you can't enter a parallel for loop from outside with a goto, and you can't exit one with a goto either.
But TBB appears to malfunction. Or was it designed to behave this way on purpose?
...
I know gotos have been denounced repeatedly, but I have also seen very respectable algorithms, still in use today, that break out of a loop with a goto. If I have to apply a parallel for to one of those loops, I'll have to restructure the algorithm (which shouldn't be necessary, because there are arguments in support of gotos too).
But basically, if there are constructs like parallel_reduce, then why can't parallel_for have a condition that caters to gotos as well?
Could that be implemented in TBB?

Once you have a goto or break in a serial loop, strictly speaking its iterations are no longer independent (because some iterations may or may not be executed depending on the results of previous iterations), and therefore the loop cannot be parallelized. That's why OpenMP does not allow a goto into or out of a parallel loop.

So it has nothing to do with goto being "denounced". You just cannot use goto or break in a parallel loop exactly the way you use them in a serial loop.

As I already said in another thread, TBB is a pure C++ library and works with any standard C++ compiler. Having a goto in the user-supplied operator() is perfectly fine from the C++ standpoint (as long as the jump stays within the function). The compiler (e.g. g++) does not know that you want to use that function object with a TBB parallel loop; in fact, it knows nothing about TBB. And TBB is completely agnostic of the user code; it just executes whatever method is provided. Neither of the two has the high-level information required to recognize the problem, unlike OpenMP, where the compiler knows the specification's requirements and can enforce them.
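A small std-only illustration of this point: to the compiler, the body is just an ordinary function object, and a `goto` to a label inside the same `operator()` is legal C++. `run_in_chunks` below is a hypothetical stand-in for a library that hands out subranges (it runs them one after another, for determinism); it does not and cannot know that the body contains a `goto`, so the jump only ends the current call. `VisitUntil10` is likewise a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library loop: it only calls body(begin, end)
// on consecutive chunks and knows nothing about what the body does.
template <class Body>
void run_in_chunks(std::size_t begin, std::size_t end, std::size_t chunk,
                   const Body& body) {
    if (chunk == 0) chunk = 1;
    for (std::size_t b = begin; b < end; b += chunk)
        body(b, std::min(end, b + chunk));
}

// Body in the style of the original ApplyFoo: legal C++, because the label
// is inside the same function. The goto leaves this call, not the loop.
class VisitUntil10 {
public:
    explicit VisitUntil10(std::vector<std::size_t>& out) : out_(out) {}

    void operator()(std::size_t begin, std::size_t end) const {
        for (std::size_t i = begin; i != end; ++i) {
            if (i == 10) goto end_of_chunk;   // jumps to the end of THIS call
            out_.push_back(i);
        }
    end_of_chunk:
        return;                               // other chunks are unaffected
    }

private:
    std::vector<std::size_t>& out_;
};
```

Called once on [0, 40), the body records 0 through 9; run over chunks of 8 it records 34 values, because only 10 through 15 (the rest of the chunk [8, 16)) are skipped.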

But there is a solution in TBB that allows "breaking" a parallel loop. Yes, you will have to restructure the code to some extent, but not much. For an example, see the TBB Tutorial, Section 5.1 "Cancellation Without An Exception". And of course we are ready to help if you have further problems with it.
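The tutorial's example uses TBB calls; as a library-independent sketch of the same idea, the code below uses plain `std::thread` and a shared `std::atomic<bool>` as the "cancelled" flag that every worker polls. This is an analogue, not TBB code, and `parallel_find` is a hypothetical name. It assumes the target occurs at most once; with several matches it could return any of them.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// std-only analogue of "cancellation without an exception" (not TBB code):
// workers scan their own chunk, poll a shared flag, and raise it on a hit.
// Returns the index of `target`, or data.size() if it is absent.
// Assumes `target` occurs at most once; otherwise any match may be returned.
std::size_t parallel_find(const std::vector<int>& data, int target,
                          unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_workers - 1) / num_workers;
    std::atomic<bool> cancelled{false};
    std::atomic<std::size_t> found{n};

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i != end; ++i) {
                if (cancelled.load()) return;   // poll: someone else found it
                if (data[i] == target) {
                    found.store(i);
                    cancelled.store(true);      // "cancel" the other workers
                    return;
                }
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return found.load();
}
```

Workers notice the flag only at their next poll, so some elements after the match may still be examined. The flag limits wasted work; it does not make the loop behave like a serial `break`.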

Nav · 01-13-2010

Quoting - Alexey Kukanov (Intel)

So it has nothing to do with goto being "denounced". You just can not use goto or break in a parallel loop exactly the same way you use it in a serial loop.

I'm aware of how the iterations are split up and why goto won't work the same way in a parallel_for loop, etc. The only reason for posting this was to bring forward a supposed bug, which could be corrected in the coming versions of TBB. That's all.
When I mentioned goto being denounced, I was referring to this: http://en.wikipedia.org/wiki/Goto#Criticism_of_goto_usage
There are counter-arguments too: http://www.stevemcconnell.com/ccgoto.htm

P.S.: I tried task::self().cancel_group_execution(), but iterations after 10 were executed anyway.

RafSchietekat · 01-13-2010

"The only reason for posting this was to bring forward a supposed bug, which could be corrected in the coming versions of TBB. That's all."
OK, but now you see there is no bug, right?

A well-considered small set of goto patterns is probably acceptable and more useful than either extreme (allowing complete freedom or banning it outright).

"P.S.: I tried task::self().cancel_group_execution(), but iterations after 10 were executed anyway."
I was speculating that you might be looking for a way to cancel execution, because this example looks rather contrived. If you really just wanted to stop at 10, then providing a range ending at std::min(10,n) would obviously be the easiest solution; otherwise you can test for i<10 instead of i==10 (because at most one invocation has a range parameter that contains 10).

If you are actually searching for a single result and want to cancel further execution once you've found it (which is different from the example, which wants to visit all values below 10 without skipping any), use cancellation support to send that message to the right scope, and poll is_cancelled() to receive it (I think parallel_for is doing its part as well), but also be prepared to cleanly wind down an invocation that has already started, etc. In this case, consider that, e.g., a high-end subrange may start above 10 and may be executing before a low-end subrange even sees the value 10, let alone gets its cancellation message out. But, again, it's not clear to me what you are really trying to achieve.
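For the "search for a single result" case, the warning about a high-end subrange matters: it can hit a match before a low-end subrange reaches an earlier one. One way to still get the *first* match is sketched below with `std::thread` and an atomic minimum instead of TBB (the names `update_min` and `parallel_find_first` are hypothetical): each worker records a hit with an atomic "fetch-min" and abandons its chunk only once it has passed the best index found so far.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Atomic "fetch-min": lower `best` to `i` if `i` is smaller.
void update_min(std::atomic<std::size_t>& best, std::size_t i) {
    std::size_t cur = best.load();
    // On failure compare_exchange_weak reloads `cur`; retry while i is smaller.
    while (i < cur && !best.compare_exchange_weak(cur, i)) {
    }
}

// Index of the FIRST element satisfying `pred`, or data.size() if none.
// Workers scan contiguous chunks; a worker in a high chunk may hit a match
// before a lower chunk reaches an earlier one, so hits are combined with
// update_min, and a worker quits only once it has passed the best index.
template <class Pred>
std::size_t parallel_find_first(const std::vector<int>& data, Pred pred,
                                unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;
    const std::size_t n = data.size();
    const std::size_t chunk = (n + num_workers - 1) / num_workers;
    std::atomic<std::size_t> best{n};

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i != end; ++i) {
                if (i >= best.load()) return;  // an earlier match is known
                if (pred(data[i])) {
                    update_min(best, i);
                    return;                    // later hits here can't be earlier
                }
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return best.load();
}
```

Unlike a plain cancellation flag, this returns the same index a serial loop with `break` would, even though other workers may have inspected elements beyond it.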

Nav · 01-17-2010

Sorry I couldn't reply sooner. My forum login didn't give me the "reply" link on Friday. It's rectified now.

"OK, but now you see there is no bug, right?"- I still see it as a bug. Forget about the goto. Many programmers use "break;" to exit a for loop. The iterations after the break get executed. That shouldn't happen.

Couldn't the same thing happen when you try to break out of a parallel_do loop? (haven't tried it, but will do so soon)

Rather than call it "contrived", I'd call it experimentation. Please don't take my code at face value. I haven't started building my application as yet. I need to figure out how TBB works and resolve some questions first.

RafSchietekat · 01-17-2010

"I still see it as a bug. Forget about the goto. Many programmers use "break;" to exit a for loop. The iterations after the break get executed. That shouldn't happen."
If you have a condition that's detected only when you actually visit the element, and you don't want any elements beyond it to be visited, you not only have to give up on TBB, you have to give up on parallel execution altogether, at least for that part of the computation.
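The exception is when the stop condition can be evaluated cheaply on the input alone, without doing the per-element work. In that case a common restructuring (a swapped-in technique, not something from TBB or from this thread) is to find the stop index with a serial scan and parallelize only the work on the prefix before it. A std-only sketch with hypothetical names:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Phase 1 (serial): find where a serial loop with `break` would stop.
// Only valid if the stop test does not depend on results of the work.
template <class StopPred>
std::size_t find_stop(const std::vector<int>& data, StopPred stop) {
    return static_cast<std::size_t>(
        std::find_if(data.begin(), data.end(), stop) - data.begin());
}

// Phase 2 (parallel): process only [0, stop_index), so no element past the
// "break" point is ever touched. Each worker writes its own slots of `out`,
// so no synchronization is needed beyond join().
template <class Work>
void process_prefix(const std::vector<int>& data, std::size_t stop_index,
                    std::vector<int>& out, Work work, unsigned num_workers) {
    if (num_workers == 0) num_workers = 1;
    out.assign(stop_index, 0);
    const std::size_t chunk = (stop_index + num_workers - 1) / num_workers;
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = w * chunk;
        const std::size_t end = std::min(stop_index, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back([&, begin, end] {
            for (std::size_t i = begin; i != end; ++i) out[i] = work(data[i]);
        });
    }
    for (std::thread& t : workers) t.join();
}
```

For example, with data {1, 2, 3, -1, 5, 6} and "stop at the first negative value", the serial scan returns 3 and only the first three elements are processed.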

"Couldn't the same thing happen when you try to break out of a parallel_do loop? (haven't tried it, but will do so soon)"
parallel_do will have visited fewer subsequent elements by the time it has detected a termination condition, but you should still be prepared for that to happen.

"Rather than call it "contrived", I'd call it experimentation. Please don't take my code at face value. I haven't started building my application as yet. I need to figure out how TBB works and resolve some questions first."
You have to be careful not to build the wrong assumptions into an experiment and draw the wrong conclusions from that.

Maybe you should consider a pipeline with a serial input filter instead, because if you return NULL when the termination condition applies, no further elements will pass through the pipeline.
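As a library-independent analogue of that suggestion (hypothetical names, not the TBB pipeline API), the sketch below has a single producer thread play the serial input filter: it feeds items in order and simply stops when the termination condition holds, which corresponds to returning NULL, while several consumer threads do the parallel work on whatever has been fed. Because items enter one at a time and in order, nothing after the stop point is ever processed.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// std-only analogue of "a pipeline with a serial input filter" (not TBB):
// one producer feeds items in order and stops at the termination condition;
// consumers process fed items in parallel. Result order is unspecified.
template <class Stop, class Work>
std::vector<int> run_pipeline(const std::vector<int>& input, Stop stop,
                              Work work, unsigned num_consumers) {
    if (num_consumers == 0) num_consumers = 1;
    std::queue<int> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;                   // guarded by m
    std::vector<int> results;            // guarded by m

    std::thread producer([&] {
        for (int x : input) {
            if (stop(x)) break;          // serial stage: nothing past here is fed
            { std::lock_guard<std::mutex> lk(m); q.push(x); }
            cv.notify_one();
        }
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
    });

    std::vector<std::thread> consumers;
    for (unsigned c = 0; c < num_consumers; ++c) {
        consumers.emplace_back([&] {
            for (;;) {
                std::unique_lock<std::mutex> lk(m);
                cv.wait(lk, [&] { return !q.empty() || done; });
                if (q.empty()) return;   // producer finished and queue drained
                int x = q.front();
                q.pop();
                lk.unlock();
                int r = work(x);         // parallel stage, outside the lock
                lk.lock();
                results.push_back(r);
            }
        });
    }
    producer.join();
    for (std::thread& t : consumers) t.join();
    return results;
}
```

With input 1 through 10, a stop condition of `x == 6` and work `2 * x`, the results are some permutation of {2, 4, 6, 8, 10}.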

Nav · 01-18-2010

:-) okay......

RafSchietekat · 01-18-2010

":-) okay......"

?

Nav · 01-18-2010

Which means I agree with you :) The extra characters in the post are there because the forum requires a minimum of 14 characters per post :)

Nav · 01-24-2010

There's a technique I used a long time ago for exiting for loops. I tried it in TBB's and OpenMP's parallel for loops, and the program malfunctioned in both.

pseudocode:

    for(i=0;i<10;++i)
    {
        if (i==2) i=10;/*condition to stop looping*/
    }
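Why this trick fails under a parallel loop can be seen with a small std-only model: a body that receives one subrange (a hypothetical signature, not TBB's) can only push its own loop variable to its own end, so the assignment ends that subrange and nothing else.

```cpp
#include <cstddef>
#include <vector>

// The "i = 10" trick from the post, applied inside one subrange [begin, end).
// The loop must test i < end: after i = end, ++i makes i == end + 1, so a loop
// written with i != end would run past the end instead of stopping.
void body_assign_trick(std::size_t begin, std::size_t end,
                       std::vector<std::size_t>& visited) {
    for (std::size_t i = begin; i < end; ++i) {
        visited.push_back(i);
        if (i == 2) i = end;  // "condition to stop looping": ends THIS subrange only
    }
}
```

Over the whole range [0, 10) the body visits 0, 1 and 2; split into [0, 5) and [5, 10) it visits 0, 1, 2 and then all of 5 through 9, because the second subrange never sees i == 2.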

I'm a bit disappointed that none of the TBB developers have replied to this thread. Isn't there anything that can be done to take the load off the developer?

RafSchietekat · 01-24-2010

"I'm a bit disappointed that none of the TBB developers have replied to this thread."
Alexey Kukanov did.

"Isn't there anything that can be done to take the load off the developer?"
I don't see what could.

Nav · 01-25-2010

"Alexey Kukanov did."

I'm sorry, I should've cross-checked before posting. I conclude he meant there is nothing that can be done about it.

If there are three ways to make a parallel_for loop malfunction and nothing can be done about it from the library point of view, then I hope at least the debugging software will be able to point out the errors.

Alexey-Kukanov · 01-25-2010

Yes, there is nothing that can be done, because, as I said, a loop that uses any of these tricks essentially becomes non-parallelizable.

What do you mean by "debugging software"? The debug version of the library can't do it; we know nothing of how the user code is organized. A special source code analysis tool could, but so far there is no such tool.

Nav · 01-25-2010

I meant something like the GNU Debugger, but I was really hinting at any modern debugger intelligent enough to find errors in parallel code. I guess that's a faraway dream...