Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Parallel processing of MIN()

Intel_C_Intel
Employee
459 Views

Hi,

I am starting to learn about OpenMP to enhance performance, and the early result I obtained are encouraging. However, there is one loop that does give me trouble.

for(int row){
#pragma ivdep
for(int col){
out1(row, col) = exp(in1(row, col));
out2(row, col) = exp(in2(row, col));
out3(row, col) = MIN(out1(row, col), out2(row, col));
min = MIN(out3(row, col) , min);
}
}

From reading the documentation, I think the problem is caused by the last line, probably because the variable min is shared. Is there a way to overcome this problem, maybe by including a statement that would say something like "each thread has a local copy of min and work on it, and at the end, the global min is computed as the MIN of both threads' local min." Or is that impossible?

Thanks in advance

Alex

0 Kudos
5 Replies
jimdempseyatthecove
Honored Contributor III
459 Views

Alex,

Assuming that the outer loop is the OpemMP parallel loop then prior to the outer loop declare as volatileand initialize a MinOfMins to HUGE of whatever type is used (float, double, int, ...). Then inside the outer loop but before the inner loop declare and initialize aMinOfRow to HUGE of whatever type is used. Then process the columns of the row while locating the MinOfRow. Following the exit of the inner loop you will have a valid MinOfRow.

Now to properly update MinOfMins in a thread safe manner you can use a critical section or use an _InterlockedCompareExchange. I suggest learning how, and getting familliar with the _Interlocked... series of intrinsic function calls.

do {
float copyMinOfMins = MinOfMins;
if(copyMinOfMins < MinOfRow) break;
} while(_InterlockedCompareExchange(&MinOfMins, MinOfRow, copyMinOfMins ) == copyMinOfMins );

To get that to work you will need some (cast) to (long volatile*) and to (long).

And if you have double then use _InterlockedCompareExchange64 and appropriate (cast)

Do not produce the copyMinOfMins after you test MinOfMins against MinOfRow as it may change between the if test and the following statement that produces the copy. As in doing so your code will be subject to error.

An alternate way that might produce cleaner looking code at the expense of some memory is to declare outside the outer loop an integer array of IndexOfMinOfRows[numberOfRows]; and pre-initialize each with value of 0; The inner loop can then update the IndexOfMinOfRows[row] entry with the col of the min in that row. Then outside the parallel loop have an additional loop that finds the min value using the array of IndexOfMinOfRows. This will reduce the number of MIN tests inside the loop and eliminate the critical section or _Interlocked... call. The end result may be faster code. (assuming you do not need an alloc to get the memory for IndexOfMinOfRows array).

Jim Dempsey

0 Kudos
TimP
Honored Contributor III
459 Views
I answered this before, pointing out a URL which shows which reduction operations are available in Fortran and C. I guess my answer was removed. We have to assume that your MIN resembles a Fortran MIN. As it is not a standard C intrinsic, there is no corresponding OpenMP reduction operation for C.
You should be able to find posted examples demonstrating the latter method which Jim proposed. The overhead for creating and destroying the local array (if needed) should be negligible, as it should be done outside the parallel region.
0 Kudos
TimP
Honored Contributor III
459 Views
I meant to add that when you save the individual minima to a contiguous array, there is a potential false sharing problem. You want to collect the local value for each thread in a private variable, then copy it to the array at the end, so the false sharing occurs at most once per value stored in the temporary array.
0 Kudos
jimdempseyatthecove
Honored Contributor III
459 Views

Tim,

Are you implying that min() can be used as a reduction operator?

In MSDN re reduction

op

Not an overloaded operator but one of +, *, -, &, ^, |, &&, or ||.

binop

Not an overloaded operator but one of +, *, -, &, ^, or |.

Intrinsic functions are not listed so that is why I proposed the interlocked or the array of (or indexes of) mins. This would, in most cases, execute faster than the interlocked methodat the expense of some RAM.

Note too, if you were doing sum of arrays that maintaining an array of sums then summing that would be faster than using reduction. (although it requires more code writing).

Jim

0 Kudos
TimP
Honored Contributor III
459 Views
Jim,
Sorry if my writing is causing confusion. I was trying to point out there is an OpenMP MIN() reduction operator for Fortran, but not for C. Also, I was trying to point out that MIN() is not a language defined operator in C or C++; no doubt this has a direct bearing on the differences in the lists of reduction operators between Fortran and C. In fact, there are variations in common usage as to how MIN() macros are implemented in C, including protecting (or not) against side effects.
As far as I know, the closest standardized equivalent in C++ is STL min_element() (not recognized by OpenMP). It would have to be used separately by each thread, with the results of the threads combined by one of the strategies you mentioned. min_element() was not supported in the Microsoft 2003 compiler. It is fairly well supported in g++, even to the extent of optimization shortcutting a majority of the C++ idiomatic obstacles.
I didn't want to go so far as to suggest multi-language usage so as to take advantage of the Fortran MIN() reduction operator, although I may have mentioned that in my post which was removed. If someone were writing for gcc compatible compilers, I would think it worth considering.
You might be interested in the post in the Fortran forum about how Intel Windows compilers might in the future support the Microsoft OpenMP.

For a week now, access to software forums has been blocked from the AT&T networks, the only public internet access available to me, in both California and Ohio. I'm wondering why this has not been raised as an issue. I don't know if this has a connection with removal of posts which I made shortly before the blockage.
0 Kudos
Reply