- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I am starting to learn about OpenMP to enhance performance, and the early result I obtained are encouraging. However, there is one loop that does give me trouble.
for(int row){
#pragma ivdep
for(int col){
out1(row, col) = exp(in1(row, col));
out2(row, col) = exp(in2(row, col));
out3(row, col) = MIN(out1(row, col), out2(row, col));
min = MIN(out3(row, col) , min);
}
}
From reading the documentation, I think the problem is caused by the last line, probably because the variable min is shared. Is there a way to overcome this problem, maybe by including a statement that would say something like "each thread has a local copy of min and work on it, and at the end, the global min is computed as the MIN of both threads' local min." Or is that impossible?
Thanks in advance
Alex
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Alex,
Assuming that the outer loop is the OpemMP parallel loop then prior to the outer loop declare as volatileand initialize a MinOfMins to HUGE of whatever type is used (float, double, int, ...). Then inside the outer loop but before the inner loop declare and initialize aMinOfRow to HUGE of whatever type is used. Then process the columns of the row while locating the MinOfRow. Following the exit of the inner loop you will have a valid MinOfRow.
Now to properly update MinOfMins in a thread safe manner you can use a critical section or use an _InterlockedCompareExchange. I suggest learning how, and getting familliar with the _Interlocked... series of intrinsic function calls.
do {
float copyMinOfMins = MinOfMins;
if(copyMinOfMins < MinOfRow) break;
} while(_InterlockedCompareExchange(&MinOfMins, MinOfRow, copyMinOfMins ) == copyMinOfMins );
To get that to work you will need some (cast) to (long volatile*) and to (long).
And if you have double then use _InterlockedCompareExchange64 and appropriate (cast)
Do not produce the copyMinOfMins after you test MinOfMins against MinOfRow as it may change between the if test and the following statement that produces the copy. As in doing so your code will be subject to error.
An alternate way that might produce cleaner looking code at the expense of some memory is to declare outside the outer loop an integer array of IndexOfMinOfRows[numberOfRows]; and pre-initialize each with value of 0; The inner loop can then update the IndexOfMinOfRows[row] entry with the col of the min in that row. Then outside the parallel loop have an additional loop that finds the min value using the array of IndexOfMinOfRows. This will reduce the number of MIN tests inside the loop and eliminate the critical section or _Interlocked... call. The end result may be faster code. (assuming you do not need an alloc to get the memory for IndexOfMinOfRows array).
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You should be able to find posted examples demonstrating the latter method which Jim proposed. The overhead for creating and destroying the local array (if needed) should be negligible, as it should be done outside the parallel region.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Tim,
Are you implying that min() can be used as a reduction operator?
In MSDN re reduction
opNot an overloaded operator but one of +, *, -, &, ^, |, &&, or ||.
binopNot an overloaded operator but one of +, *, -, &, ^, or |.
Intrinsic functions are not listed so that is why I proposed the interlocked or the array of (or indexes of) mins. This would, in most cases, execute faster than the interlocked methodat the expense of some RAM.
Note too, if you were doing sum of arrays that maintaining an array of sums then summing that would be faster than using reduction. (although it requires more code writing).
Jim
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Sorry if my writing is causing confusion. I was trying to point out there is an OpenMP MIN() reduction operator for Fortran, but not for C. Also, I was trying to point out that MIN() is not a language defined operator in C or C++; no doubt this has a direct bearing on the differences in the lists of reduction operators between Fortran and C. In fact, there are variations in common usage as to how MIN() macros are implemented in C, including protecting (or not) against side effects.
As far as I know, the closest standardized equivalent in C++ is STL min_element() (not recognized by OpenMP). It would have to be used separately by each thread, with the results of the threads combined by one of the strategies you mentioned. min_element() was not supported in the Microsoft 2003 compiler. It is fairly well supported in g++, even to the extent of optimization shortcutting a majority of the C++ idiomatic obstacles.
I didn't want to go so far as to suggest multi-language usage so as to take advantage of the Fortran MIN() reduction operator, although I may have mentioned that in my post which was removed. If someone were writing for gcc compatible compilers, I would think it worth considering.
You might be interested in the post in the Fortran forum about how Intel Windows compilers might in the future support the Microsoft OpenMP.
For a week now, access to software forums has been blocked from the AT&T networks, the only public internet access available to me, in both California and Ohio. I'm wondering why this has not been raised as an issue. I don't know if this has a connection with removal of posts which I made shortly before the blockage.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page