I'd like to use the following code:
[fortran]
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j, exception_found)
!$OMP DO
do j=1,m
   ! There is a risk that rounding errors may lead to an occasional negative
   ! value for the expression inside the square root; we look for the IEEE
   ! invalid flag to intercept and correct this case on the relatively rare
   ! occasions when it occurs:
   call ieee_set_flag(ieee_invalid, .false.)
   do i=1,n
      cos_phi(i,j,k) = sqrt(1d0 - sin_phi(i,j,k)**2)
   end do
   call ieee_get_flag(ieee_invalid, exception_found)
   if (exception_found) then
      do i=1,n
         if (sin_phi(i,j,k).gt.1d0) cos_phi(i,j,k) = 0d0
      end do
   end if
end do
!$OMP END DO
!$OMP END PARALLEL
[/fortran]
However, I'm unsure of exactly how the IEEE exception flags are implemented: is there one set of flags per thread, or just one for the whole processor? If the latter, then an exception occurring in one thread will trigger unnecessary actions in all the other threads, which could quickly become very inefficient.
Does anyone know?
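One way to probe this empirically (a sketch only, assuming a compiler that supports the Fortran 2003 IEEE intrinsic modules together with OpenMP; `probe_flags` is an invented name): raise IEEE_INVALID on a single thread, then let every thread report its own view of the flag.

[fortran]
program probe_flags
   use ieee_exceptions
   use omp_lib
   implicit none
   logical :: seen
!$OMP PARALLEL PRIVATE(seen)
   ! Every thread clears its view of the flag, then only thread 0 raises it
   call ieee_set_flag(ieee_invalid, .false.)
   if (omp_get_thread_num() == 0) call ieee_set_flag(ieee_invalid, .true.)
!$OMP BARRIER
   ! If the flags are per-thread, only thread 0 should report .true. here
   call ieee_get_flag(ieee_invalid, seen)
   print '(a,i0,a,l1)', 'thread ', omp_get_thread_num(), ': invalid = ', seen
!$OMP END PARALLEL
end program probe_flags
[/fortran]

On common hardware the exception flags live in per-core floating-point status registers (e.g. MXCSR on x86) that the OS saves and restores per thread, so one would expect per-thread behaviour, but the probe above is the only way to be sure for a given compiler and runtime.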
1 Solution
9 Replies
Wouldn't it be better for your program to ensure that sin_phi(i,j,k) is .le. 1.0d0?
IOW, catch the error when producing sin_phi(i,j,k), as opposed to at each place it is used?
Jim Dempsey
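A minimal sketch of that approach (assuming sin_phi is filled in a nested loop like the one in the original post; the clamping line is the only addition):

[fortran]
! Clamp sin_phi once, where it is generated, so that every later use
! (including the cos_phi loop) can assume |sin_phi| <= 1
do j = 1, m
   do i = 1, n
      sin_phi(i,j,k) = max(-1d0, min(1d0, sin_phi(i,j,k)))
   end do
end do
[/fortran]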
The thing is, in my application, rounding errors where sin_phi(i,j,k).gt.1d0 are very rare - maybe one in 10,000 or less - and are of no consequence in any context other than the calculation of cos_phi. Checking every sin_phi value at source would add a fair bit of overhead, whilst by looking for IEEE exceptions I can (in theory) reduce the overhead by only handling the cases where there is actually a problem.
Your exception handler (code you provide and hooked using IEEE_HANDLER) can save the exception codes into Thread Local Storage (as opposed to, or in addition to, in global memory). If you do this then you can have a thread-by-thread detection of floating point errors.
Jim Dempsey
Consider:
[fortran]
do
   ! There is a risk that rounding errors may lead to an occasional negative
   ! value for the expression inside the square root; we look for the IEEE
   ! invalid flag to intercept and correct this case on the relatively rare
   ! (less than 1 in 10,000) occasions when it occurs:
   call ieee_set_flag(ieee_invalid, .false.)
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
!$OMP DO
   do j=1,m
      do i=1,n
         cos_phi(i,j,k) = sqrt(1d0 - sin_phi(i,j,k)**2)
      end do
   end do
!$OMP END DO
!$OMP END PARALLEL
   call ieee_get_flag(ieee_invalid, exception_found)
   if (.not. exception_found) exit
!$OMP PARALLEL DEFAULT(SHARED) PRIVATE(i,j)
!$OMP DO
   do j=1,m
      do i=1,n
         if (sin_phi(i,j,k).gt.1d0) sin_phi(i,j,k) = 1.0d0
      end do
   end do
!$OMP END DO
!$OMP END PARALLEL
end do
[/fortran]
Jim Dempsey
Thank you Jim.
With respect to your mail at 25 minutes past the hour, isn't this what is accomplished by declaring 'exception_found' as a private variable in my original example?
The code in your mail at 37 minutes past the hour wouldn't work so well; when I said '1 time in 10,000' I meant one matrix element in 10,000, rather than one run through the code, so in a typical case where n x m >> 10,000 I'd just be running the second part of the code every single time. What I'd like to do is only to run through the i-loop once when a problem is found.
>>I meant one matrix element in 10,000, rather than one run through the code
That wasn't clear in the comments (at least when I first read it).
>>so in a typical case where n x m >> 10,000
Then the probability of error is ~1.0.
When an error occurs, one of the threads will run through the corrector loop, which corrects only cos_phi; it should also correct sin_phi. The thread(s) that run the repair loop will also delay the completion of the parallel region.
Therefore the likely candidate for best performance is to ensure that sin_phi(i,j,k) is .le. 1.0d0 and .ge. -1.0d0.
I cannot guess whether using a separate loop conditioning sin_phi would be faster than incorporating MAX into the cos_phi calculation. However, if you do this conditioning when you generate sin_phi(i,j,k), you won't need it in the cos_phi calculation loop (but you may also need to condition cos_phi to .le. 1.0d0 and .ge. -1.0d0).
GI-GO (Garbage In, Garbage Out)
When you create these tables for use by your application, it is your responsibility to ensure that the values are correct enough not to propagate unsuspected errors later in your code.
Jim Dempsey
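The MAX variant mentioned above might look like this (a sketch only; it guards the argument of SQRT in place rather than conditioning sin_phi itself, and needs no exception handling at all):

[fortran]
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,j)
do j = 1, m
   do i = 1, n
      ! MAX keeps the argument of SQRT non-negative even when a rounding
      ! error makes sin_phi(i,j,k)**2 slightly exceed 1d0
      cos_phi(i,j,k) = sqrt(max(1d0 - sin_phi(i,j,k)**2, 0d0))
   end do
end do
!$OMP END PARALLEL DO
[/fortran]

The branch-free MAX typically costs far less than a conditional test per element, since it maps to a single min/max instruction on most hardware.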
Thanks again Jim.
I'm probably just trying to be too clever here, by trying to use IEEE exception handling when it's not necessarily the appropriate tool for the job.
For future reference, I'd still be curious to know how the exception handling is implemented when used in conjunction with OpenMP. I haven't been able to find any documentation on this, anywhere.
I covered this earlier.
You use the IEEE_HANDLER function to establish (set) your external signal-handling routine. I believe the handler address is global (but this can easily be tested to verify the assumption). Therefore, any thread causing an exception will cause this subroutine to be called in the context of the thread causing the exception. You can put any code you want into this subroutine, including writing the error conditions into Thread Local Storage. (You also need to write any updates to global variables in a thread-safe manner. **) Your error-detection routine, as used by each thread, can then observe the error conditions as set in Thread Local Storage (as opposed to global error flags).
Your handler can store the error flags in both thread-local and application-global locations. In loops where you want all threads to bail out, you monitor the global flags; in loops such as your cos_phi loop, you might want to monitor only the thread-local flags - your choice.
** Note: the IEEE_HANDLER example shows CALL ABORT. You need not abort. When the error is not of an abort type (e.g. not IEEE_DIVIDE_BY_ZERO), you can set your indicator flag(s) - using an atomic IOR for global flags, or a normal IOR for Thread Local Storage - and then RETURN.
Jim Dempsey
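A rough sketch of the thread-local bookkeeping described above (hypothetical: the IEEE_HANDLER interface and the handler's argument signature are vendor-specific, and the names fp_flags, my_handler, and my_thread_invalid are invented for illustration):

[fortran]
module fp_flags
   implicit none
   ! One private copy of this flag per OpenMP thread
   logical :: my_thread_invalid = .false.
!$OMP THREADPRIVATE(my_thread_invalid)
end module fp_flags

subroutine my_handler(sig, code)
   use fp_flags
   implicit none
   integer :: sig, code
   ! The handler runs in the context of the faulting thread, so this
   ! store is automatically thread-local; no atomics are needed
   my_thread_invalid = .true.
   return   ! recoverable condition: set the flag and continue, do not abort
end subroutine my_handler
[/fortran]

Each thread would then test and clear my_thread_invalid in place of the ieee_get_flag/ieee_set_flag calls in the original loop.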
Aha, I see, I'd missed the significance of defining the IEEE_handler routine. Thank you.
