Hello,
Intel Inspector in Parallel Studio XE 2019 detects a data race when the atan2 function is used. Here is a sample code:
program test_arctan
   implicit none
   real*8 x(1000), y(1000), r(1000)
   integer i, n
   n = 1000
   x = 0.1d0
   y = 0.1d0
!$omp parallel do schedule(static,1)
   do i = 1, n
      r(i) = atan2(x(i), y(i))
   enddo
   do i = 1, n
      write(5000,*) r(i)
   enddo
end program test_arctan
The attached snapshot of the Inspector screen showing the read/write race is from a different, larger code, which the sample program here is meant to reproduce. I also tested explicitly declaring the arguments as thread-private, and that also got rid of the data-race error when using the atan2 function:
!$omp parallel do schedule(static,1) &
!$omp& firstprivate(x,y)
   do i = 1, n
      r(i) = atan2(x(i), y(i))
   enddo
Is it a false positive or do I have a problem using atan2 like that?
Thank you
Your sample code looks nothing like what is shown in the screenshot.
Yes, it is from the original code that I could not share, as I mentioned in the original message. The image was meant to illustrate the actual reported error. A similar one, for different variables, is generated for the sample program I shared.
But it's not at all similar. In the screenshot, the arguments to atan2 are scalars, whereas in your "sample" they are array elements indexed by the parallel loop.
I did, however, find an issue when I built the program as a release build with parallelization enabled. It appears to be inside SVML (the vector math library), where it initializes a "feature flag" based on the processor type (see the attached screenshot). This doesn't look right to me, and I suggest you report it to Intel for investigation.
I did some more thinking about the data race, and if it is doing what I think, it is harmless. The first time you call an optimized math routine, it checks the CPU type so that it can do "CPU dispatching" for best performance. Then it writes a code into a global memory location that it checks on future calls. In a multithreaded environment, it's always going to write the same code, so it doesn't matter if there are two threads trying to write it. The library could try to synchronize access, but that would be slow and unnecessary.
Thank you, Steve, for looking into it. Sorry if it was confusing.
In addition to the SVML issue Steve mentioned, the above code should not use static scheduling with a chunk size of 1. Doing so results in excessive cache-line evictions (false sharing) among the cores of your thread team. To correct this:
a) align arrays x, y, and r on cache-line boundaries (currently 64 bytes) and use a chunk size that is a multiple of the number of elements per cache line (64/sizeof(x(1))), or
b) use static scheduling without specifying a chunk size (and consider adding a simd clause too).
Additionally, x and y can be shared; the firstprivate clause currently causes an unnecessary copy operation.
Jim Dempsey
