Intel® Fortran Compiler

Intel Inspector reports data race in atan2 function

mattytee
Beginner
421 Views

Hello,

Intel Inspector in Parallel Studio XE 2019 reports a data race when the atan2 function is used. Here is a sample code:

 

program test_arctan
implicit none
real*8 x(1000),y(1000),r(1000)
integer i,n

n=1000
x=0.1d0
y=0.1d0

!$omp parallel do schedule(static,1)
do i=1,n
  r(i)=atan2(x(i),y(i))
enddo

do i=1,n
  write(5000,*)r(i)
enddo

end program test_arctan

 

The attached snapshot of the Inspector screen showing the read/write race is from a different, larger code, which the sample program here is meant to reproduce. I also tested explicitly declaring the arguments as private to each thread (firstprivate), and that got rid of the data race error when using the atan2 function:

 

!$omp parallel do schedule(static,1) &
!$omp& firstprivate(x,y)
do i=1,n
  r(i)=atan2(x(i),y(i))
enddo

 

Is it a false positive or do I have a problem using atan2 like that?

 

Thank you

6 Replies
Steve_Lionel
Black Belt Retired Employee
381 Views

Your sample code looks nothing like what is shown in the screenshot.

mattytee
Beginner
363 Views

Yes, it is from the original code that I could not share, as I mentioned in the original message. The image was meant to illustrate the actual reported error. A similar one, for different variables, is generated for the sample program I shared.

Steve_Lionel
Black Belt Retired Employee
351 Views

But it's not at all similar. In the screenshot, the arguments to atan2 are scalars, whereas in your "sample" they are array elements indexed by the parallel loop.
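
The distinction matters for race detection. The following is a minimal sketch, not the original poster's code (which was never shared), and the scalar names xs and ys are hypothetical. If scalar temporaries are assigned inside a parallel loop, they are shared by default and every thread writes to the same location, which is a genuine race unless they are declared private. Array elements indexed by the loop variable do not have this problem, because each iteration touches its own element.

```fortran
program scalar_vs_array
  implicit none
  integer, parameter :: n = 1000
  real(8) :: x(n), y(n), r(n)
  real(8) :: xs, ys      ! hypothetical scalar temporaries
  integer :: i

  x = 0.1d0
  y = 0.1d0

  ! Without private(xs, ys) the scalars would be shared by default
  ! and all threads would write to them concurrently (a real race).
  !$omp parallel do private(xs, ys)
  do i = 1, n
    xs = x(i)
    ys = y(i)
    r(i) = atan2(xs, ys)   ! r(i) is distinct per iteration: no race
  end do
  !$omp end parallel do

  print *, r(1), r(n)
end program scalar_vs_array
```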

I did, however, find an issue when I built the program as a release build with parallelization enabled. It appears to be inside the SVML (vector math library) when it is initializing the "feature flag" based on the processor type. (See screenshot attached). This doesn't look right to me and I suggest you report it to Intel for investigation.

Steve_Lionel
Black Belt Retired Employee
311 Views

I did some more thinking about the data race, and if it is doing what I think, it is harmless. The first time you call an optimized math routine, it checks the CPU type so that it can do "CPU dispatching" for best performance. Then it writes a code into a global memory location that it checks on future calls. In a multithreaded environment, it's always going to write the same code, so it doesn't matter if there are two threads trying to write it. The library could try to synchronize access, but that would be slow and unnecessary.
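
The pattern described above can be sketched roughly as follows. This is an illustration only, not the SVML implementation: the module, the feature_flag variable, the detect_cpu function and the value 3 are all made-up stand-ins. Every thread that sees the flag uninitialized computes the same value and stores it, so the final result does not depend on which thread writes last. A race detector would still flag the unsynchronized read and write, which is consistent with what Inspector reported.

```fortran
module dispatch_sketch
  implicit none
  integer :: feature_flag = 0   ! shared global; 0 means "not yet initialized"
contains
  integer function detect_cpu()
    ! Hypothetical stand-in for a CPU query; returns the same
    ! value no matter which thread calls it.
    detect_cpu = 3
  end function detect_cpu

  real(8) function my_atan2(a, b)
    real(8), intent(in) :: a, b
    ! Unsynchronized check-then-write: several threads may store
    ! to feature_flag at once, but they all store the same value.
    if (feature_flag == 0) feature_flag = detect_cpu()
    my_atan2 = atan2(a, b)
  end function my_atan2
end module dispatch_sketch

program dispatch_demo
  use dispatch_sketch
  implicit none
  integer, parameter :: n = 1000
  real(8) :: r(n)
  integer :: i

  !$omp parallel do
  do i = 1, n
    r(i) = my_atan2(0.1d0, 0.1d0)
  end do
  !$omp end parallel do

  print *, feature_flag, r(1)
end program dispatch_demo
```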

 

mattytee
Beginner
343 Views

Thank you, Steve, for looking into it. Sorry if it was confusing.

jimdempseyatthecove
Black Belt
318 Views

In addition to the SVML issue Steve mentioned, the above code should not use static scheduling with chunk size of 1. Doing so will result in excessive cache line evictions amongst cores of your thread team. To correct this:

a) align arrays x, y, and r on cache line boundaries (currently 64 bytes) and use a chunk size that is a multiple of the number of elements per cache line (64/sizeof(x(1)), which is 8 for real*8).

b) use static scheduling without specifying chunk size (and consider adding simd clause too)

 

Additionally, x and y can be left shared: declaring them firstprivate makes every thread copy both arrays, which is unnecessary because the loop only reads them.
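
As a rough sketch of option (b), assuming a compiler that supports the OpenMP 4.0 combined parallel do simd construct, the sample loop could look like this. The program name is made up for illustration. With plain static scheduling each thread gets one contiguous block of iterations, so neighbouring elements of r are written by the same thread instead of being interleaved across cores.

```fortran
program test_arctan_static
  implicit none
  integer, parameter :: n = 1000
  real(8) :: x(n), y(n), r(n)
  integer :: i

  x = 0.1d0
  y = 0.1d0

  ! schedule(static) with no chunk size: one contiguous block per thread.
  ! x and y are only read, so they stay shared (no firstprivate copy).
  !$omp parallel do simd schedule(static) shared(x, y, r)
  do i = 1, n
    r(i) = atan2(x(i), y(i))
  end do
  !$omp end parallel do simd

  ! Both values should be approximately 0.7854 (pi/4).
  print *, r(1), r(n)
end program test_arctan_static
```

For option (a), the same loop would instead keep an explicit chunk size that is a multiple of 8, for example schedule(static,8), together with 64-byte alignment of the arrays.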

 

Jim Dempsey
