Intel® Fortran Compiler

Intel Inspector reports data race in atan2 function

mattytee
Beginner

Hello,

Intel Inspector in Parallel Studio XE 2019 detects a data race when the atan2 function is used. Here is a sample program:

 

program test_arctan

implicit none

real*8 x(1000),y(1000),r(1000)

integer i,n

n=1000
x=0.1d0
y=0.1d0

!$omp parallel do schedule(static,1)
do i=1,n
  r(i)=atan2(x(i),y(i))
enddo

do i=1,n
  write(5000,*)r(i)
enddo

end program test_arctan

 

The attached snapshot of the Inspector screen showing the read/write race is from a different, larger code, which the sample program here is meant to reproduce. I also tried explicitly declaring the arguments thread-private, and that got rid of the data-race error when using the atan2 function:

 

!$omp parallel do schedule(static,1) &
!$omp& firstprivate(x,y)
do i=1,n
  r(i)=atan2(x(i),y(i))
enddo

 

Is it a false positive, or is there a problem with using atan2 like that?

 

Thank you

Steve_Lionel
Honored Contributor III

Your sample code looks nothing like what is shown in the screenshot.

mattytee
Beginner

Yes, it is from the original code that I could not share, as I mentioned in the original message. The image was meant to illustrate the actual reported error. A similar one, for different variables, is generated for the sample program I shared.

Steve_Lionel
Honored Contributor III

But it's not at all similar. In the screenshot, the arguments to atan2 are scalars, whereas in your "sample" they are array elements indexed by the parallel loop.

I did, however, find an issue when I built the program as a release build with parallelization enabled. It appears to be inside the SVML (vector math library) code that initializes the "feature flag" based on the processor type (see the attached screenshot). This doesn't look right to me, and I suggest you report it to Intel for investigation.

Steve_Lionel
Honored Contributor III

I did some more thinking about the data race, and if it is doing what I think, it is harmless. The first time you call an optimized math routine, it checks the CPU type so that it can do "CPU dispatching" for best performance. Then it writes a code into a global memory location that it checks on future calls. In a multithreaded environment, it's always going to write the same code, so it doesn't matter if there are two threads trying to write it. The library could try to synchronize access, but that would be slow and unnecessary.
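As a rough illustration of that pattern (this is not the actual SVML source; the module, names, and values below are invented for the example), the flagged access amounts to a lazy, idempotent initialization like this:

module dispatch_demo
  implicit none
  integer, save :: feature_code = -1   ! -1 means "not detected yet"
contains
  integer function query_cpu()
    ! stand-in for a CPUID-style query; on a given machine it always
    ! returns the same value, which is why the race is harmless
    query_cpu = 3
  end function query_cpu

  subroutine ensure_dispatch()
    ! two threads may both see -1 and both perform the write below,
    ! but they write the same value, so the outcome is identical;
    ! Inspector still reports the unsynchronized access as a race
    if (feature_code < 0) feature_code = query_cpu()
  end subroutine ensure_dispatch
end module dispatch_demo

Every call after the first only reads feature_code, so putting a lock or atomic around that write would cost time without changing the result.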

 

mattytee
Beginner

Thank you, Steve, for looking into it. Sorry if it was confusing.

jimdempseyatthecove
Honored Contributor III

In addition to the SVML issue Steve mentioned, the above code should not use static scheduling with a chunk size of 1. Doing so will cause excessive cache-line evictions (false sharing) among the cores of your thread team. To correct this:

a) align arrays x, y, and r on cache-line boundaries (currently 64 bytes) and use a chunk size that is a multiple of the number of elements per cache line (64/sizeof(x(1)) = 8 for real*8), or

b) use static scheduling without specifying a chunk size (and consider adding the simd clause too).

 

Additionally, x and y can simply be shared; the firstprivate copies add an unnecessary copy operation.
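A minimal sketch of option (b), reusing the declarations from the original sample (compile with the OpenMP option, e.g. -qopenmp; the simd clause assumes OpenMP 4.0 or later):

program test_arctan_fixed
implicit none
integer, parameter :: n = 1000
real*8 x(n), y(n), r(n)
integer i

x = 0.1d0
y = 0.1d0

! the default static schedule gives each thread one contiguous block
! of iterations, so false sharing in r is limited to block boundaries;
! x and y stay shared because each iteration reads only element i
!$omp parallel do simd schedule(static)
do i = 1, n
  r(i) = atan2(x(i), y(i))
enddo
!$omp end parallel do simd

write(*,*) r(1), r(n)
end program test_arctan_fixed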

 

Jim Dempsey
