AVX512 optimizations causing incorrect IEEE floating point exceptions

scspiegel · 12-14-2017

I'm compiling some code for a cluster containing Xeon Gold 6148 (Skylake) processors, and I've been trying to take advantage of the AVX512 instruction set by including the -xCORE-AVX512 compiler flag. The problem is that with this flag the ieee_divide_by_zero and ieee_invalid IEEE exception flags are signaling after a section of code that should not raise either exception, whereas -xCORE-AVX2 (or lower optimization levels) produces the correct result with no exceptions.

Here is an example of code that raises the divide-by-zero flag with -xCORE-AVX512, but raises no exceptions at other optimization levels:

! x, y, w are all real(dp), dimension(1:n)
! For n=3, previous code results in...
x = [-0.7745967_dp, 0.0_dp, 0.7745967_dp]
y = [3.0_dp, -1.5_dp, 3.0_dp ]
do i = 1,n
  w(i) = 2.0_dp / ( (1.0_dp - x(i)*x(i)) * y(i)*y(i) )
end do

The following code creates no exceptions with -xCORE-AVX512:

! Using the same data as above...
do i = 1,n
  y2(i) = y(i)*y(i)
end do
do i = 1,n
  y2_inv(i) = 1.0_dp / y2(i)
end do
do i = 1,n
  w(i) = 2.0_dp * y2_inv(i) / (1.0_dp - x(i)*x(i))
end do

Combining any of the three loops above brings the exceptions back with AVX512. I've attached a driver program that recreates this problem when I run it on one of these Xeon Skylake processors. I'm getting these results with both Intel Fortran 2018 and 2018 SP1. The compilation commands I'm using are below (-fpp is needed because I'm using __LINE__ and __FILE__):

ifort -fpp -O3 -xCORE-AVX512 -o avx512.out main.f90
ifort -fpp -O3 -xCORE-AVX2 -o avx2.out main.f90
ifort -fpp -O3 -xSSE4.2 -o sse42.out main.f90
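
For anyone reading without the attachment, here is a minimal sketch of how the flags can be checked around the single-loop version. This is not the attached driver: the program name and variable names other than x, y, w, n and dp are made up for illustration, and it relies only on the standard ieee_exceptions intrinsic module. It uses no preprocessor macros, so -fpp is not required for it. Because the data is hard-coded here, the compiler may evaluate the loop at compile time, so this sketch is not guaranteed to reproduce the problem; reading n and the arrays at run time may be necessary.

```fortran
! Hypothetical stand-in for the attached driver (not the original main.f90).
program avx512_flag_check
  use, intrinsic :: ieee_exceptions
  implicit none
  integer, parameter :: dp = selected_real_kind(15, 307)
  integer, parameter :: n = 3
  real(dp) :: x(n), y(n), w(n)
  logical :: div_zero, invalid
  integer :: i

  ! Same data as in the post.
  x = [-0.7745967_dp, 0.0_dp, 0.7745967_dp]
  y = [3.0_dp, -1.5_dp, 3.0_dp]

  ! Clear every IEEE flag so that only the loop below can set one.
  call ieee_set_flag(ieee_all, .false.)

  ! Single-loop form that misbehaves with -xCORE-AVX512.
  do i = 1, n
    w(i) = 2.0_dp / ((1.0_dp - x(i)*x(i)) * y(i)*y(i))
  end do

  ! Read back the two flags of interest.
  call ieee_get_flag(ieee_divide_by_zero, div_zero)
  call ieee_get_flag(ieee_invalid, invalid)

  print '(a, 3f12.8)', 'w = ', w
  print '(a, l1)', 'ieee_divide_by_zero signaling: ', div_zero
  print '(a, l1)', 'ieee_invalid signaling:        ', invalid
end program avx512_flag_check
```

For this data the divisors are roughly 3.6, 2.25 and 3.6, so w should come out near 0.5556, 0.8889, 0.5556, and neither flag should be set if the arithmetic touches only the three real elements.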