Intel® Fortran Compiler

AVX512 optimizations causing incorrect IEEE floating point exceptions


I'm compiling some code for a cluster with Xeon Gold 6148 (Skylake) processors, and I've been trying to take advantage of the AVX512 instruction set with the -xCORE-AVX512 compiler flag. The problem is that this option leaves the ieee_divide_by_zero and ieee_invalid exception flags signaling after a section of code that should raise neither exception. With -xCORE-AVX2 (or an older instruction-set target), the same code produces the correct result with no exceptions.

Here is an example of code that raises a divide-by-zero exception with -xCORE-AVX512 but no exceptions with the other targets:

! x, y, w are all real(dp), dimension(1:n)
! For n=3, previous code results in...
x = [-0.7745967_dp, 0.0_dp, 0.7745967_dp]
y = [3.0_dp, -1.5_dp, 3.0_dp ]
do i = 1,n
  w(i) = 2.0_dp / ( (1.0_dp - x(i)*x(i)) * y(i)*y(i) )
end do

The following mathematically equivalent code raises no exceptions with -xCORE-AVX512:

! Using the same data as above...
do i = 1,n
  y2(i) = y(i)*y(i)
end do
do i = 1,n
  y2_inv(i) = 1.0_dp / y2(i)
end do
do i = 1,n
  w(i) = 2.0_dp * y2_inv(i) / (1.0_dp - x(i)*x(i))
end do

Fusing any of the three loops above brings the exceptions back with AVX512. I've attached a driver program that reproduces the problem on one of these Skylake Xeons. I get the same results with both Intel Fortran 2018 and 2018 SP1, using the following commands (-fpp is needed because the driver uses __LINE__ and __FILE__):

ifort -fpp -O3 -xCORE-AVX512 -o avx512.out main.f90
ifort -fpp -O3 -xCORE-AVX2 -o avx2.out main.f90
ifort -fpp -O3 -xSSE4.2 -o sse42.out main.f90
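The attached driver is not reproduced in this post. As a rough stand-in, here is a minimal sketch of a module that wraps the fused loop and the split loops in procedures that clear all IEEE flags, run the loop, and report whether ieee_divide_by_zero or ieee_invalid is signaling afterward. The module and procedure names (ieee_loop_check, fused_weights, split_weights) are hypothetical, and the sketch assumes dp is double precision (real64 from iso_fortran_env).

```fortran
! Hypothetical helper module (not the poster's attached driver).
! Assumption: dp is double precision (real64).
module ieee_loop_check
  use, intrinsic :: iso_fortran_env, only: dp => real64
  use, intrinsic :: ieee_exceptions
  implicit none
  private
  public :: dp, fused_weights, split_weights

contains

  ! Fused form: a single loop with one division per element.
  subroutine fused_weights(x, y, w, div_zero, invalid)
    real(dp), intent(in)  :: x(:), y(:)
    real(dp), intent(out) :: w(:)
    logical,  intent(out) :: div_zero, invalid
    integer :: i

    call ieee_set_flag(ieee_all, .false.)   ! start with every flag quiet
    do i = 1, size(x)
      w(i) = 2.0_dp / ((1.0_dp - x(i)*x(i)) * y(i)*y(i))
    end do
    call ieee_get_flag(ieee_divide_by_zero, div_zero)
    call ieee_get_flag(ieee_invalid, invalid)
  end subroutine fused_weights

  ! Split form: the three separate loops, using temporary arrays.
  subroutine split_weights(x, y, w, div_zero, invalid)
    real(dp), intent(in)  :: x(:), y(:)
    real(dp), intent(out) :: w(:)
    logical,  intent(out) :: div_zero, invalid
    real(dp) :: y2(size(y)), y2_inv(size(y))
    integer :: i

    call ieee_set_flag(ieee_all, .false.)   ! start with every flag quiet
    do i = 1, size(y)
      y2(i) = y(i)*y(i)
    end do
    do i = 1, size(y)
      y2_inv(i) = 1.0_dp / y2(i)
    end do
    do i = 1, size(x)
      w(i) = 2.0_dp * y2_inv(i) / (1.0_dp - x(i)*x(i))
    end do
    call ieee_get_flag(ieee_divide_by_zero, div_zero)
    call ieee_get_flag(ieee_invalid, invalid)
  end subroutine split_weights

end module ieee_loop_check
```

A small main program would use ieee_loop_check, fill x and y with the three values shown earlier, call each routine, and print the two logicals. Building that program with each of the ifort command lines above should show whether only the fused routine compiled with -xCORE-AVX512 reports a flag, as described here.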
