I'm compiling some code for a cluster containing Xeon Gold 6148 (Skylake) processors, and I've been trying to take advantage of the AVX512 instruction set by including the -xCORE-AVX512 compiler flag. The problem is that with this flag the ieee_divide_by_zero and ieee_invalid IEEE exception flags are signaling after a section of code that should not raise either exception, whereas -xCORE-AVX2 (or lower optimization levels) produces the correct result with no exceptions.
Here is an example of code that creates a divide-by-zero exception with -xCORE-AVX512, but no exceptions for other optimization levels:
    ! x, y, w are all real(dp), dimension(1:n)
    ! For n=3, previous code results in...
    x = [-0.7745967_dp, 0.0_dp, 0.7745967_dp]
    y = [3.0_dp, -1.5_dp, 3.0_dp]
    do i = 1,n
      w(i) = 2.0_dp / ( (1.0_dp - x(i)*x(i)) * y(i)*y(i) )
    end do
The following code creates no exceptions with -xCORE-AVX512:
    ! Using the same data as above...
    do i = 1,n
      y2(i) = y(i)*y(i)
    end do
    do i = 1,n
      y2_inv(i) = 1.0_dp / y2(i)
    end do
    do i = 1,n
      w(i) = 2.0_dp * y2_inv(i) / (1.0_dp - x(i)*x(i))
    end do
Trying to combine any of the above three loops results in exceptions with AVX512. I've attached a driver program that recreates this problem when I run it on one of these Xeon Skylake processors. I'm getting these results with both Intel Fortran 2018 and 2018 SP1, and the compilation commands I'm using are (-fpp is needed because I'm using __LINE__ and __FILE__):
ifort -fpp -O3 -xCORE-AVX512 -o avx512.out main.f90
ifort -fpp -O3 -xCORE-AVX2 -o avx2.out main.f90
ifort -fpp -O3 -xSSE4.2 -o sse42.out main.f90
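For anyone without the attachment, here is a minimal sketch of how the two flags can be inspected around the first loop. It is not the attached driver. It assumes the standard intrinsic module ieee_exceptions, and the program name, the definition of dp as kind(1.0d0), and the variable names div0 and invalid are my own choices for illustration. Because x and y are literal constants here, the compiler may evaluate the loop at compile time, so this sketch may not reproduce the problem as written; in the real code the data comes from earlier computation.

```fortran
program check_flags
  ! Sketch only: clears the IEEE flags, runs the loop, then reports the flags.
  use, intrinsic :: ieee_exceptions, only: ieee_get_flag, ieee_set_flag, &
                                            ieee_divide_by_zero, ieee_invalid
  implicit none
  integer, parameter :: dp = kind(1.0d0)   ! assumed definition of dp
  integer, parameter :: n = 3
  real(dp) :: x(n), y(n), w(n)
  logical :: div0, invalid                 ! hypothetical names for the flag states
  integer :: i

  x = [-0.7745967_dp, 0.0_dp, 0.7745967_dp]
  y = [3.0_dp, -1.5_dp, 3.0_dp]

  ! Clear both flags so that only the loop below can set them
  call ieee_set_flag(ieee_divide_by_zero, .false.)
  call ieee_set_flag(ieee_invalid, .false.)

  do i = 1, n
    w(i) = 2.0_dp / ((1.0_dp - x(i)*x(i)) * y(i)*y(i))
  end do

  ! Query the flags after the loop
  call ieee_get_flag(ieee_divide_by_zero, div0)
  call ieee_get_flag(ieee_invalid, invalid)

  print *, 'w = ', w
  print *, 'ieee_divide_by_zero signaling: ', div0
  print *, 'ieee_invalid signaling:        ', invalid
end program check_flags
```

Building this with each of the three command lines above and comparing the printed flag states should show whether the behavior depends on the -x target.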