We've been adding lots of OpenMP SIMD directives to our electronic structure code (http://elk.sourceforge.net/) and successfully sped it up.
But we've also encountered a few potential compiler bugs along the way. The first was reported here: https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/805677
I think there may be another. Here is the simplest code which still has the problem:
program test
use modbug
implicit none
integer i
complex(8) z1
complex(8), allocatable :: x(:),y(:)
complex(8) zf
external zf
n=10
allocate(r(n))
allocate(x(n),y(n))
r(:)=1
x(:)=1
y(:)=1
z1=zf(x,y)
print *,z1
end program

complex(8) function zf(x,y)
use modbug
implicit none
complex(8), intent(in) :: x(n)
complex(8), intent(in) :: y(n)
! local variables
integer i
zf=0.d0
!$OMP SIMD
do i=1,n
  zf=zf+r(i)*conjg(x(i))*y(i)
end do
return
end function
A module in a separate file is also needed:
module modbug
integer n
real(8), allocatable :: r(:)
end module
The code is compiled with
ifort -O3 -ip -axCORE-AVX2,AVX,SSE4.2 -qopenmp modbug.f90 test.f90
on our Intel Xeon E5-2680 cluster with Intel Fortran 18.0.0.
The correct output should be 10.0, but with the SIMD directive the code returns 5.0 instead.
If the module file is included in the same file as the code then the compiler reports:
test.f90(35): warning #15552: loop was not vectorized with "simd"
and the code works fine.
I would say that this loop is not parallelizable, as the complex variable zf to which things are added does have different values for parallel executions of the loop and in fact needs a serial execution of the loop. Note that all the examples one finds for the OMP SIMD pragma are loops of the form
!$OMP SIMD
do i = 1, n
  a(i) = a(i) * b(i) + c(i)
end do
where you have elemental operations, but not an incrementing operation on a variable quasi-global to the loop.
Try:
zf=0.d0
!$OMP SIMD REDUCTION(+:zf)
do i=1,n
  zf=zf+r(i)*conjg(x(i))*y(i)
end do
Jim Dempsey
jimdempseyatthecove wrote:
Try:
zf=0.d0
!$OMP SIMD REDUCTION(+:zf)
do i=1,n
  zf=zf+r(i)*conjg(x(i))*y(i)
end do
Jim Dempsey
This is what we did originally. Unfortunately it yields:
catastrophic error: **Internal compiler error: segmentation violation signal raised** Please report this error along with the circumstances in which it occurred in a Software Problem Report. Note: File and line given may not be explicit cause of this error.
in Intel Fortran version 17 but only for the more complicated version of the code in Elk. The simplified example does not result in the error but does not yield vectorized code.
After trying it without the REDUCTION clause we discovered the error in Intel Fortran version 18 as stated in the original post.
Juergen R. wrote:
I would say that this loop is not parallelizable, as the complex variable zf to which things are added does have different values for parallel executions of the loop and in fact needs a serial execution of the loop. Note that all the examples one finds for the OMP SIMD pragma are loops of the form
!$OMP SIMD
do i = 1, n
  a(i) = a(i) * b(i) + c(i)
end do
where you have elemental operations, but not an incrementing operation on a variable quasi-global to the loop.
It is permitted to update the same variable within a SIMD loop. As Jim mentioned, it's better to tell the compiler that it is a REDUCTION variable. Unfortunately, this resulted in a compile-time error for Intel Fortran version 17 for the more complicated version of the simple example above.
If all the variables are real in the above example (with or without REDUCTION), Intel Fortran 17/18 compiles without the warning that no vectorization is performed. However, this does not result in a measurable speed-up but we've added it to Elk nevertheless.
The original bug still stands: an OMP SIMD directive alone should not break code. At worst, it should simply result in no vectorization being performed.
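Since the all-real variant is reported above to compile without the vectorization warning, one possible workaround is to split the complex sum into two real reductions. The following is only a sketch under that assumption: the name zf_split and its argument list are hypothetical (not from Elk), the data is passed as arguments rather than through modbug so the example is self-contained, and it has not been checked against the affected Intel Fortran 17/18 releases.

```fortran
program test_split
implicit none
integer, parameter :: n=10
real(8) r(n)
complex(8) x(n),y(n)
complex(8) zf_split
external zf_split
r(:)=1
x(:)=1
y(:)=1
! same input as the original test case, so the sum should be 10
print *,zf_split(n,r,x,y)
end program

! hypothetical helper, not part of Elk or the original post
complex(8) function zf_split(n,r,x,y)
implicit none
integer, intent(in) :: n
real(8), intent(in) :: r(n)
complex(8), intent(in) :: x(n),y(n)
! local variables
integer i
real(8) sr,si
sr=0.d0
si=0.d0
! two real accumulators replace the single complex one
!$OMP SIMD REDUCTION(+:sr,si)
do i=1,n
! conjg(x)*y = (xr*yr+xi*yi) + i*(xr*yi-xi*yr)
  sr=sr+r(i)*(dble(x(i))*dble(y(i))+aimag(x(i))*aimag(y(i)))
  si=si+r(i)*(dble(x(i))*aimag(y(i))-aimag(x(i))*dble(y(i)))
end do
zf_split=cmplx(sr,si,8)
end function
```

Whether this vectorizes, and whether it gives any measurable speed-up, would need to be confirmed on the actual compiler and hardware; it does not change the fact that the original directive should not alter the result.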