Intel® Fortran Compiler

A question about $omp declare simd

eos_pengwern
Beginner
424 Views

I'm updating some of my long-standing code to take advantage of the latest OpenMP 4.0 syntax, but I've come unstuck at the first attempt. 

I decided to take a very simple, short subroutine that calculates a vector product. One of the vectors, 'normal', is unique and is represented by its scalar components 'normal_x', 'normal_y' and 'normal_z', whilst the other vector, 'rzero', is typically one of a large number of vectors represented by arrays of 'rzero_x', 'rzero_y' and 'rzero_z' components. Taking the vector product of the normal and 'rzero' gives 'es', and the magnitude of 'es' gives us the sine of the angle between 'rzero' and the normal, viz:

    elemental subroutine Calculate_Sines(normal_x, normal_y, normal_z, &
                                         rzero_x, rzero_y, rzero_z,    &
                                         es_x, es_y, es_z, sin_theta)
    
    !$omp declare simd(Calculate_Sines) uniform(normal_x, normal_y, normal_z)
        real(kind(1d0)), intent(in) :: normal_x, normal_y, normal_z
        real(kind(1d0)), intent(in) :: rzero_x, rzero_y, rzero_z 
        real(kind(1d0)), intent(out) :: es_x, es_y, es_z, sin_theta
          
        es_x = normal_z * rzero_y - normal_y * rzero_z
        es_y = normal_x * rzero_z - normal_z * rzero_x
        es_z = normal_y * rzero_x - normal_x * rzero_y        
        
        sin_theta = sqrt( es_x**2 + es_y**2 + es_z**2 ) 
    
    end subroutine Calculate_Sines 

If I unit-test this by putting it into a program all on its own, then it works just as I'd expect: if I compile with /Qopt-report:2, then I am told:

    15301: FUNCTION WAS VECTORIZED

...just as I'd hope. However, if I now take exactly the same code snippet and paste it into the module where I actually want to use it (but as yet without it being invoked from anywhere else in the application), and compile with exactly the same relevant switches (/O2 /Qparallel /Qopenm /standard-semantics /Qopt-report:2), then I'm told:

    warning #13397: vector function was not vectorized
    warning #13401: vector function was emulated

The compiler persistently refuses to give me any clues as to what may be preventing it from vectorising the function, even if I bump /Qopt-report all the way up to 5.

So, what is going on here?

3 Replies
pbkenned1
Employee

What version of the compiler are you using? From version 15.0 of the Intel Fortran Compiler onwards, it is sufficient to place a SIMD-enabled procedure within a module. Any calling routine that USEs the module then has access to the interface and can see that a SIMD-enabled version of the procedure is available. If you are using 14.0, you'll need to upgrade for this to work.
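As a minimal sketch of that arrangement (not the actual test files used below), the module and caller might look like the following. The names `sines_sketch`, `use_sines_sketch` and `dp` are hypothetical, and the `elemental` prefix is deliberately dropped here so that the sketch does not depend on how a particular compiler version treats OpenMP directives inside elemental procedures; that choice is an assumption of this sketch, not something taken from the original code.

```fortran
! Hypothetical module holding a SIMD-enabled procedure (names are illustrative).
module sines_sketch
    implicit none
    integer, parameter :: dp = kind(1d0)
contains
    ! Plain (non-elemental) subroutine: an assumption made for this sketch.
    subroutine Calculate_Sines(normal_x, normal_y, normal_z, &
                               rzero_x, rzero_y, rzero_z,    &
                               es_x, es_y, es_z, sin_theta)
    !$omp declare simd(Calculate_Sines) uniform(normal_x, normal_y, normal_z)
        real(dp), intent(in)  :: normal_x, normal_y, normal_z
        real(dp), intent(in)  :: rzero_x, rzero_y, rzero_z
        real(dp), intent(out) :: es_x, es_y, es_z, sin_theta

        es_x = normal_z * rzero_y - normal_y * rzero_z
        es_y = normal_x * rzero_z - normal_z * rzero_x
        es_z = normal_y * rzero_x - normal_x * rzero_y
        sin_theta = sqrt(es_x**2 + es_y**2 + es_z**2)
    end subroutine Calculate_Sines
end module sines_sketch

! Hypothetical caller: USE makes the interface, and hence the declare simd
! information, visible at the call site.
program use_sines_sketch
    use sines_sketch
    implicit none
    integer, parameter :: n = 8
    real(dp) :: rx(n), ry(n), rz(n), ex(n), ey(n), ez(n), s(n)
    integer :: i

    ! Every rzero points along x; the normal points along z.
    rx = 1.0_dp
    ry = 0.0_dp
    rz = 0.0_dp

    !$omp simd
    do i = 1, n
        call Calculate_Sines(0.0_dp, 0.0_dp, 1.0_dp, rx(i), ry(i), rz(i), &
                             ex(i), ey(i), ez(i), s(i))
    end do

    ! Perpendicular unit vectors, so every sine should be 1.
    if (any(abs(s - 1.0_dp) > 1.0e-12_dp)) error stop 'unexpected sin_theta'
    print *, 'sin_theta=', s(1)
end program use_sines_sketch
```

Whether the loop actually calls the vector variant still has to be confirmed from the optimization report; the sketch only shows the source-level arrangement.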

>>>However, if I now take exactly the same code snippet and paste it into the module where I actually want to use it (but as yet without having it actually invoked from anywhere else in the application), then compile with exactly the same relevant switches (/O2 /Qparallel /Qopenm

I did just that with 15.0.3.  I'll assume /Qopenm is just a typo.

C:\ISN_Forums\U558125>ifort -c -Qopenmp U558125-module.f90 -Qopt-report=5 -Qopt-report-file=stdout -Qopt-report-phase=openmp,vec
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.3.208 Build 20150407
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.


Begin optimization report for: SINES::CALCULATE_SINES.._simdsimd3__xmm2nuuuvvvvvvv.J

    Report from: Vector optimizations [vec]

remark #15301: FUNCTION WAS VECTORIZED   [ C:\ISN_Forums\U558125\U558125-module.f90(5,22) ]
===========================================================================

Begin optimization report for: SINES::CALCULATE_SINES.._simdsimd3__xmm2muuuvvvvvvv.J

    Report from: Vector optimizations [vec]

remark #15301: FUNCTION WAS VECTORIZED   [ C:\ISN_Forums\U558125\U558125-module.f90(5,22) ]
===========================================================================

C:\ISN_Forums\U558125>ifort  -Qopenmp U558125-mod-use.f90  U558125-module.obj
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.3.208 Build 20150407
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 12.00.21005.1
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:U558125-mod-use.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
U558125-mod-use.obj
U558125-module.obj

C:\ISN_Forums\U558125>U558125-mod-use.exe
 sin_theta=  0.000000000000000E+000

C:\ISN_Forums\U558125>

If the compiler version is not the issue, perhaps it's due to something in your module source inhibiting vectorization.

Patrick

eos_pengwern
Beginner

Thank you. I hadn't realised that there was so much additional information in the .optrpt file. The contents of that file shed a little more light on things. In my module, the subroutine (including comments) occupies lines 1769 - 1786. Here is the relevant part of the .optrpt file:

Begin optimization report for: WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4nuuuvvvvvvv.J

    Report from: Interprocedural optimizations [ipo]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):INLINE REPORT START:(WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4nuuuvvvvvvv.J) [20/31=64.5%]
C:\Users\..\FortranSource\Workspaces.F90(1769,26):(1)-> EXTERN: ___for_ieee_store_env_
C:\Users\..\FortranSource\Workspaces.F90(1786,5):(1)-> EXTERN: ___for_ieee_restore_env_
INLINE REPORT END

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):remark #15301: FUNCTION WAS VECTORIZED
<compiler generated>:remark #15399: vectorization support: unroll factor set to 2
===========================================================================

Begin optimization report for: WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4nuuuvvvvvvv

    Report from: Interprocedural optimizations [ipo]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):INLINE REPORT START:(WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4nuuuvvvvvvv) [20/31=64.5%]
C:\Users\..\FortranSource\Workspaces.F90(1769,26):(1)-> EXTERN: ___for_ieee_store_env_
C:\Users\..\FortranSource\Workspaces.F90(1786,5):(1)-> EXTERN: ___for_ieee_restore_env_
INLINE REPORT END
===========================================================================

Begin optimization report for: WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4muuuvvvvvvv.J

    Report from: Interprocedural optimizations [ipo]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):INLINE REPORT START:(WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4muuuvvvvvvv.J) [20/31=64.5%]
C:\Users\..\FortranSource\Workspaces.F90(1769,26):(1)-> EXTERN: ___for_ieee_store_env_
C:\Users\..\FortranSource\Workspaces.F90(1786,5):(1)-> EXTERN: ___for_ieee_restore_env_
INLINE REPORT END

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):remark #15326: function was not vectorized: implied FP exception model prevents vectorization. Consider changing compiler flags and/or directives in the source to enable fast FP model and to mask FP exceptions
C:\Users\..\FortranSource\Workspaces.F90(1769,26):remark #13397: vector function was not vectorized
===========================================================================

Begin optimization report for: WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4muuuvvvvvvv

    Report from: Interprocedural optimizations [ipo]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):INLINE REPORT START:(WORKSPACES::CALCULATE_SINES.._simdsimd3__xmm4muuuvvvvvvv) [20/31=64.5%]
C:\Users\..\FortranSource\Workspaces.F90(1769,26):(1)-> EXTERN: ___for_ieee_store_env_
C:\Users\..\FortranSource\Workspaces.F90(1786,5):(1)-> EXTERN: ___for_ieee_restore_env_
INLINE REPORT END
===========================================================================

Begin optimization report for: WORKSPACES::CALCULATE_SINES

    Report from: Interprocedural optimizations [ipo]

C:\Users\..\FortranSource\Workspaces.F90(1769,26):INLINE REPORT START:(WORKSPACES::CALCULATE_SINES) [20/31=64.5%]
C:\Users\..\FortranSource\Workspaces.F90(1769,26):(1)-> EXTERN: ___for_ieee_store_env_
C:\Users\..\FortranSource\Workspaces.F90(1786,5):(1)-> EXTERN: ___for_ieee_restore_env_
INLINE REPORT END
===========================================================================

I assume that the function is being compiled in different ways to accommodate the various ways it may be called (scalar, do-loop, index-range etc.) and that the compiler succeeds in vectorising it in some cases but not in others. In the case that fails, it cites the floating-point exception model as the stumbling block. I was compiling with /fp:source and /fp:except, so I tried turning off exceptions (/fp:except-) and, sure enough, the vectorisation succeeded.
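One possible reading of the report, offered as an assumption rather than something the report states: the variant names differ only in an `n` versus an `m` after `xmm4`, which looks like an unmasked versus a masked version of the vector function, and only the `m` variant was rejected. If that reading is right, the masked variant is the one a caller would need when the call sits under a condition inside a SIMD loop. The sketch below shows such a call site; the names `masked_call_sketch`, `masked_call_demo` and `wanted` are hypothetical, and the `elemental` prefix is omitted as a choice made for this sketch only.

```fortran
! Hypothetical module; the procedure body follows the original post.
module masked_call_sketch
    implicit none
    integer, parameter :: dp = kind(1d0)
contains
    subroutine Calculate_Sines(normal_x, normal_y, normal_z, &
                               rzero_x, rzero_y, rzero_z,    &
                               es_x, es_y, es_z, sin_theta)
    !$omp declare simd(Calculate_Sines) uniform(normal_x, normal_y, normal_z)
        real(dp), intent(in)  :: normal_x, normal_y, normal_z
        real(dp), intent(in)  :: rzero_x, rzero_y, rzero_z
        real(dp), intent(out) :: es_x, es_y, es_z, sin_theta

        es_x = normal_z * rzero_y - normal_y * rzero_z
        es_y = normal_x * rzero_z - normal_z * rzero_x
        es_z = normal_y * rzero_x - normal_x * rzero_y
        sin_theta = sqrt(es_x**2 + es_y**2 + es_z**2)
    end subroutine Calculate_Sines
end module masked_call_sketch

program masked_call_demo
    use masked_call_sketch
    implicit none
    integer, parameter :: n = 8
    real(dp) :: rx(n), ry(n), rz(n), ex(n), ey(n), ez(n), s(n)
    logical  :: wanted(n)
    integer  :: i

    ! Every rzero points along y; the normal points along z.
    rx = 0.0_dp
    ry = 1.0_dp
    rz = 0.0_dp
    ex = 0.0_dp
    ey = 0.0_dp
    ez = 0.0_dp
    s  = -1.0_dp                          ! sentinel for skipped elements
    wanted = [(mod(i, 2) == 0, i = 1, n)] ! only even elements are computed

    ! Conditional call inside a SIMD loop: the kind of call site for which
    ! a masked vector variant would be needed (assumption, see lead-in).
    !$omp simd
    do i = 1, n
        if (wanted(i)) then
            call Calculate_Sines(0.0_dp, 0.0_dp, 1.0_dp, rx(i), ry(i), rz(i), &
                                 ex(i), ey(i), ez(i), s(i))
        end if
    end do

    ! Computed elements should be 1; skipped elements keep the sentinel.
    if (any(wanted .and. abs(s - 1.0_dp) > 1.0e-12_dp)) error stop 'bad computed value'
    if (any(.not. wanted .and. s /= -1.0_dp)) error stop 'skipped element was modified'
    print *, 'masked call sketch OK'
end program masked_call_demo
```

The program checks only that the results are correct; which variant the compiler actually picks for this loop would have to be read from the optimization report.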

 

Kevin_D_Intel
Employee

Refer to this new post for some important updated discussion on the treatment of the elemental subroutine in the original post for this thread by the 16.0 compiler.
