Intel® Fortran Compiler

!$OMP SIMD or !DIR$ SIMD?

Wentao_Z_
Beginner

Hi,

I have two quick questions regarding using SIMD directives.

(1) I am currently using !DIR$ SIMD to help the compiler to vectorize the loops. But I just noticed that in OpenMP 4.0 we have the following directives:

!$OMP SIMD

!$OMP END SIMD

I am a little confused about why we need !$OMP SIMD, since !DIR$ SIMD works for both non-OpenMP and OpenMP code. !$OMP SIMD only works when I add the compiler option -openmp to generate OpenMP code, right?

(2) What about the portability of the SIMD directives (!DIR$ SIMD or !$OMP SIMD)? Later I may test my code on different platforms with different compilers. I guess !DIR$ SIMD only works with the Intel compiler, while !$OMP SIMD may be supported by more vendors.

Thanks for your time and help!

Best regards,
   Wentao

1 Solution
TimP
Honored Contributor III

From ifort 14.0 on, either -openmp or -openmp-simd enables the !$omp simd directives.  I don't think this is clear in the documentation.

In 13.1, -openmp-simd was needed regardless of whether -openmp was set.
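
To make the flag discussion concrete, here is a minimal sketch of a loop carrying the OpenMP 4.0 directive. The program name, array names, and sizes are made up for illustration; a compiler that does not recognize the directive (or is run without -openmp or -openmp-simd) simply treats it as a comment, so the result is the same either way.

```fortran
program simd_demo
  implicit none
  integer, parameter :: n = 1024      ! illustrative size, not from the thread
  real :: a(n), b(n), c(n)
  integer :: i

  do i = 1, n
     b(i) = real(i)
     c(i) = 2.0 * real(i)
  end do

  ! OpenMP 4.0 form: acted on only when OpenMP (or OpenMP SIMD) is enabled.
  !$omp simd
  do i = 1, n
     a(i) = b(i) + c(i)
  end do
  !$omp end simd

  ! First and last elements: 3 and 3072
  print *, a(1), a(n)
end program simd_demo
```

Checking the vectorization report with and without the flag is the practical way to confirm the directive is actually being honored.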

ifort 15.0 should have a full implementation of omp simd, except for user-defined reductions.  Looking to the future, when ifort has this standards-based set of directives and other compilers (gfortran 4.10, PGI?) may implement them, this is a better route toward portability than the legacy ifort directives.

gcc 4.9 implements a similar part of #pragma omp simd, but gfortran 4.9 will not yet have !$omp simd.  The gcc policy for omp simd directives is different from Intel's: the directive doesn't override the choice made in the compile options (it only asserts vector independence), and -ffast-math -ftree-vectorize is a more satisfactory way to enable most simd optimizations than the pragma is.  gcc 4.9 ignores the simd clause in the context "#pragma omp parallel for simd", while Intel 14.0 compilers use it effectively (if you have enough cores).  ifort 14 performs excellent combined vectorization with parallel do, not necessarily benefiting from adding the simd clause.
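
For reference, the Fortran counterpart of the combined construct mentioned above looks roughly like the sketch below. The dot-product loop and all names are assumptions chosen for illustration; whether the simd part adds anything over plain parallel do is exactly the thing to measure.

```fortran
program par_simd_demo
  implicit none
  integer, parameter :: n = 100000    ! illustrative size
  real :: x(n), y(n), total
  integer :: i

  x = 1.0
  y = 2.0
  total = 0.0

  ! Threads split the iterations; each thread's share may then be vectorized.
  !$omp parallel do simd reduction(+:total)
  do i = 1, n
     total = total + x(i) * y(i)
  end do
  !$omp end parallel do simd

  ! Every product is 2.0, so the sum is 200000
  print *, total
end program par_simd_demo
```

Dropping "simd" from both directive lines gives the plain parallel do version to compare against.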

ifort also seems to be headed in the direction, which we advocated long ago before Cilk and OpenMP 4 were discussed, of not relying on vectorization directives.  With the default setting "-fp-model fast=1" and the addition of "-assume protect_parens,minus0", the compiler looks likely to continue making good decisions about vectorization, while !$omp simd may perform extreme vectorization which degrades performance.  So we are still down to testing to see whether the directive helps.

I don't know what to call those !dir$ simd directives; it was agreed verbally a year ago that the Cilk(tm) Plus designation isn't good for Fortran.

Now I will go further into the realm of personal opinion and discuss some contrasts among the directive families.

Intel directives don't offer a clear path for vectorization of an indexed min|max search.  ifort minloc/maxloc are excellent under the (default) old_maxminloc setting (this legacy option is required for auto-vectorization).  I use maxloc for an inner loop inside a parallel do reduction(max: max_) lastprivate(index) for threading.  As omp simd doesn't support firstprivate, it's important to ensure that index is set at least once inside the loop.  I wouldn't attempt to replace maxloc with an omp simd, but I expect that Intel 15.0 compilers will do OK with omp simd reduction lastprivate.  It also appears that 15.0 may do away with the requirement for old_maxminloc.

The suggestions I offer for vectorization of indexed max|min reduction don't necessarily take care of tie-breaking, where the Fortran standard requires maxloc or minloc to return the first location in case of ties.  I write the tie-breaking explicitly in my parallel do reduction lastprivate loop.
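
As an illustration of explicit tie-breaking, here is a minimal sketch. Note that it is not the reduction/lastprivate loop described above: it swaps in a simpler per-thread search merged in a critical section, and all names and data are made up. The point is only that the lowest index wins on ties, matching what maxloc returns.

```fortran
program indexed_max_sketch
  implicit none
  integer, parameter :: n = 8
  real :: a(n), best, local_best
  integer :: best_idx, local_idx, i

  a = [3.0, 7.0, 1.0, 7.0, 2.0, 7.0, 0.0, 5.0]   ! three tied maxima
  best = -huge(best)
  best_idx = 0                      ! 0 means "not set yet"

  !$omp parallel private(local_best, local_idx)
  local_best = -huge(local_best)
  local_idx = 0
  !$omp do
  do i = 1, n
     ! Take a(i) if nothing is stored yet, if it is larger,
     ! or if it ties and has a lower index.
     if (local_idx == 0 .or. a(i) > local_best .or. &
         (a(i) == local_best .and. i < local_idx)) then
        local_best = a(i)
        local_idx = i
     end if
  end do
  !$omp end do
  ! Merge the per-thread results one at a time; ties go to the lower index.
  !$omp critical
  if (local_idx > 0) then
     if (best_idx == 0 .or. local_best > best .or. &
         (local_best == best .and. local_idx < best_idx)) then
        best = local_best
        best_idx = local_idx
     end if
  end if
  !$omp end critical
  !$omp end parallel

  ! Both values are 2: the first of the tied maxima
  print *, best_idx, maxloc(a, dim=1)
end program indexed_max_sketch
```

Built without OpenMP, the directives are comments and the same code runs serially with the same answer.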

Cilk(tm) Plus offers indexed max reducers, but they don't offer a performance gain over plain source code without directives, so I wouldn't be surprised if the payoff for user-defined reduction is considered unattractive.

I don't know whether min|max reductions will be added to the legacy !dir$ simd family; if not, it would reinforce the conclusion that OpenMP 4 is the way forward.

Intel compilers offer the legacy !dir$ simd as a way to override automatic fast_memcpy substitution.  I haven't heard the reason, but !$omp simd doesn't have this effect.  While fast_memcpy is effective in the majority of situations, it may not be good where the loops are so short that the overhead of the function call and its decision tree is too large, or where you want to use !dir$ vector nontemporal or the -opt-streaming-stores auto feature.
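
A minimal sketch of the short-copy case, assuming the behavior described above (the directive keeping the loop as an inline vector loop instead of a fast_memcpy call). The subroutine and program names are hypothetical, and compilers other than ifort treat the !dir$ line as a comment.

```fortran
program short_copy_demo
  implicit none
  real :: src(6), dst(6)

  src = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
  call copy_short(dst, src, 6)
  print *, dst

contains

  subroutine copy_short(d, s, n)
    integer, intent(in) :: n
    real, intent(in)  :: s(n)
    real, intent(out) :: d(n)
    integer :: i

    ! Intel-specific legacy directive; a plain comment to other compilers.
    !dir$ simd
    do i = 1, n
       d(i) = s(i)
    end do
  end subroutine copy_short

end program short_copy_demo
```

Whether this beats the library call for a given trip count is something to confirm by timing both builds.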

Until it was corrected in the 14.0.2 release, Intel compilers confused !$omp simd safelen(32) with the related !dir$ simd vectorlength.  With the latter, you had to work out the correct vector length for your target instruction set and set it differently for SSE, AVX, and MIC.  With the current compiler, safelen is interpreted correctly as an assertion that dependencies can be ignored within the specified iteration distance, leaving the actual vector length to the compiler.  The variant !dir$ simd vectorlengthfor was deprecated almost as soon as it was released.
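
A minimal sketch of what safelen asserts, with made-up names and an illustrative distance of 32: each element depends on the one 32 positions earlier, so any group of up to 32 consecutive iterations is independent and the compiler may pick whatever vector length suits the target.

```fortran
program safelen_demo
  implicit none
  integer, parameter :: n = 256, dist = 32   ! illustrative values
  real :: a(n)
  integer :: i

  do i = 1, n
     a(i) = real(i)
  end do

  ! a(i) reads a(i-dist), so iterations fewer than dist apart never conflict.
  !$omp simd safelen(dist)
  do i = dist + 1, n
     a(i) = a(i - dist) + 1.0
  end do

  ! a(33) = a(1) + 1 = 2; a(256) follows the chain from a(32) = 32 up to 39
  print *, a(dist + 1), a(n)
end program safelen_demo
```

Unlike vectorlength, nothing here needs to change between SSE, AVX, and MIC builds.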

Martyn_C_Intel
Employee

!DIR$ SIMD is an Intel-specific extension; historically, that came first.

!$OMP SIMD is part of the OpenMP 4.0 standard; this is the form that other vendors may support. There are some changes in the clauses supported, based in part on lessons learned from !DIR$ SIMD and on consistency with the rest of OpenMP. For example, ALIGNED is a useful new clause.

TimP
Honored Contributor III

To Martyn's point, I assume that the experience demonstrated with ifort !dir$ simd was instrumental in adoption of the OpenMP 4 standard, which went a step further in offering a specification for combining parallelization with simd vectorization.

I may have painted myself into a corner, but omp simd aligned() appears to support only the case where the initial element of an array is aligned, and I don't find it a satisfactory replacement for the Intel alignment assertions.  I have already run into a case where another compiler produced a run-time failure when I set omp simd aligned, because the data needed a different alignment specification from the one the clause supports.  I use Intel alignment assertions successfully in combination with omp simd, but they are sometimes annoyingly verbose.

Intel alignment assertions are particularly important where the compiler has an opportunity for fusion but, in that case, omp simd acts as a nofusion directive, useful only until fusion bugs are corrected (as the Intel compiler team has consistently done).

Wentao_Z_
Beginner

Hi Tim,

Thank you so much for your detailed reply. Now the concepts are much clearer to me. I think !$OMP SIMD should be the better choice. But I got the following compiler warning after I added -openmp-simd (my ifort is 13.1.0):

ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModIO.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -O0   -module obj -c src/ModDerivBuildOps.f90 -o obj/ModDerivBuildOps.o
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModPenta.f90 -o obj/ModPenta.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModPenta.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModDeriv.f90 -o obj/ModDeriv.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModDeriv.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModMetrics.f90 -o obj/ModMetrics.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModMetrics.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModAdvection.f90 -o obj/ModAdvection.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModAdvection.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModRungeKutta.f90 -o obj/ModRungeKutta.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModRungeKutta.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModTimemarch.f90 -o obj/ModTimemarch.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModTimemarch.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModInput.f90 -o obj/ModInput.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModInput.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModRegion.f90 -o obj/ModRegion.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModRegion.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 -c -O2 -openmp -openmp-simd   -module obj src/ModMain.f90 -o obj/ModMain.o	
ifort: command line warning #10122: overriding '-openmp-simd' with '-oobj/ModMain.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpicc -c -O2 -openmp -openmp-simd   -fPIC  src/plot3d_format.c -o obj/plot3d_format.o	
icc: command line warning #10122: overriding '-openmp-simd' with '-oobj/plot3d_format.o'
/opt/apps/intel13/mvapich2/1.9/bin/mpif90 obj/ModGlobal.o obj/ModDataStruct.o obj/ModMPI.o obj/ModString.o obj/ModParam.o obj/ModPLOT3D_IO.o obj/ModMatrixVectorOps.o obj/ModNR.o obj/ModInitialCondition.o obj/ModDataUtils.o obj/ModIO.o obj/ModDerivBuildOps.o obj/ModPenta.o obj/ModDeriv.o obj/ModMetrics.o obj/ModAdvection.o obj/ModTimemarch.o obj/ModRungeKutta.o obj/ModInput.o obj/ModRegion.o obj/ModMain.o obj/plot3d_format.o -openmp -openmp-simd  -openmp -openmp-simd -o bin/plascomcm
ifort: command line warning #10122: overriding '-openmp-simd' with '-openmp-simd'
ifort: command line warning #10122: overriding '-openmp-simd' with '-obin/plascomcm'

I checked the vectorization report and it says "OpenMP SIMD LOOP WAS VECTORIZED". But what does the warning mean?

Thanks!

Best regards,
   Wentao

TimP
Honored Contributor III

Apparently, your compiler is too old to recognize even the -openmp-simd option; judging by the warning text, it seems to parse it as an -o (output file) option, which the later -o then overrides.  OpenMP 4 functionality was sneaking into later 13.1 versions, some of which have already been withdrawn from the download site.  14.0.2 and later compilers have much improved OpenMP 4 support.
