!DIR$ NOVECTOR: Fortran 2008 does not allow this statement or directive.

Daniel_N_ · 03-06-2017

Hi,

Some DO loops in my code are slower when the code is compiled with -xAVX (and -align array32byte). Therefore, I added the NOVECTOR directive to them. The code looks like this:

!DIR$ NOVECTOR
  do k=2,kmx
    mi = mi0 + k
    fx   = fx0/hn(mi)
    fy   = fy0/hn(mi)
    ! ...
    ! some more code
    ! ...
  enddo

The compiler yields the warning:

tflow_ptc.F90(1160): warning #6477: Fortran 2008 does not allow this statement or directive.
!DIR$ NOVECTOR

Interestingly, the assembly still shows v* instructions in the NOVECTOR DO loop, and the run time gets even worse. The FCFLAGS environment variable for the compiler looks as follows:

export FCFLAGS="-nofixed -convert big_endian -stand f08 -qopenmp -O2 -fp-model precise -g -debug inline-debug-info -parallel-source-info=2 -align array32byte -xAVX -I/sw/dataformats/hdf5/1.8.17/smp1/intel.16.0.3.210_threadsafe/include -I/sw/dataformats/netcdf/4.3.3.1/smp1/intel.16.0.3.210_threadsafe/include"

I use Intel VTune Amplifier to evaluate the run time, which is why the debug flags are included.

The ifort compiler version is 16.0.3.210. Other versions are available on our HPC system, but I have not tried them yet.

My questions are:

1. Are the VECTOR and NOVECTOR directives valid in Fortran 2008 code?
2. I would expect the NOVECTOR directive to be ignored when it is not allowed. Is this assumption correct?
3. Am I using the NOVECTOR directive correctly? Unlike the !$omp directives, no end statement is needed, right?

Thank you for the help.

Kind Regards,

Daniel

Steve_Lionel · 03-06-2017

Directives are extensions - if you ask for standards warnings, which you did, all recognized directives will generate such a warning.

The way to look at this is that directives are "statements with funny syntax" - they affect how the compiler processes the source, so they aren't really comments (except to a compiler that does not support directives.) You are free to use directives just like any other supported extension, but the compiler is correct to alert you to the fact that they are non-standard.

TimP · 03-06-2017

It doesn't look like the directive should be discarded. If you turn on -qopt-report, you should see whether it has any effect. If the compiler is working correctly, the message is simply a warning that the directive will not work with non-Intel Fortran compilers.
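A minimal self-contained sketch for checking this; the module, routine, and file names (novec_demo, divide_loop, novec_demo.f90) are hypothetical and not from the original code. The directive sits directly in front of the DO loop it controls and has no end directive. Compiling it with, for example, `ifort -c -O2 -xAVX -qopt-report=2 -qopt-report-phase=vec novec_demo.f90` should write a .optrpt file whose entry for this loop mentions the NOVECTOR directive; a compiler without Intel directive support just sees a comment.

```fortran
module novec_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical routine modeled on the loop in the question.
  subroutine divide_loop(fx0, hn, mi0, kmx, fx)
    real(dp), intent(in)  :: fx0
    real(dp), intent(in)  :: hn(:)
    integer,  intent(in)  :: mi0, kmx
    real(dp), intent(out) :: fx(2:kmx)
    integer :: k
    ! NOVECTOR applies only to the DO loop that immediately follows it;
    ! there is no END directive. Compilers without Intel directive
    ! support treat the line as an ordinary comment.
!DIR$ NOVECTOR
    do k = 2, kmx
      fx(k) = fx0 / hn(mi0 + k)   ! expected to stay scalar (no packed divide)
    end do
  end subroutine divide_loop
end module novec_demo
```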

When you set an AVX target, all floating-point instructions should be AVX ones, even the scalar ones. The fully vectorized (packed) AVX instructions have names with "p" rather than "s" (for example vaddpd rather than vaddsd) and, in the usual case of 256-bit AVX, use ymm registers.

Intel Parallel Advisor (along with -qopt-report4) could give quicker insight into whether performance losses are associated with vectorization, for example whether excessive time is spent in remainder loops; in that case it could remind you of typical tactics for dealing with them. One of them, for a loop that is executed very frequently with a trip count too short for useful vectorization (such as 5), might be !dir$ loop count avg(5).

-fp-model precise prevents the compiler from using tricks to alleviate the relatively slow divide performance on Sandy Bridge.
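One trick of this kind can be done by hand. In the loop from the question, fx0/hn(mi) and fy0/hn(mi) divide by the same value, so one divide plus two multiplies can replace two divides. This is a minimal sketch with hypothetical names (recip_demo, scale_forces); fx and fy are made arrays here only so the results can be inspected. The results may differ from the original divides in the last bit, which is the kind of change -fp-model precise is meant to rule out, so whether it is acceptable depends on your accuracy requirements.

```fortran
module recip_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for the loop body in the question: two divides
  ! by the same hn(mi) are replaced by one divide and two multiplies.
  subroutine scale_forces(fx0, fy0, hn, mi0, kmx, fx, fy)
    real(dp), intent(in)  :: fx0, fy0
    real(dp), intent(in)  :: hn(:)
    integer,  intent(in)  :: mi0, kmx
    real(dp), intent(out) :: fx(2:kmx), fy(2:kmx)
    integer  :: k, mi
    real(dp) :: rhn
    do k = 2, kmx
      mi    = mi0 + k
      rhn   = 1.0_dp / hn(mi)   ! the only divide in this iteration
      fx(k) = fx0 * rhn         ! multiplies are much cheaper than divides
      fy(k) = fy0 * rhn
    end do
  end subroutine scale_forces
end module recip_demo
```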

Daniel_N_ · 03-06-2017

Thank you, Steve and Tim, for the fast replies. They answered my questions, and now I understand why the warnings are issued. I have some follow-up questions, which no longer fit the original title; they are below. For the future: should I change the title or start a new thread?

The vectorization report states that vectorization was not performed because of the NOVECTOR directive, so it worked. Using Intel Parallel Advisor looks promising. We have it installed, but I have never used it before. Thanks for mentioning it :-) .

When you set AVX target, all floating point instructions should be AVX ones, even the scalar ones. The full AVX vectorization instructions will have the names with "p" rather than "s" and, in the usual case of AVX-256, will use ymm registers.

I was once told that the vector registers of Intel Xeons operate at a lower frequency than the standard registers. Is this still correct? If so, with a relatively high number of scalar operations, it might be reasonable not to compile for AVX and instead do the vectorization "manually", e.g. with omp simd directives? (I know it depends on the code and the situation, and I should use Intel Parallel Advisor to be sure. I just want a rough feeling for whether I understand this correctly.)
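For reference, a minimal sketch of what vectorizing a single loop "manually" with the OpenMP 4.0 SIMD construct could look like; the module and routine names are hypothetical. With -qopenmp (already in the FCFLAGS above) the compiler is asked to vectorize this loop; without OpenMP enabled, the !$omp lines are plain comments and the loop runs as ordinary scalar code. Which instruction set the vector code uses still depends on the -x option, so the directive by itself does not avoid AVX.

```fortran
module simd_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical example: only this loop is requested to be vectorized.
  subroutine scale_by_inverse(fx0, fy0, hn, n, fx, fy)
    real(dp), intent(in)  :: fx0, fy0
    integer,  intent(in)  :: n
    real(dp), intent(in)  :: hn(n)
    real(dp), intent(out) :: fx(n), fy(n)
    integer :: i
    !$omp simd
    do i = 1, n
      fx(i) = fx0 / hn(i)
      fy(i) = fy0 / hn(i)
    end do
    ! The matching END SIMD is optional in Fortran.
    !$omp end simd
  end subroutine scale_by_inverse
end module simd_demo
```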

Among them, for the case where you execute a loop too frequently with too short a trip count for useful vectorization (such as 5), might be !dir$ loop count avg(5).

Actually, some of my loops have a length of 20, so that directive might help.
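A minimal sketch of the LOOP COUNT directive on a loop whose trip count is only known at run time but is usually about 20; the routine is hypothetical, and AVG(20) is simply the loop length mentioned here. Like NOVECTOR, the directive applies to the DO loop directly after it and is an Intel extension, so -stand f08 will flag it with the same warning.

```fortran
module loopcount_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical routine: n is only known at run time but is typically
  ! about 20, which the directive passes on to the Intel compiler.
  subroutine axpy_short(n, a, x, y)
    integer,  intent(in)    :: n
    real(dp), intent(in)    :: a, x(n)
    real(dp), intent(inout) :: y(n)
    integer :: i
!DIR$ LOOP COUNT AVG(20)
    do i = 1, n
      y(i) = y(i) + a * x(i)
    end do
  end subroutine axpy_short
end module loopcount_demo
```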

TimP · 03-06-2017

Several Intel CPU models cut back on turbo mode acceleration when ymm registers are in use. I haven't seen a detailed list. I don't think it's considered a factor in choosing whether to vectorize. I'd guess this applies only to server CPUs (those that apply temperature, power consumption, and number-of-active-threads criteria for turbo mode).
