Solved: maxloc oddity with Intel 15+

Matt_Thompson · ‎04-19-2016

All,

I'm currently trying to port a code that works on Intel 15 to use Intel 16 (and 17). In doing so, I encountered a difference in behavior of Intel 16 and 17 compared to 15 (as well as to NAG, PGI, and GNU). A variant also showed a possible bug in Intel 15.

The code is:

program test

   implicit none

   integer, parameter :: nvars = 12
   character(len=*), parameter :: vars(nvars)=(/ &
                    ' dw', ' ps', ' pw', ' rw',  &
                    '  q', 'spd', '  t', ' uv',  &
                    'sst', 'gps', 'lag', 'tcp' /)
   integer :: ivars(nvars)
   logical :: lvars(nvars)
   integer :: i
   character(len=3) :: var
   integer :: is(1)

   integer :: iis

   is = 0
   iis = 0

   do i=1,nvars
      ivars(i) = i
   end do

   var = 'X'

   write (*,*) 'var: ', var
   write (*,*) 'vars: ', vars
   write (*,*) 'ivars: ', ivars
   write (*,*) 'vars==var: ', vars==var

   is = maxloc(ivars,vars==var)
   write (*,*) 'is: ', is

   lvars = vars==var
   write (*,*) 'lvars: ', lvars

   iis = maxloc(ivars,dim=1,mask=lvars)
   write (*,*) 'iis: ', iis

end program test

What we are trying to do is find where "var" is in "vars". Is maxloc the way to do this? Probably not, but before I work around it, I want to report this. In this case, since 'X' is not in vars, "is" should be 0. I've also added a second chunk of code using MAXLOC(ARRAY,DIM=DIM,MASK=MASK), thinking that might help.

Now in Intel 15:

(1513) $ ifort --version
ifort (IFORT) 15.0.2 20150121
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

(1514) $ ifort test.F90 && ./a.out
 var: X  
 vars:  dw ps pw rw  qspd  t uvsstgpslagtcp
 ivars:            1           2           3           4           5           6
           7           8           9          10          11          12
 vars==var:  F F F F F F F F F F F F
 is:            0
 lvars:  F F F F F F F F F F F F
 iis:            1

Note that is=0, which is why the larger code this is part of worked. My attempt with dim=1, which returns a scalar integer rather than an array, gave 1, which is unexpected.

Now Intel 16:

(1362) $ ifort --version
ifort (IFORT) 16.0.2 20160204
Copyright (C) 1985-2016 Intel Corporation.  All rights reserved.

(1363) $ ifort test.F90 && ./a.out
 var: X  
 vars:  dw ps pw rw  qspd  t uvsstgpslagtcp
 ivars:            1           2           3           4           5           6
           7           8           9          10          11          12
 vars==var:  F F F F F F F F F F F F
 is:            1
 lvars:  F F F F F F F F F F F F
 iis:            1

Now is=1. My reading of the standard is that this use of MAXLOC falls under Case (ii) of the MAXLOC spec:

...If ARRAY has size zero or every element of MASK has the value false, all elements of the result are zero.

Every element of MASK is false in both MAXLOC calls, so I think the result should be 0.
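The expectation quoted from the standard can be turned into a small runtime probe. This is a minimal sketch, not part of the original reproducer; the names `maxloc_probe` and `maxloc_is_standard` are hypothetical and chosen only for illustration. It checks both the array-valued form and the DIM=1 scalar form against an all-false mask.

```fortran
module maxloc_probe
   implicit none
contains
   ! Hypothetical helper: returns .true. when MAXLOC with an all-false
   ! MASK yields zero in both the array-valued and the DIM=1 forms.
   function maxloc_is_standard() result(ok)
      logical :: ok
      integer :: values(4)
      logical :: mask(4)
      integer :: loc(1)
      integer :: iloc

      values = (/ 10, 20, 30, 40 /)
      mask = .false.

      loc  = maxloc(values, mask=mask)         ! rank-one result of size 1
      iloc = maxloc(values, dim=1, mask=mask)  ! scalar result

      ok = (loc(1) == 0) .and. (iloc == 0)
   end function maxloc_is_standard
end module maxloc_probe
```

A build could call `maxloc_is_standard()` once at startup and stop with a message if it returns false, so a change in compiler defaults is noticed rather than silently changing results.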

Thanks,

Matt

Steven_L_Intel1 · ‎04-19-2016

You want "-assume noold_maxminloc" or "-standard-semantics".

Matt_Thompson · ‎04-19-2016

Steve Lionel (Intel) wrote:

You want "-assume noold_maxminloc" or "-standard-semantics".

Well, I'll be... I didn't even think of looking for something involving maxloc in the man page. I guess it's time to set that in our Intel 16 and Intel 17 options. Thanks!

I'm tempted to try -standard-semantics, but I'm afraid that noold_unit_star would bite us in the rear. Then again, probably a good exercise anyway as well as time to really look at all the assume flags and wonder "What does our code assume?"

TimP · ‎04-19-2016

In my experience, you should avoid noold_maxminloc with 16.0.0 due to bugs with optimization. 16.0.1 and initial 17.0 will not optimize with that option but also will not fail. 16.0.2 and final 15.0 are best.

For linear search of findloc variety, I prefer to write an internal function which implements the needed cases of findloc. I suppose it may be possible to write a fairly complete findloc in f95. The only performance deficit compared with an f77 loop is due to creation of a temporary array which you also require when using maxloc for this purpose.
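A findloc-style helper of the kind described here might look like the following. It is only a sketch covering the single case this thread needs (first match in a rank-one character array); the names `findloc_sketch` and `first_match` are hypothetical. The mask array `hit` and the index array constructor are the temporaries mentioned above, and the explicit `any` test means the all-false case never reaches MINLOC, so the result should not depend on which maxminloc setting is in effect.

```fortran
module findloc_sketch
   implicit none
contains
   ! Hypothetical helper: index of the first element of names equal to
   ! wanted, or 0 when no element matches.
   pure function first_match(names, wanted) result(loc)
      character(len=*), intent(in) :: names(:)
      character(len=*), intent(in) :: wanted
      integer :: loc
      logical :: hit(size(names))   ! temporary mask array
      integer :: i

      hit = (names == wanted)
      if (any(hit)) then
         ! Smallest index among the matching positions.
         loc = minloc((/ (i, i = 1, size(names)) /), dim=1, mask=hit)
      else
         loc = 0
      end if
   end function first_match
end module findloc_sketch
```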

Matt_Thompson · ‎04-19-2016

Tim P. wrote:

In my experience, you should avoid noold_maxminloc with 16.0.0 due to bugs with optimization. 16.0.1 and initial 17.0 will not optimize with that option but also will not fail. 16.0.2 and final 15.0 are best.

Okay. Luckily my Intel 16 tag is at 16.0.2. Now when you say "won't optimize", do you mean no code will, or just code with maxloc/minloc calls will not optimize?

For linear search of findloc variety, I prefer to write an internal function which implements the needed cases of findloc. I suppose it may be possible to write a fairly complete findloc in f95. The only performance deficit compared with an f77 loop is due to creation of a temporary array which you also require when using maxloc for this purpose.

My attempted workaround was a simple do-loop-with-exit. It worked in limited testing. Heck, I've seen in the past that a loop-with-exit can outperform even minloc.
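For reference, a do-loop-with-exit of this sort can be sketched as below. This is not the code from the larger application; the names `loop_search` and `find_name` are hypothetical. It returns 0 when nothing matches, which is the result the original MAXLOC call was relied on to produce, and it needs no mask or temporary array.

```fortran
module loop_search
   implicit none
contains
   ! Hypothetical helper: index of the first element of names equal to
   ! wanted, or 0 when no element matches.
   pure function find_name(names, wanted) result(loc)
      character(len=*), intent(in) :: names(:)
      character(len=*), intent(in) :: wanted
      integer :: loc
      integer :: i

      loc = 0
      do i = 1, size(names)
         if (names(i) == wanted) then
            loc = i
            exit   ! stop at the first match
         end if
      end do
   end function find_name
end module loop_search
```

With the `vars` array from the reproducer, `find_name(vars, 'X')` would give 0 and `find_name(vars, 'sst')` would give 9.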

Steven_L_Intel1 · ‎04-19-2016

We mention this option in the description of MAXLOC and MINLOC. This is another case where an earlier standard didn't specify the behavior and later standardized on something different from what we had implemented. Unfortunately, as Tim notes, the standard behavior can have poor performance. We looked at changing the default for 17 but found some important benchmarks where it hurt too much. We'll look for ways to improve this in the future.

TimP · ‎04-19-2016

My comments about optimization probably aren't relevant for an array of character variables. I was referring to optimization of maxloc itself, not to effects elsewhere. I've been happy with 16.0.2, but Steve's comments may indicate it's not settled.

Steven_L_Intel1 · ‎04-19-2016

No, it's not settled. To be honest, we are mystified that you saw an improvement with 16.0.2 as we don't know of any relevant changes there. It is still a work-in-progress.

dkokron · ‎05-02-2017

This issue has bitten us (I work for the same organization as thematt) again. This time one of our CFD codes (OVERFLOW) silently returned incorrect answers when compiled with 2016.?.? and 2017.2.174. See attached reproducer.

TimP · ‎05-02-2017

Note that the official recommendation (issued yesterday) for Overflow is to add -assume noold_maxminloc to the build options. Overflow requires (depending on the model being attempted) the Fortran standard result from maxloc with MASK set to all .false.

If simd optimization were supported for the standard_semantics version, I would see no reason to be setting old_maxminloc. Reports seem to indicate that the old_maxminloc result hasn't been consistent across compiler upgrades, let alone with the standard.

dkokron · ‎05-02-2017

Tim,

The OVERFLOW author asked me to work with Intel to fix this issue. The author and many users consider -assume noold_maxminloc to be a workaround for a bug in the Intel compiler and are mad as hell that they have to use it just to get the correct answer.

Dan

TimP · ‎05-02-2017

The Overflow authors may not be the only ones wishing that ifort didn't require the -standard-semantics option to enable f95, f2003 (and soon f2008) compatibility. The policy seems to be well entrenched, as it goes back prior to my 13 years of employment at Intel and the 3 years since my retirement.

My personal practice is to set -standard-semantics followed by -assume old_maxminloc for the cases where I want MAXLOC optimized and don't care about the zero-length array result.

Steve_Lionel · ‎05-02-2017

Tim, your observation is incorrect. Some of the defaults have changed to be standard-compliant over the years. A recent case is -assume realloc_lhs.