All,
I'm currently trying to port code that works with Intel 15 to Intel 16 (and 17). In doing so, I encountered a difference in behavior between Intel 16/17 and Intel 15 (as well as NAG, PGI, and GNU). A variant also showed a possible bug in Intel 15.
The code is:
    program test
      implicit none
      integer, parameter :: nvars = 12
      character(len=*), parameter :: vars(nvars) = (/ &
           ' dw', ' ps', ' pw', ' rw', &
           '  q', 'spd', '  t', ' uv', &
           'sst', 'gps', 'lag', 'tcp' /)
      integer :: ivars(nvars)
      logical :: lvars(nvars)
      integer :: i
      character(len=3) :: var
      integer :: is(1)
      integer :: iis

      is = 0
      iis = 0
      do i = 1, nvars
        ivars(i) = i
      end do
      var = 'X'
      write (*,*) 'var: ', var
      write (*,*) 'vars: ', vars
      write (*,*) 'ivars: ', ivars
      write (*,*) 'vars==var: ', vars==var
      is = maxloc(ivars, vars==var)
      write (*,*) 'is: ', is
      lvars = vars==var
      write (*,*) 'lvars: ', lvars
      iis = maxloc(ivars, dim=1, mask=lvars)
      write (*,*) 'iis: ', iis
    end program test
What we are trying to do is find where var is in vars. Now, is MAXLOC the way to do this? No, probably not, but before I work around it, I want to report this. In this case, since 'X' is not in vars, is should be 0. I also added a second chunk of code using MAXLOC(ARRAY, DIM=DIM, MASK=MASK), thinking that might help.
Now in Intel 15:
    (1513) $ ifort --version
    ifort (IFORT) 15.0.2 20150121
    Copyright (C) 1985-2015 Intel Corporation. All rights reserved.

    (1514) $ ifort test.F90 && ./a.out
    var: X
    vars: dw ps pw rw qspd t uvsstgpslagtcp
    ivars: 1 2 3 4 5 6 7 8 9 10 11 12
    vars==var: F F F F F F F F F F F F
    is: 0
    lvars: F F F F F F F F F F F F
    iis: 1
Note that is = 0, which is why the larger code this is part of worked. My DIM variant, which uses dim=1 so that the result is a scalar integer, returned iis = 1, which is unexpected.
Now Intel 16:
    (1362) $ ifort --version
    ifort (IFORT) 16.0.2 20160204
    Copyright (C) 1985-2016 Intel Corporation. All rights reserved.

    (1363) $ ifort test.F90 && ./a.out
    var: X
    vars: dw ps pw rw qspd t uvsstgpslagtcp
    ivars: 1 2 3 4 5 6 7 8 9 10 11 12
    vars==var: F F F F F F F F F F F F
    is: 1
    lvars: F F F F F F F F F F F F
    iis: 1
Now is = 1. As I read the standard, this use of MAXLOC falls under Case (ii) of the MAXLOC description:
...If ARRAY has size zero or every element of MASK has the value false, all elements of the result are zero.
Every element of MASK is false in both MAXLOC calls, so I think the result should be 0 in both cases.
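One way to get the result the standard describes, whatever the compiler's MAXLOC default, is to test the mask explicitly before calling MAXLOC. The sketch below assumes a rank-1 integer array and a mask of the same shape (not checked); the names safe_maxloc_mod and safe_maxloc are hypothetical, not part of any compiler or library.

```fortran
! Minimal sketch; safe_maxloc_mod and safe_maxloc are hypothetical names.
! Assumes a rank-1 integer ARRAY and a MASK of the same shape (not checked).
module safe_maxloc_mod
  implicit none
contains
  ! Returns the position of the largest element of ARRAY where MASK is true
  ! (the first such position if there are ties), or 0 when ARRAY has size
  ! zero or every element of MASK is false, as the standard specifies.
  ! The explicit ANY test makes the zero result independent of compiler
  ! options such as ifort's -assume [no]old_maxminloc.
  integer function safe_maxloc(array, mask) result(loc)
    integer, intent(in) :: array(:)
    logical, intent(in) :: mask(:)

    if (.not. any(mask)) then
      loc = 0
    else
      loc = maxloc(array, dim=1, mask=mask)
    end if
  end function safe_maxloc
end module safe_maxloc_mod
```

In the test program, iis = safe_maxloc(ivars, lvars) would give 0 for var = 'X'. The price is an extra pass over the mask for the ANY test.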
Thanks,
Matt
You want "-assume noold_maxminloc" or "-standard-semantics".
Steve Lionel (Intel) wrote:
You want "-assume noold_maxminloc" or "-standard-semantics".
Well, I'll be... I didn't even think of looking for something involving maxloc in the man page. I guess it's time to set that in our Intel 16 and Intel 17 options. Thanks!
I'm tempted to try -standard-semantics, but I'm afraid that noold_unit_star would bite us in the rear. Then again, it's probably a good exercise anyway, and a good time to really look at all the -assume flags and ask, "What does our code assume?"
In my experience, you should avoid noold_maxminloc with 16.0.0 due to bugs with optimization. 16.0.1 and initial 17.0 will not optimize with that option but also will not fail. 16.0.2 and final 15.0 are best.
For linear search of findloc variety, I prefer to write an internal function which implements the needed cases of findloc. I suppose it may be possible to write a fairly complete findloc in f95. The only performance deficit compared with an f77 loop is due to creation of a temporary array which you also require when using maxloc for this purpose.
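Along the lines Tim describes, a findloc-style search can be written in Fortran 95 style as a small function. This sketch uses a plain DO loop with EXIT, so it builds no temporary mask, returns 0 when nothing matches, and takes an optional BACK argument modeled on the Fortran 2008 FINDLOC intrinsic. The names find_mod and find_index are hypothetical.

```fortran
! Minimal sketch of a findloc-style linear search in Fortran 95 style.
! find_mod and find_index are hypothetical names, not a library API.
module find_mod
  implicit none
contains
  ! Returns the position of the first element of ARRAY equal to KEY
  ! (the last one when BACK is present and true), or 0 if none match.
  ! Character comparison with == pads the shorter operand with blanks.
  integer function find_index(array, key, back) result(loc)
    character(len=*), intent(in) :: array(:)
    character(len=*), intent(in) :: key
    logical, intent(in), optional :: back
    logical :: from_end
    integer :: i

    from_end = .false.
    if (present(back)) from_end = back

    loc = 0
    if (from_end) then
      do i = size(array), 1, -1
        if (array(i) == key) then
          loc = i
          exit            ! stop at the first match found from the end
        end if
      end do
    else
      do i = 1, size(array)
        if (array(i) == key) then
          loc = i
          exit            ! stop at the first match; no temporary mask is built
        end if
      end do
    end if
  end function find_index
end module find_mod
```

In the original test program, is(1) = find_index(vars, var) would set is(1) to 0 for var = 'X'. Where a compiler supports the Fortran 2008 FINDLOC intrinsic, findloc(vars, var, dim=1) should give the same result; check the documentation for your compiler version before relying on it.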
Tim P. wrote:
In my experience, you should avoid noold_maxminloc with 16.0.0 due to bugs with optimization. 16.0.1 and initial 17.0 will not optimize with that option but also will not fail. 16.0.2 and final 15.0 are best.
Okay. Luckily my Intel 16 tag is at 16.0.2. When you say "won't optimize", do you mean that no code will be optimized, or only code with MAXLOC/MINLOC calls?
For linear search of findloc variety, I prefer to write an internal function which implements the needed cases of findloc. I suppose it may be possible to write a fairly complete findloc in f95. The only performance deficit compared with an f77 loop is due to creation of a temporary array which you also require when using maxloc for this purpose.
My attempted workaround was a simple DO loop with an EXIT. It worked in limited testing. Heck, I know from past experience that a loop with exit can outperform even MINLOC.
We mention this option in the description of MAXLOC and MINLOC. This is another case where an earlier standard didn't specify the behavior and later standardized on something different from what we had implemented. Unfortunately, as Tim notes, the standard behavior can have poor performance. We looked at changing the default for 17 but found some important benchmarks where it hurt too much. We'll look for ways to improve this in the future.
No, it's not settled. To be honest, we are mystified that you saw an improvement with 16.0.2 as we don't know of any relevant changes there. It is still a work-in-progress.
Note that the official recommendation (issued yesterday) for Overflow is to add -assume noold_maxminloc to the build options. Overflow requires (depending on the model being attempted) the Fortran standard result from maxloc with MASK set to all .false.
If simd optimization were supported for the standard_semantics version, I would see no reason to be setting old_maxminloc. Reports seem to indicate that the old_maxminloc result hasn't been consistent across compiler upgrades, let alone with the standard.
Tim,
The OVERFLOW author asked me to work with Intel to fix this issue. The author and many users consider -assume noold_maxminloc to be a workaround for a bug in the Intel compiler and are mad as hell that they have to use it just to get the correct answer.
Dan
The Overflow authors may not be the only ones wishing that ifort didn't require the -standard-semantics option to enable f95, f2003 (and soon f2008) compatibility. The policy seems well entrenched: it predates my 13 years of employment at Intel and has continued in the 3 years since my retirement.
My personal practice is to set -standard-semantics followed by -assume old_maxminloc for the cases where I want MAXLOC optimized and don't care about the zero-length array result.
Tim, your observation is incorrect. Some of the defaults have changed to be standard-compliant over the years. A recent case is -assume realloc_lhs.
