Solved: findloc with DIM and without

wtstephens · ‎03-06-2023

I am playing around with findloc to help modernize some old code, and I cannot understand the difference between using it with DIM and without. What am I missing?

program find
    integer :: data(5) = (/-3, 3, 2, 0, -2/)
    logical :: bools(5)
    integer :: i
    integer :: a(1)

    write(*,'(A,5I3)') 'Data: ',data
    write(*,*)

    write(*,*) '----- findloc with DIM -----'
    write(*,*) 'First positive index=',findloc(data>0, .TRUE., DIM=1)
    write(*,*)

    bools = data>0
    write(*,*) 'Array of (data>0):',bools
    a = findloc(bools,.TRUE.)
    write(*,*) 'First positive index=',a
    write(*,*)

    write(*,*) '----- findloc without DIM -----'
    write(*,*) 'First positive index=',findloc(data>0, .TRUE.),' why?'
    a = findloc(data>0, .TRUE.)
    write(*,*) 'Array of index:',a
    write(*,*)

endprogram

Has the output:

Data:  -3  3  2  0 -2

 ----- findloc with DIM -----
 First positive index=           2

 Array of (data>0): F T T F F
 First positive index=           2

 ----- findloc without DIM -----
 First positive index=           0  why?
 Array of index:           0

I cannot seem to get findloc without DIM to return a non-zero result! Maybe I should not have been working on the weekend???

Steve_Lionel · ‎03-06-2023

It's a compiler bug. NAG Fortran gets it right.
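
For reference, here is a minimal sketch of what the standard specifies for a rank-1 argument: with DIM the result is a scalar, and without DIM it is a rank-1 array whose size equals the rank of the argument. The program and variable names are illustrative only. On a conforming compiler both forms should report index 2 for the data in the question.

```fortran
program findloc_shapes
  implicit none
  integer :: data(5) = [-3, 3, 2, 0, -2]
  integer :: idx_scalar      ! receives the DIM=1 result (a scalar for rank-1 input)
  integer :: idx_array(1)    ! receives the no-DIM result (size = rank of the input)

  ! With DIM on a rank-1 array, findloc returns a scalar index.
  idx_scalar = findloc(data > 0, .true., dim=1)

  ! Without DIM, findloc returns a rank-1 array of subscripts.
  idx_array = findloc(data > 0, .true.)

  write(*,*) 'with DIM    =', idx_scalar
  write(*,*) 'without DIM =', idx_array

  ! The two forms must agree; a mismatch points at the compiler.
  if (idx_scalar /= idx_array(1)) error stop 'findloc results differ'
end program findloc_shapes
```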

wtstephens · ‎03-06-2023

Thanks! I added a couple of lines, and became suspicious about the compiler myself!

What I am looking for is a "one-line seek until" so that when I am debugging I can "step over" these types of loops -- there seem to be a lot of them in the old code that I am modernizing.

I am suspicious about the efficiency of that "data>0" array operation here.

How (in)efficient is that?

Ron_Green · ‎03-06-2023

Well, for size 5 arrays it's efficient enough. Let's look at the function call

findloc(data>0, .TRUE., DIM=1)

Start with 'data>0'. The compiler generates code to create a logical array of the same shape as 'data'. This is a memory allocation for a temporary: in this case, rank 1, size 5. Pretty fast, and on the stack by default.

This is a very simple case. Small size, DIM=1.

If 'data' is large, the temporary array is just as large, and it's possible to exhaust the stack. See the -heap-arrays compiler option to allocate temporaries on the heap instead of the stack.

Also, 'data' is rank 1. What if 'data' is rank 2, 3, 4, etc.? Sure, you say 'DIM=1', so you might assume that, LOGICALLY, findloc() should only need a rank 1 temp, since we only care about dimension 1. Well, compilers may not be that smart. The compiler could create a temp of the same SHAPE as 'data' first, and then search along DIM=1. It really depends on who writes the code and how much time they have to look for optimizations such as looking ahead to DIM, figuring out how to flatten the temp to the extent of 'data' in that dimension, etc. That means lots of IF conditions, looking outside the local context, and so on. And if you have hundreds of intrinsics to write and maintain, simplicity and correctness of code (not to mention maintainability) always win. Context matters: humans can spot such smart time savings, compilers may or may not.
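
To make the rank 2 case concrete, here is a small sketch of the result shapes the standard specifies. The array 'grid' and its values are made up for illustration. With DIM=1 there is one result per column; without DIM the result is the pair of subscripts of the first match in array element order. The values in the comments assume a conforming compiler.

```fortran
program findloc_rank2
  implicit none
  integer :: grid(2,3)

  ! Columns are (-1, 4), (5, -2), (-3, -6).
  grid = reshape([-1, 4, 5, -2, -3, -6], [2, 3])

  ! DIM=1 searches down each column: expect 2, 1, 0 (0 means no match).
  write(*,*) 'with DIM=1  :', findloc(grid > 0, .true., dim=1)

  ! No DIM: subscripts of the first .true. in array element order: expect 2, 1.
  write(*,*) 'without DIM :', findloc(grid > 0, .true.)
end program findloc_rank2
```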

IN GENERAL, good coding practice is to write your code so that it is understandable and easy to maintain over time. Your code could outlive us both. I like the style of the code you have in this example; it is better than a hand-coded loop. You are expressing your intent quite clearly here. Let the compiler do its best with good, readable code, unless you need every ounce of performance and do not care about readability or long-term maintenance costs. There are those amongst us in that camp.

wtstephens · ‎03-07-2023

> ...creating a temporary array...

> ...better than a hand-coded loop...

Thanks! In this case there are lots of one-dimensional arrays with a length of 15,000 elements. It feels like the hand-coded loop will be faster, unless I can store and re-use the logical array more globally -- which is a bigger refactoring than the current effort.

There are lots of "seek until" loops that I would like to collapse into one line each.

So the one-liners will look like this -- also in order to be one-step in the debugger.

    do i=1,UBOUND(data,1); if( data(i)>0 ) exit; enddo
    write(*,*) 'First positive index=',i

Could that line be "optimum" even with debug turned fully on?
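
Another way to get a one-line "seek until" at the call site, without building a logical temporary, is to move the loop into a small function. The sketch below is only an illustration: the module name 'seek_utils' and the function name 'first_positive' are hypothetical, and it returns 0 when nothing matches (unlike the DO one-liner, which leaves i one past the upper bound).

```fortran
module seek_utils
  implicit none
contains
  ! Returns the index of the first element > 0, or 0 if there is none.
  ! The index is relative to 1 because arr is an assumed-shape dummy.
  pure integer function first_positive(arr) result(idx)
    integer, intent(in) :: arr(:)
    integer :: i
    idx = 0
    do i = 1, size(arr)
      if (arr(i) > 0) then
        idx = i
        exit
      end if
    end do
  end function first_positive
end module seek_utils

program seek_demo
  use seek_utils
  implicit none
  integer :: data(5) = [-3, 3, 2, 0, -2]

  ! One statement at the call site; the loop stops at the first match.
  write(*,*) 'First positive index=', first_positive(data)
end program seek_demo
```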

wtstephens · ‎03-07-2023

Wow! Stepping over that 1-line DO in my real code in the debugger took almost 2 minutes!!! That is unexpectedly, wildly inefficient! Yikes!

The index stops at 8094 out of 15000 elements.

But if I "go" to a breakpoint just beyond the 1-line DO, then it takes a super quick 3 milliseconds.

While not exactly broken, that much inefficiency in the debugger's "Step Over" is a show stopper.

Ron_Green · ‎03-07-2023

I cannot comment on the efficiency of the Microsoft debugger and its Step Over function.

You should consider using a high-precision timer and simply timing the 2 methods with your typical array size(s). Below is a simple, precise timer module. You can reuse module 'timer' in your own code.

module timer
  use ISO_FORTRAN_ENV
  implicit none
  integer, parameter :: sp = REAL32
  integer, parameter :: dp = REAL64
contains
  ! --------------------------------------------------
  ! mytime: returns the current wall clock time
  ! --------------------------------------------------
  function mytime()  result (tseconds)
    real (dp)       :: tseconds
    integer (INT64) ::  count, count_rate, count_max
    real (dp)       :: tsec, rate

    CALL SYSTEM_CLOCK(count, count_rate, count_max)

    tsec = count
    rate = count_rate
    tseconds = tsec / rate
  end function mytime
end module timer

and use it thusly

program foo
use timer
implicit none
real (dp) :: tstart, tstop, ttime
  !... your code
  
  tstart = mytime()
  !...what you time
  tstop = mytime()
  
  !elapsed time
  ttime = tstop - tstart
  write(*,*) 'total time: ', ttime
end program foo

You should run this with your Release Configuration.
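
As a rough sketch of such a comparison, the program below times findloc against a hand-coded loop using SYSTEM_CLOCK directly. The array length and match position are taken from the numbers mentioned in this thread; the repeat count, the fill values, and all names are assumptions for illustration. The last element is changed on every pass only to discourage the compiler from hoisting the search out of the repeat loop. Treat the numbers as indicative only, since results depend heavily on compiler and options.

```fortran
program time_seek
  use iso_fortran_env, only: int64, real64
  implicit none
  integer, parameter :: n = 15000       ! array length mentioned in the thread
  integer, parameter :: nrep = 20000    ! arbitrary repeat count (assumption)
  integer :: vals(n), i, k
  integer(int64) :: c0, c1, c2, rate, checksum

  vals = -1
  vals(8094) = 1                        ! first positive element
  checksum = 0

  call system_clock(c0, rate)
  do k = 1, nrep
    vals(n) = k                         ! changes the data, not the answer
    checksum = checksum + findloc(vals > 0, .true., dim=1)
  end do
  call system_clock(c1)

  do k = 1, nrep
    vals(n) = k
    do i = 1, n
      if (vals(i) > 0) exit
    end do
    checksum = checksum + i
  end do
  call system_clock(c2)

  write(*,*) 'findloc seconds:', real(c1 - c0, real64) / real(rate, real64)
  write(*,*) 'loop seconds:   ', real(c2 - c1, real64) / real(rate, real64)
  write(*,*) 'checksum:', checksum      ! printed so the work is not optimized away
end program time_seek
```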

andrew_4619 · ‎03-07-2023

Yes, it has been this way for years; it has to do checks on each element as it is processed. When I accidentally step onto such a line, rather than go for a coffee I 1] hit the "Break All" button, which will break at some system runtime line for which there is no source, 2] set a breakpoint in my actual code after the line of doom, and 3] hit F5 to continue; it will then get straight to the breakpoint I just set without the long wait.

wtstephens · ‎03-07-2023

You aren't kidding! I put in the intrinsic findloc expecting it to be faster in the debugger than my loop, and it was still nearly 2 minutes on my 15000 element array!

What is it checking for each element??? And who is "it"? The debugger? What can it check during the call?

a = findloc(bools,.TRUE.)

andrew_4619 · ‎03-07-2023

Underflow, overflow, invalid addressing, all manner of potential exceptions to trap... That is what 'it' (the debugger) does.

Ron_Green · ‎03-06-2023

gfortran gets it right as well. Yes, it's a bug in findloc when the argument list has no DIM argument.

I'll start a bug report.

Ron_Green · ‎03-06-2023

Bug ID is CMPLRLIBS-34373

wtstephens · ‎04-20-2023

> Bug ID is CMPLRLIBS-34373

Is this findloc bug fixed in 2023.1 ?

I cannot seem to find that level of detail in the Release Notes.

Barbara_P_Intel · ‎04-20-2023

Sorry, this is not fixed in the current releases of ifx 2023.2.0 and ifort 2021.9.0.

wtstephens · ‎04-01-2024

@Ron_Green wrote:
Bug ID is CMPLRLIBS-34373

Did this bug get fixed over the past months?

Ron_Green · ‎04-01-2024

Yes, the fix is in the 2024 Update 1 packages: 2024.1.0 ifx and ifort 2021.12.0

Steve_Lionel · ‎03-07-2023

The long wait is due to the way the debugger does a step-over operation. It executes one instruction, looks to see if the current instruction is in a different statement, if not, repeats. For array operations that can be tens of thousands of instructions per line, this can take a long time. I will often set a breakpoint at the next line and say Go.

wtstephens · ‎03-08-2023

One instruction at a time for step-over -- well, that explains it. Thanks Steve!

How could I have used the debugger for so long (in the past) and not understood that?

I had assumed the issue was probably boundary checking.

Anyhow, I tried Ron Green's timer, and found that the dopey loop is more than twice as fast as the findloc code -- which requires the addition of a bools = data>0 type of line in my code to initialize the logical array first -- so that is likely why.

I keep thinking that findloc "should" take a lambda, rather than a "value", as its second parameter -- it's the Go and Scala occupying my brain! Ha ha!

wtstephens · ‎03-08-2023

If Intel Fortran is looking for things to do (yeah, right) then a sort of "micro lambda" in findloc would be cool -- to look for a conditional.

!              i.e. "_ OPERATOR value"
i = FINDLOC( array, "_ > 0" )
i = FINDLOC( array, "_.NE.-1" )

In Scala the "_" underscore is an inferred variable used in lambda functions.
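
Within the current standard, something in the same spirit can be approximated by passing a predicate function as an argument. This is only a sketch of that workaround, not a FINDLOC feature: 'find_first', 'int_predicate', and the two predicate functions are hypothetical names invented for the example.

```fortran
module predicate_search
  implicit none
  abstract interface
    ! Shape of a predicate: takes one integer, returns a logical.
    pure logical function int_predicate(x)
      integer, intent(in) :: x
    end function int_predicate
  end interface
contains
  ! Returns the index of the first element for which pred is true, else 0.
  pure integer function find_first(arr, pred) result(idx)
    integer, intent(in) :: arr(:)
    procedure(int_predicate) :: pred
    integer :: i
    idx = 0
    do i = 1, size(arr)
      if (pred(arr(i))) then
        idx = i
        return
      end if
    end do
  end function find_first

  pure logical function is_positive(x)
    integer, intent(in) :: x
    is_positive = x > 0
  end function is_positive

  pure logical function is_not_minus_one(x)
    integer, intent(in) :: x
    is_not_minus_one = x /= -1
  end function is_not_minus_one
end module predicate_search

program predicate_demo
  use predicate_search
  implicit none
  integer :: data(5) = [-3, 3, 2, 0, -2]

  write(*,*) 'first > 0   :', find_first(data, is_positive)
  write(*,*) 'first /= -1 :', find_first(data, is_not_minus_one)
end program predicate_demo
```

No mask array is built here, and the search stops at the first match, at the cost of one small named function per condition.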

Steve_Lionel · ‎03-08-2023

If it's not in the standard, that is VERY unlikely to happen.

wtstephens · ‎03-09-2023

Maybe so, but the more that I look into these MASK arrays, the more wildly efficient this type of improvement seems.