I am playing around with findloc to help modernize some old code, and I cannot understand the difference between using it with DIM and without. What am I missing?
    program find
      integer :: data(5) = (/-3, 3, 2, 0, -2/)
      logical :: bools(5)
      integer :: i
      integer :: a(1)
      write(*,'(A,5I3)') 'Data: ',data
      write(*,*)
      write(*,*) '----- findloc with DIM -----'
      write(*,*) 'First positive index=',findloc(data>0, .TRUE., DIM=1)
      write(*,*)
      bools = data>0
      write(*,*) 'Array of (data>0):',bools
      a = findloc(bools,.TRUE.)
      write(*,*) 'First positive index=',a
      write(*,*)
      write(*,*) '----- findloc without DIM -----'
      write(*,*) 'First positive index=',findloc(data>0, .TRUE.),' why?'
      a = findloc(data>0, .TRUE.)
      write(*,*) 'Array of index:',a
      write(*,*)
    endprogram
Has the output:
    Data: -3 3 2 0 -2
    ----- findloc with DIM -----
    First positive index= 2
    Array of (data>0): F T T F F
    First positive index= 2
    ----- findloc without DIM -----
    First positive index= 0 why?
    Array of index: 0
I cannot seem to get findloc without DIM to return a non-zero result! Maybe I should not have been working on the weekend???
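For reference, here is a minimal sketch of the one difference the standard defines between the two forms for a rank-1 array: with DIM the result is a scalar, without DIM it is a rank-1 array with one element per dimension of the argument. The names `idx_scalar` and `idx_array` are made up for this illustration. As far as I read the standard, both forms should report index 2 for this data, so a 0 from the form without DIM looks more like a compiler issue than a misuse.

```fortran
program findloc_shapes
    implicit none
    integer :: data(5) = [-3, 3, 2, 0, -2]
    integer :: idx_scalar      ! with DIM on a rank-1 array: scalar result
    integer :: idx_array(1)    ! without DIM: rank-1 result, size = rank of the argument

    idx_scalar = findloc(data > 0, .true., dim=1)
    idx_array  = findloc(data > 0, .true.)

    ! A standard-conforming compiler is expected to give the same index for both
    write(*,*) 'with DIM    :', idx_scalar
    write(*,*) 'without DIM :', idx_array
end program findloc_shapes
```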
Thanks! I added a couple of lines, and became suspicious about the compiler myself!
What I am looking for is a "one-line seek until" so that when I am debugging I can "step over" these types of loops -- there seem to be a lot of them in the old code that I am modernizing.
I am suspicious about the efficiency of that "data>0" array operation here.
How (in)efficient is that?
Well, for size-5 arrays it's efficient enough. Look at the function call

    findloc(data>0, .TRUE., DIM=1)

and start with 'data>0'. The compiler generates code to create a logical array of the same shape as 'data', which means a memory allocation for a temporary. In this case it is rank 1, size 5, so it is pretty fast. By default the temporary goes on the stack.
This is a very simple case. Small size, DIM=1.
If 'data' is large, the temporary array is just as large, and it is possible to exhaust the stack. See the -heap-arrays compiler option to allocate such temporaries on the heap instead of the stack.
Also, 'data' is rank 1. What if 'data' is rank 2, 3, 4, etc.? Sure, you say 'DIM=1', so you might assume that, LOGICALLY, findloc() should only create a rank-1 temp since we only care about dimension 1. Well, compilers may not be that smart. The compiler could create a temp of the same SHAPE as 'data' first, and then search along DIM=1. It really depends on who writes the code and how much time they have to look for optimizations such as looking ahead to DIM and figuring out how to reduce the temp to the extent of 'data' in that dimension. That means lots of IF conditions, looking outside the local context, and so on. And if you have hundreds of intrinsics to write and maintain, simplicity and correctness of code (not to mention maintainability) always win. Humans can spot such time savings from context; compilers may or may not.
IN GENERAL, good coding practice is to write your code so that it is understandable and easy to maintain over time. Your code could outlive us both. I like the style of the code you have in this example; it is better than a hand-coded loop. You are expressing your intent quite clearly here. Let the compiler do its best with good, readable code, unless you need every ounce of performance and do not care about readability or long-term maintenance costs. There are those amongst us in that camp.
...creating a temporary array...
...better than a hand-coded loop...
Thanks! In this case there are lots of one-dimensional arrays with a length of 15,000 elements. It feels like the hand-coded loop will be faster, unless I can store and re-use the logical array more globally -- which is a bigger refactoring than the current effort.
There are lots of "seek until" loops that I would like to collapse into one line each.
So the one-liners will look like this -- also in order to be one-step in the debugger.
    do i=1,UBOUND(data,1); if( data(i)>0 ) exit; enddo
    write(*,*) 'First positive index=',i
Could that line be "optimum" even with debug turned fully on?
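One thing to watch with the one-line DO: when no element is positive, the loop runs to completion and `i` ends up as UBOUND(data,1)+1 rather than 0. A possible alternative, sketched here under my own assumptions, is to wrap the loop in a small function so that the call site is still one line and "not found" comes back as 0, the same convention findloc uses. The module name `seek_until` and the function `first_positive` are hypothetical, and I have not checked how Step Over behaves on such a call in the debugger.

```fortran
module seek_until
    implicit none
contains
    ! Hypothetical helper: index of the first element > 0, or 0 if there is none.
    ! The index is relative to the dummy argument, i.e. it starts at 1.
    pure integer function first_positive(arr) result(idx)
        integer, intent(in) :: arr(:)
        integer :: i
        idx = 0
        do i = 1, size(arr)
            if (arr(i) > 0) then
                idx = i
                exit
            end if
        end do
    end function first_positive
end module seek_until

program demo_seek_until
    use seek_until
    implicit none
    integer :: data(5) = [-3, 3, 2, 0, -2]
    ! The "seek until" is a single statement at the call site
    write(*,*) 'First positive index=', first_positive(data)
end program demo_seek_until
```

No logical temporary is created here, and the search stops at the first hit.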
Wow! Stepping over that 1-line DO in my real code in the debugger took almost 2 minutes!!! That is unexpectedly wildly inefficient! Yikes!
The index stops at 8094 out of 15000 elements.
But if I "go" to a breakpoint just beyond the 1-line DO then it is a super quick 3 milliseconds.
While not exactly broken, that much inefficiency for "Step Over" in the debugger is a show stopper.
I cannot comment on the efficiency of the Microsoft debugger and its Step Over function.
You should consider using a high-precision timer and simply time the two methods with your typical array size(s). Below is a simple, precise timer module. You can reuse module 'timer' in your own code:
    module timer
      use ISO_FORTRAN_ENV
      implicit none
      integer, parameter :: sp = REAL32
      integer, parameter :: dp = REAL64
    contains
      ! --------------------------------------------------
      ! mytime: returns the current wall clock time
      ! --------------------------------------------------
      function mytime() result (tseconds)
        real (dp) :: tseconds
        integer (INT64) :: count, count_rate, count_max
        real (dp) :: tsec, rate
        CALL SYSTEM_CLOCK(count, count_rate, count_max)
        tsec = count
        rate = count_rate
        tseconds = tsec / rate
      end function mytime
    end module timer
and use it thusly
    program foo
      use timer
      implicit none
      real (dp) :: tstart, tstop, ttime
      !... your code
      tstart = mytime()
      !...what you time
      tstop = mytime()
      !elapsed time
      ttime = tstop - tstart
      write(*,*) 'total time: ', ttime
    end program foo
You should run this with your Release Configuration.
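As a rough illustration of such a comparison, here is a self-contained sketch that times the hand-coded loop against findloc on an array shaped like the one described in this thread. It calls SYSTEM_CLOCK directly so that it compiles on its own. The fill pattern (first positive value at index 8094), the repetition count, and all variable names are assumptions made up for this example, and an optimizer may hoist or simplify the repeated work, so treat the numbers as indicative only.

```fortran
program time_seek
    use iso_fortran_env, only: int64, real64
    implicit none
    integer, parameter :: n    = 15000   ! array length mentioned in the thread
    integer, parameter :: hit  = 8094    ! assumed position of the first positive value
    integer, parameter :: reps = 10000   ! repetition count, chosen arbitrarily
    integer :: data(n), i, k, sum_loop, sum_find
    integer(int64) :: c0, c1, c2, rate

    data = -1
    data(hit:) = 1
    sum_loop = 0
    sum_find = 0

    call system_clock(c0, rate)
    do k = 1, reps
        ! hand-coded "seek until" loop
        do i = 1, n
            if (data(i) > 0) exit
        end do
        sum_loop = sum_loop + i
    end do
    call system_clock(c1)
    do k = 1, reps
        ! intrinsic version; data > 0 may create a logical temporary
        sum_find = sum_find + findloc(data > 0, .true., dim=1)
    end do
    call system_clock(c2)

    ! The sums are printed so that the results are actually used
    write(*,*) 'loop    seconds:', real(c1 - c0, real64) / real(rate, real64), sum_loop
    write(*,*) 'findloc seconds:', real(c2 - c1, real64) / real(rate, real64), sum_find
end program time_seek
```

Both sums should match, since both methods find the same index on every repetition.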
Yes, it has been this way for years; it has to do checks on each element as it is processed. When I step onto such a line by accident, rather than go for a coffee I:
1] hit the "Break All" button, which breaks at some system runtime line for which there is no source;
2] set a breakpoint in the actual code after the line of doom;
3] hit F5 to continue, and it then goes straight to the breakpoint just set, without the long wait.
You aren't kidding! I put in the intrinsic findloc expecting it to be faster in the debugger than my loop, and it was still nearly 2 minutes on my 15000 element array!
What is it checking for each element??? And who is "it"? The debugger? What can it check during the call?
a = findloc(bools,.TRUE.)
The long wait is due to the way the debugger does a step-over operation. It executes one instruction, looks to see whether the current instruction belongs to a different statement, and if not, repeats. For array operations, which can be tens of thousands of instructions per line, this can take a long time. I will often set a breakpoint at the next line and say Go.
One instruction at a time for step-over -- well, that explains it. Thanks Steve!
How could I have used the debugger for so long (in the past) and not understood that?
I had assumed the issue was probably boundary checking.
Anyhow, I tried Ron Green's timer, and found that the dopey loop is more than twice as fast as the findloc code. The findloc version requires adding a bools = data>0 type of line to my code to initialize the logical array first, so that is likely why.
I keep thinking that findloc "should" take a lambda, rather than a "value", as its second parameter -- it's the Go and Scala occupying my brain! Ha ha!
If Intel Fortran is looking for things to do (yeah, right) then a sort of "micro lambda" in findloc would be cool -- to look for a conditional.
    ! i.e. "_ OPERATOR value"
    i = FINDLOC( array, "_ > 0" )
    i = FINDLOC( array, "_.NE.-1" )
In Scala the "_" underscore is an inferred variable used in lambda functions.
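Fortran has no lambdas, but a similar effect can be approximated today by passing a small predicate function as a procedure argument. The sketch below is only an illustration of that idea, not a proposal for how FINDLOC would work: `find_if_mod`, `int_predicate`, `find_first`, and `is_positive` are all hypothetical names, and passing an internal procedure as an actual argument requires Fortran 2008 support.

```fortran
module find_if_mod
    implicit none
    abstract interface
        ! Shape of the predicate: takes one integer, returns a logical
        logical function int_predicate(x)
            integer, intent(in) :: x
        end function int_predicate
    end interface
contains
    ! Hypothetical helper: index of the first element for which pred is true,
    ! or 0 if there is none (same "not found" convention as findloc)
    integer function find_first(array, pred) result(idx)
        integer, intent(in) :: array(:)
        procedure(int_predicate) :: pred
        integer :: i
        idx = 0
        do i = 1, size(array)
            if (pred(array(i))) then
                idx = i
                return
            end if
        end do
    end function find_first
end module find_if_mod

program demo_find_if
    use find_if_mod
    implicit none
    integer :: data(5) = [-3, 3, 2, 0, -2]
    write(*,*) 'First positive index=', find_first(data, is_positive)
contains
    ! Plays the role of the "_ > 0" micro lambda
    logical function is_positive(x)
        integer, intent(in) :: x
        is_positive = x > 0
    end function is_positive
end program demo_find_if
```

The predicate is called once per element until the first match, so no logical temporary of the full array is needed.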