How to optimize this simple code

Nan_Deng · ‎04-02-2012

I have a simple code section that is called so many times and need to be optimized:

====================

do i=1,n
do j=1,n
.......
ia = int(f(i,j))
nk = g(i,j)
.......
do k=1,n
do L = 1, nk
if (val(L)== ia) then
loc =L
goto 100
endif
enddo
100 continue
enddo
enddo
enddo
====================

The code itself is simple enough but the black part is inside 3 level of do-loops. When n becomes large, this simple number comparison is called many many times. Is there a simple way to optimize this code, or is there a function that will perform the function (compare the value of the entries of a vector against a constant, and give the location where the values are the same) in a more optimized way?

Thanks for all suggestions in advance.

mecej4 · ‎04-03-2012

Your question cannot be answered without knowing

(i) the distribution of the locations in the array where the array element may have the specified value. The order of traversing the arrays should take advantage of this information.

(ii) if more than one array element may have the specified value, will the second item of any of these index pairs be suitable? If so, as soon as a match has been found the loops should be terminated.

(iii) since nothing in the section of code depends on k, what is the purpose of looping on k? Or do we have a casualty of the evisceration of the code to create a short example?

To achieve the same result as the code that you gave, one can search by running the i and j loops backwards, and exit both loops after the first match, since doing so will find the last match that would have been recorded when the loops were run forward. Here is a complete program that fills the arrays and more or less simulates such a reverse search. If you are more strict about the order, you need to replace 1:g(i,j) by g(i,j):1:-1, and make a suitable correction before accepting the value of loc

[fortran]implicit none integer, parameter :: IMAX=10, JMAX=5, KMAX=7 real, dimension(IMAX,JMAX) :: f integer, dimension(IMAX,JMAX) :: g integer, dimension(KMAX) :: val integer :: i,j,k,n,loc(1) n=IMAX do i=1,n do j=1,Jmax f(i,j) = 1000*i+j g(i,j) = 7 end do end do do i=1,KMAX val(i) = 7000+i end do ILOOP: do i=n,1,-1 do j=JMAX,1,-1 loc = minloc(val(1:g(i,j)),val(1:g(i,j))==int(f(i,j))) if(loc(1).gt.0)exit ILOOP enddo enddo ILOOP write(*,*)' loc = ',loc end [/fortran]

TimP · ‎04-03-2012

You may speed it up by removing the type conversion from the loop.
ifort is better than other compilers at vectorizing minloc (but not when /standard-semantics is set).
OpenMP parallelism is useful for speeding up long searches.

mecej4 · ‎04-03-2012

> You may speed it up by removing the type conversion from the loop.

Although the values in the real array f are integers in the example that I made up, that is not necessarily so in Nan Deng's application, so it may be necessary to retain the conversion from real to integer.

SergeyKostrov · ‎04-03-2012

I think a Lookup Table ( LUT )could be used instead of:

...
if( val( L ) == ia ) then
loc = L
...
...

but a question is how to pre-generate it? Withthe LUT you won't have the 'if-then' statements inside of 'do' loop.

Nan_Deng · ‎04-03-2012

First, thanks for all who responded to my post. I would like to clarify that (1) All entriesin the array val have different values, so one search will reach one match; (2) sorry for my forgotten declaritions, all variables here are integers. I just cut short to create a short code to demonstrate the key point: How to search entries in a vector for a match and then determine theaddressas fast as possible?

Les_Neilson · ‎04-03-2012

Is it possible, when val is populated, to create an index array at the same time,of the values in sorted order?
If so then you could use a binary searchusing the index array to find L.
i.e.
val_index(1) is the subscript into val for the smallest value in the array.
val_index(nk) is the subscript into val for the largest value in val.
val_index(mid) is the middle value
and do a binary chop till you find the match.

(roughly speaking something like this)
ilow = val_index(1)
ihigh = val_index(nk)
mid = val_index(nk/2) ! some convenient midpoint

If ia > val( mid ) then value lies in the top half of the table
ilow = mid
mid = (ihigh-ilow)/2
repeat search

and similar for ia < val(mid) only you reduce ihigh

Les

Nan_Deng · ‎04-03-2012

THanks for your long reply.I should first say I made a bad example, all elements in val(L) (and anything else here) are integers, and val() itself is anindex array for another complex array. Regarding your questions, (i) the distribution of the values inside array val is arbitrary; (ii) each element in val() has a unique value, so there can only have one match; (iii) The loop k is just for emphysize that this comparison-and-location finding is deeply buried in multiple layer ofloops. There are some codes in-between but are not directly related to the compare-and-find.

I don't quite fully understand your example yet. For the ILOOP, (1) Why we define loc(1) but use it the same way as a scalar,is this loc(1) same as loc? (2) for the definitions in MINLOC, should I add "MASK=" for the secondcomponent?(3) Sorry for my ignorance, but could you explain why use MINLOC, shouldn't a DO WHILE loop do the same thing?

mecej4 · ‎04-04-2012

> all elements in val() (and anything else here) are integers

In that case, it makes no sense to use the int() function in

ia = int(f(i,j)

If you do not understand the MINLOC and MINVAL functions, read about them in a Fortran reference manual.

> but could you explain why use MINLOC, shouldn't a DO WHILE loop do the
> same thing?

An optimizing compiler will usually generate more efficient code if intrinsic functions are used rather than when DO loops are used.

TimP · ‎04-04-2012

The syntax quoted is consistent with the Fortran 90 definition of MINLOC, where the only option is to return an array. If you prefer (as I do) the f95 option for MINLOC to return a scalar, that is done with the additional DIM=1 argument.
I've spent more time than you want to hear about testing the relative quality of optimization of MINLOC and equivalent DO loops. For clarity, the f2008 intrinsic FINDLOC seems more appropriate here, but there seems to be no schedule for implementation in ifort. Until then, if you expect the target element to be located close to the starting point of your search, I agree that DO WHILE should be faster. I prefer counted DO with EXIT; for one thing, it clearly limits the search range.
With the move toward wider parallel instruction sets (e.g. AVX), there is increased advantage for vectorizable intrinsics such as MINLOC (which has received a lot of attention from the Intel compiler team) and FINDLOC.
ifort has a bias against optimization of DO WHILE; it seems much more difficult to hit a match with the compiler's optimization patterns than with the other alternatives.

JLuna5 · ‎04-04-2012

isbetter to workforindependent functionsanddoesnotwearonecall toactionand the applicationismore rapidand effective