Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

reverse lookup

alexismor
Beginner
419 Views
Hello,

I have a lookup table which is full of unique integers (let's call them labels), like so:

A=(12,9874,382,21,1,1003,...)

Further on in my program, for example, I need to know which entry in A the number 382 corresponds to (ie, the 3rd). I need to do this millions of times, and my table A can be quite large (millions of entries), so I'd like to avoid searching as much as possible. Ideally, I would like to have a function where I input one of the labels, and get it's position in A like so:

f(n) -> position in A

If I were to hard code such a function I'd have something like:

integer function f(n)
integer :: n
select case(n)
case(12)
f=1
case(9874)
f=2
case(382)
f=3
...
end select
end function f

Is there any way to automatically generate such a function?

I tried making a reverse lookup table:

integer, dimensions(1: maximum label) :: reverse
do i=1,size(A)
reverse(A(i))=i
enddo

however the problem is that I quickly run out of memory since the maximum label value can get quite high. Is there any fancy trick that I can use in which memory is only used for the size(A) elements that I've assigned in the reverse table? Can I somehow "explode" my reverse table into little integer variable bits called r_12, r_9874, r_382 where

r_12=1
r_9874=2
r_382=3
...

an so on? I know that this sounds insane. I'm fairly new to fortran 90 and so I don't always know if something is possible or not and always end up spending lots of time pursuing useless avenues. That's why I keep asking these questions here!

Thanks,

Alexis
0 Kudos
5 Replies
ipattielgc
Beginner
419 Views
What you have here is a classic search/sort problem, and the world plus web is full of ways to handle this. However, if building the list as a binary tree, then searching is not possible, or fast enough, then maybe a hash table would be helpful. Search the web for something like "fortran hash table", and you will be on your way. Hash tables are easy to implement, and can be quite fast.

Message Edited by ipattielgc on 08-16-2005 06:27 PM

0 Kudos
Jugoslav_Dujic
Valued Contributor II
419 Views
I had an identical problem and I solved it in the following way: 1) I used Michel Olagnon's MRGRNK routine to get an index array, which will calculate the sort order of the original array (without modification of the array). Note, however, that it doubles memory usage (because the pivot array has to be the same dimensions as the original). 2) I used binary search over original array using the pivot. The routine is below:
!Binary search over ipivot array.

integer function ibsearch(idobj, idref, nref, ipivot)

integer, intent(in)::   idobj,         & !searched value
                        nref,          & !size of array
                        idref(nref),   & !original array
                        ipivot(nref)     !pivot array

integer::               ithis, nCurrPos, nPos2, nPos1

nPos1 =1
nPos2 = nref

ibsearch=0
do
   nCurrPos=(nPos1+nPos2)/2
   ithis=idref(ipivot(nCurrPos))
   if (ithis .EQ. idobj) then
      ibsearch=ipivot(nCurrPos)
      return
   else if (nPos2.LE.nPos1) then
      return
   else if (idobj.GT.ithis) then
      if (nPos1.LT.nCurrPos) then
         nPos1=nCurrPos
      else if (nPos1.EQ.nCurrPos) then
         nPos1=nCurrPos+1
         nCurrPos=nCurrPos+1
      end if
   else if (idobj.LT.ithis) then
      nPos2=nCurrPos
   end if
end do

end function ibsearch
I think it's about the fastest method one can get. HTH Jugoslav
0 Kudos
Intel_C_Intel
Employee
419 Views

Hello,

Quite original idea, Jugoslav. I think string nCurrPos = nCurrPos + 1 can be removed. We always must see disadvantages in any method and dont afraid them. In this one: memory usage doubled (but it isnt disadvantage because we will had to do something like this anyway, Im sure!) and what about if we want to add/remove some elements. Alexis didnt mentioned is it necessary to do this kind of operations often. But if it is time loosing will take place because of resorting ipivot array.

Hash tables requesting some experience and what about of number of segments in hash table. We dont know is Alex going to add/remove elements so number of elements may vary.

I would use balanced trees of search:

Red Black tree balanced binary tree. Time of search is O(log2n).

2-3 tree balanced binary-ternary tree. Time of search is greater then 1 + log3n and less then 1 + log2n.

Advantages of balanced that its quite easy to add/remove new element. And elements number may vary much but performance will decreasing like logarithm. Disadvantages is not maximum speed like in hash tables. Hash tables usefull when we create table at once and add/remove not much elements. Performance of hash table is O(N/B) where B number of segments in table, N number of elements. Very important that B is constant it is chosen when creating hash table. If you set B 100 but N will increase to 1,000,000 performance wouldnt be greate.

Unfortunately I dont have ready algorithms in Fortran for this kind of use cases. You may look for this in Inet or, for example in book Alfred V. Aho Data structures and algorithms. There are only algorithms in last link but they are very useful.

Stanislav

0 Kudos
Jugoslav_Dujic
Valued Contributor II
419 Views
You've pretty much said it all Stanislav. My method has the virtue of being simple & fast, but adding or removing elements is indeed a pain. Other methods you mention are more convenient for these operations, but I'm not familiar with them to comment on memory usage -- basically, it's a classic memory vs. speed tradeoff so I'd expect them to introduce at least N + B overhead and probably more.

Jugoslav
0 Kudos
jim_dempsey
Beginner
419 Views
The following works with positive keys.
It will work best (memory consumption wise) if key data tends to clump together. If key data is truely random and in the millions then memory savings may not be so good
Jim Dempsey
! Test2.f90
!
! FUNCTIONS:
! Test2 - Entry point of console application.
!
!****************************************************************************
!
! PROGRAM: Test2
!
! PURPOSE: Entry point for the console application.
!
!****************************************************************************
module FastFind

type zoneOOO
integer :: ooo(0:999)
end type zoneOOO

type zoneOOOPointer
type(zoneOOO), pointer :: aOOO
end type zoneOOOPointer

type zoneTTT
type(zoneOOOPointer) :: ttt(0:999)
end type zoneTTT

type zoneTTTPointer
type(zoneTTT), pointer :: aTTT
end type zoneTTTPointer

type zoneMMM
type(zoneTTTPointer) :: mmm(0:999)
end type zoneMMM

type zoneMMMPointer
type(zoneMMM), pointer :: aMMM
end type zoneMMMPointer

type(zoneMMMPointer) :: bbb(0:2)

integer :: LastIndex = 0

contains
function iFind(iCode)
iFind = 0
if(iCode .le. 0) return ! only accept positive keys
iOOO = mod(iCode, 1000)
iTTT = mod(iCode / 1000, 1000)
iMMM = mod(iCode / 1000000, 1000)
iBBB = iCode / 1000000000
if(.not. associated(bbb(iBBB).aMMM)) return
if(.not. associated(bbb(iBBB).aMMM.mmm(iMMM).aTTT)) return
if(.not. associated(bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO)) return
iFind = bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO.ooo(iOOO)
end function iFind
function insert(iCode)
if(iCode .le. 0) then
insert = 0
return ! only accept positive keys
endif
iOOO = mod(iCode, 1000)
iTTT = mod(iCode / 1000, 1000)
iMMM = mod(iCode / 1000000, 1000)
iBBB = iCode / 1000000000
if(.not. associated(bbb(iBBB).aMMM)) then
allocate(bbb(iBBB).aMMM)
do i=0,999
bbb(iBBB).aMMM.mmm(i).aTTT => null()
end do
endif
if(.not. associated(bbb(iBBB).aMMM.mmm(iMMM).aTTT)) then
allocate(bbb(iBBB).aMMM.mmm(iMMM).aTTT)
do i=0,999
bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(i).aOOO => null()
end do
endif
if(.not. associated(bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO)) then
allocate(bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO)
do i=0,999
bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO.ooo(i) = 0
end do
endif
insert = bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO.ooo(iOOO)
if(insert .eq. 0) then
LastIndex = LastIndex + 1
insert = LastIndex
bbb(iBBB).aMMM.mmm(iMMM).aTTT.ttt(iTTT).aOOO.ooo(iOOO) = LastIndex
endif
end function insert
end module FastFind
program Test2
use FastFind
i = insert(12)
write(*,*) i
i = insert(9874)
write(*,*) i
i = insert(382)
write(*,*) i
i = insert(21)
write(*,*) i
i = insert(1)
write(*,*) i
i = insert(1003)
write( *,*) i
i = insert(-12345)
write(*,*) i

i = iFind(12)
write(*,*) i
i = iFind(9874)
write(*,*) i
i = iFind(382)
write(*,*) i
i = iFind(21)
write(*,*) i
i = iFind(1)
write(*,*) i
i = iFind(1003)
write(*,*) i
i = iFind(-12345)
write(*,*) i
i = iFind(9999) ! should not find
write(*,*) i
end program Test2
0 Kudos
Reply