You could search for the j

Amlesh_K_ · ‎03-05-2016

Hey,

I am trying to implement a lookup table based on binary search of the incoming element, and if the element doesn't exist, then pick the nearest element.

The problem I am facing is that the loop in which these lookup table computations are performed doesn't get vectorized because of the binary search implementation (binary search contains do while loop where loop count is not fixed, and hence that while loop is not vectorized. Even if I convert it to a do loop, there are exit statements which disable vectorization.)

Is it even possible to vectorize these loops? Maybe some other better search technique which can enable vectorization?

I am attaching the code as well as the opt-report. Compiler used is ifort.

Thanks.
Amlesh

TimP · ‎03-05-2016

It's almost trivial to change do while to a counted DO loop using the original values of index_min and index_max. For a couple of years, ifort has had the ability to vectorize a linear search with a single simple exit from a simple counted DO (one step beyond the rule the vectorizer quotes about not accepting multiple exits). It may be possible to perform binary search to narrow the limits to where the vectorized linear search is faster than continued binary search.

jimdempseyatthecove · ‎03-05-2016

Vectorizing a binary search, assuming keys are sorted, is counterproductive, due to the fact that "vectorization" in this sense would require the use of a vector gather instruction. IOW read a vector worth of elements distributed within the array of keys. This results in vector width number of reads (2, 4, 8, 16), followed by 1 compare against vector, then locating the split point within the vector. Whereas the scalar binary search performs 1 scalar read followed by 1 compare against scalar, then locating the split point above or below the split point. In this case, the scalar code will be approximately vector width x faster, possibly more (because you will not need the code to find the split point within the vector).

Vectorization can be done in a convoluted way, at the expense of memory, and in situations where searches overwhelm the number of insertions.

What you do is take your sorted key table and reorganize it as a vector. Assume for example you have a vector width of 8 elements.

The first 1/8th of the key table is written into locations 0, 8, 16, ...
The second 1/8th into 1, 9, 17, ...
The 8th 1/8th into 7, 15, 23, ...

Next produce 8 sub tables, rearranged the same way, and produced from each of the above 8 tables.

And repeat this for each sub table.

The total extra memory (assuming original linear table preserved), and for 8 lanes in vector, is approximately (bit-width of key) / 3. One million keys is ~2**20 and the resultant vector tables would approximate 7x that of the original table...
But the advantage is the key can be found in 7-8 reads as opposed to 20 reads.

Jim Dempsey

Amlesh_K_ · ‎03-07-2016

Hi Tim,

I wrote a simple linear search with a single comparison statement followed by an exit statement. The compiler is still not able to vectorize it.

A binary search with fixed iterations to narrow down the limits for vectorized linear search is feasible only when their is a vectorized linear search. I am posting the code snippet for the above mentioned linear search.

 do i = 1,1000000

   do j = 1,100

      if((input_arr(i) == lookup_table%input_table(j)) .or. (input_arr(i) > lookup_table%input_table(j) .and. input_arr(i) < lookup_table%input_table(j+1))) exit

   end do

     final_Outputs(i) = lookup_table%output_table(j)

  end do

Please suggest any changes that may help vectorize it.

Thanks.

TimP · ‎03-07-2016

It was quite an effort to paste and edit stuff to where I could get the compiler to run, and I get:

LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(24,21)
remark #15536: loop was not vectorized: inner loop throttling prevents vector
ization of this outer loop. Refer to inner loop message for more details.

   LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(16,60)
      remark #15523: loop was not vectorized: loop control variable J was found,
but loop iteration count cannot be computed before executing the loop
   LOOP END
LOOP END

I suppose the reference to multiple array elements depending on j and j+1 may be a problem for optimization. Evidently, this is not simple enough to pass under the compiler's limbo stick. The message about " iteration count cannot be computed" seems to be a bug in the compiler's diagnosis.

It's able to vectorize when I cut it down to just the first part of your compound conditional. By trying the part with j and j+1 references I get the message

remark #15524: loop was not vectorized: search loop cannot be vectorized unless all memory references can be aligned vector load.

which seems to verify that the compiler wants to shift the loop so that the initial array references are aligned, but can't do so on account of referring to 2 consecutive elements.

If you are trying to allow for the table being sorted in either ascending or descending order, I guess you will need to test which case it is and choose a simpler loop accordingly.

Amlesh_K_ · ‎03-07-2016

Hi Jim,

I don't properly understand what you're suggesting.

Actually I forgot to mention that in the application I am working on, the keys won't be sorted, the code I have posted is a test code before integrating it in our actual application. The keys would come in any random order. So, rearranging the keys may not be useful.

Thanks

Amlesh_K_ · ‎03-07-2016

Hi Tim,

Apologies for not pasting the full code last time.

Yeah, just for the first part of the compound conditional statement, loop is getting vectorized. But the thing is, in case the key value is not present in the table, I need the values which are on the left and right of the key value which will allow me to perform some interpolation.

Even if I try just the following - (here I'll atleast know the value to the right of the number, and can easily calculate the left value later)

if( input_arr(i) == lookup_table%input_table(j) .or. (input_arr(i) < lookup_table%input_table(j))) exit

I get the same following message as in the very first case (even if no j+1).

{

LOOP BEGIN at linearSearchImplementation.f90(83,3)
remark #15536: loop was not vectorized: inner loop throttling prevents vectorization of this outer loop. Refer to inner loop message for more details.

   LOOP BEGIN at linearSearchImplementation.f90(71,53)
      remark #15523: loop was not vectorized: cannot compute loop iteration count before executing the loop.
   LOOP END
LOOP END

}

What do you suggest?

TimP · ‎03-07-2016

I don't see how you could do a binary search if your table isn't sorted. Still, you could vectorize the equivalent of the loop by searching the conditions separately, or possibly by making an array lookup_table%input - input_array(i) and searching for the desired value there.

Amlesh_K_ · ‎03-07-2016

Hi Tim,

The lookup_table%input_table is a sorted table.

TimP · ‎03-08-2016

You could search for the j value inside the loop and check the j+1 condition after exit. I've wanted to use an if..then..exit..end if block but the compiler doesn't count it as a single exit.

jimdempseyatthecove · ‎03-08-2016

An alternative is to use a skip table. Taking your program, adding a skip table and varying the skip table size yields:

Skip table size 8
 Average Number Of Probes    19.95140     time   0.125806761439890 (original)
 Average Number Of Probes    17.00000     time   0.101816151291132 (skip)

Skip table size 16
 Average Number Of Probes    19.95140     time   0.125341766513884 (original)
 Average Number Of Probes    16.00000     time   9.339010948315263E-002 (skip)

Skip table size 32
 Average Number Of Probes    19.95140     time   0.129878542851657 (original)
 Average Number Of Probes    15.00000     time   8.756401576101780E-002 (skip)

Skip table size 64
 Average Number Of Probes    19.95140     time   0.124911293853074 (original)
 Average Number Of Probes    14.00000     time   8.084575645625591E-002 (skip)

Skip table size 128
 Average Number Of Probes    19.95140     time   0.123953686561435 (original)
 Average Number Of Probes    13.22490     time   7.466363674029708E-002 (skip)

The technique used was to take skip table size number of probes into key table, convert from real(8) to real(4) and place into the skip table. The skip table was linearly searched, though at a certain size, it would be beneficial to skip table the skip table.

program fifthDraftBinary
use omp_lib ! for omp_get_wtime()
integer, parameter :: r8 = selected_real_kind(12)
integer, parameter :: iTableSize = 1000000

type lookupTable                                             ! Lookup table containing the input and output values for numbers in a given range
real(kind=8) :: input_table(iTableSize)
real(kind=8) :: output_table(iTableSize)
end type lookupTable
real(kind=8) :: input_arr(iTableSize)                             ! Array containing the test cases
integer :: i,j,k
real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case
real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case
real(kind=8) :: interval, input_interval                     ! Calculate the values of the intervals for the lookup table elements and input array elements
integer :: index_min, index_max, index_mid                   ! Index values for binary search
real(kind=8) :: key                                          ! Temporarily storing the key
type(lookupTable) :: lookup_table
real(kind=8) :: final_Outputs(iTableSize)                       ! Output values based on the test case inputs
real(kind=8) :: random_mat(iTableSize)                          ! Value using intrinsic logarithms, make sure that the computations are actually done
integer :: numberOfProbes, totalNumberOfProbes
integer, parameter :: vecLanes = 128
integer, parameter :: SkipStride = iTableSize / (vecLanes + 1)
real(kind=4) :: SkipTable(vecLanes), SkipKey
!dir$ attributes align : 64 :: SkipTable
real(kind=8) :: startTime, endTime, elapseTime

  interval = (range_max - range_min)/10000.0_r8
  input_interval = (range_max - range_min)/1000000.0_r8

 !            Lookup Table Creation             !

  lookup_table%input_table(1) = range_min
  lookup_table%output_table(1) = log(range_min)

  do i = 2,iTableSize-1

     lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval
     lookup_table%output_table(i) = log(lookup_table%input_table(i))

  end do

  lookup_table%input_table(iTableSize) = range_max
  lookup_table%output_table(iTableSize) = log(range_max)

 !         Lookup Table Creation End         !

 ! build SkipTable
 do i = 1,vecLanes
      k = (i-1)*SkipStride+1
      SkipTable(i) = lookup_table%input_table(k)
 end do

 !     Input Array Creation       !

  input_arr(1) = range_min

  do i = 2,iTableSize-1

    input_arr(i) = input_arr(i-1) + input_interval

  end do

  input_arr(iTableSize) = range_max

 !      Input Array Creation Finished      !


 !        Do binary search over input values and then a linear interpolation           !
 startTime = omp_get_wtime()
 totalNumberOfProbes = 0
 do i = 1,iTableSize

   index_max = iTableSize
   index_min = 1
   key = input_arr(i)
   numberOfProbes = 0

  do while(index_min <= index_max)
   numberOfProbes = numberOfProbes + 1 
   index_mid = (index_max + index_min)/2
   if(lookup_table%input_table(index_mid) == key) then
      final_Outputs(i) = lookup_table%output_table(index_mid)
      exit
   else if(lookup_table%input_table(index_mid) > key) then
      index_max = index_mid - 1
   else
      index_min = index_mid + 1
   end if
  end do
  totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
      final_Outputs(i) = lookup_table%output_table(index_mid)

  end do
  endTime = omp_get_wtime()
  elapseTime = endTime - startTime
  print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
 !          Approximations end          !


  random_mat = final_Outputs
 
 startTime = omp_get_wtime()
 totalNumberOfProbes = 0
 do i = 1,iTableSize
   key = input_arr(i)
   SkipKey = key
   numberOfProbes = 0
   index_min = 1
   do j=1,vecLanes
        if(SkipTable(j) > SkipKey) exit
        index_min = index_min + SkipStride
        numberOfProbes = numberOfProbes + 1 
   end do
   index_max = min(index_min + SkipStride, iTableSize)

   do while(index_min <= index_max)
     numberOfProbes = numberOfProbes + 1 
     index_mid = (index_max + index_min)/2
     if(lookup_table%input_table(index_mid) == key) then
       final_Outputs(i) = lookup_table%output_table(index_mid)
       exit
     else if(lookup_table%input_table(index_mid) > key) then
       index_max = index_mid - 1
     else
       index_min = index_mid + 1
     end if
   end do
   totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
      final_Outputs(i) = lookup_table%output_table(index_mid)

 end do
 endTime = omp_get_wtime()
 elapseTime = endTime - startTime
 print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
  
end program fifthDraftBinary

Jim Dempsey

jimdempseyatthecove · ‎03-08-2016

Oops,

saw omission.

Change line 49 to

k = i * SkipStride

The 128 SkipTable size yields:

Average Number Of Probes 19.95140 time 0.124189035966992
Average Number Of Probes 13.16830 time 7.945077167823911E-002

Note, with the correction and this run, it is slightly slower.

Jim Dempsey

Amlesh_K_ · ‎03-09-2016

Hi Jim,

Thanks.

Yeah the performance is definitely improving after adding this skip table, but still the original problem still persists, ie, the outer loop containing these searches (ie on lines 71 and 104) are yet not vectorized. I'll try integrating the code with this skip table in the application I am working on and check the performance.

jimdempseyatthecove · ‎03-10-2016

The outer loop, in the above code, cannot be vectorized, as this would require the inner loops (now 2 loops) to be vectorized in consistency with the vectorization of the outer loop (difficult under the best of circumstances). This may be the case that you do not fully understand the requirements to have candidate code for vectorization.

While mathematically a vector can be construed as a list of things, from the CPU perspective a vector is a "small vector" that can be placed into a vector register, and to carry further, is a list of such small vectors (preferably where a small vector does not cross a cache line). Assuming the preconditions are met, the CPU processes small vectors as: Single Instruction Multiple Data. Meaning the same operation is to be performed on all elements of the small vector. Later SIMD instruction sets permit the instruction to use a mask, thus permitting the operation to be performed on selected elements of the small vector. The CPU is incapable of processing small vectors as MIMD (Multiple Instruction Multiple Data) without the use of threading.

In a non-productive way, such as to satisfy a challenge as to "this cannot be vectorized", and not for the purpose of getting the best performing code, the later vector instructions include scatter/gather vector instructions. With this vector instruction, you could construct a set of indicies at intervals of SkipStride, then issue the vector instruction gather, then test all cells in the vector against the key. On AVX (256-bit), the gather instruction (if available) would cost you 4 memory (or L3, or L2, or L1) memory fetches. While you would be performing 4 compares in your key test (yes this part is faster), a memory fetch from RAM could be well over 200x that of the compare, from L3 this would be 20x-40x. The binary search would cost you 2 or 3 memory fetches. IOW the non-vector verses vector gather uses half to three fourths the memory/cache bandwidth of that of a vector gather.

The reason the skip table works, is should the skip table have used REAL(8), the same four entry fetch is performed in one fetch from RAM. But since I compressed the lookup values to REAL(4), which is suitable for the skip table two four entry fetches is performed in one fetch from RAM. *** Note, you will/may have to test the code to handle the case where the compressed table has duplicate entries, and/or where the lesser precision number does not undershoot the range of the larger precision number. The precision issue can be handled by adding TINY(0.0) to the value inserted into the skip table (but this will not fix the duplicates issue).

A second reason why the skip table works, is during a period of a high number of searches, the skip table entries will tend to locate in L1 cache.

Keep in mind that the skip table appears to become less effective as it grew to 128 entries. You may have to experiment to find the point of diminishing returns. My guess is it may not be productive to multi-level the skip table for a 1M entry table, 1G entry might be effective.

With all the above said, depending on the range and distribution of numbers, you might be able to reduce the skip table to 2 bytes or 1 byte per entry. Knowing:

real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case 
real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case

An array of INTEGER(2) holding something like:

real(kind=8) :: range_span ! range_max - range_min
real(kind=8) :: range_convert ! HUGE(SkipTable(1)) / range_span
 ...
range_span = range_max - range_min
range_convert = real(HUGE(SkipTable(1))) / range_span
...
SkipTable(i) = (lookup_table%input_table(k) - range_min) * range_convert

And you would have to modify the SkipKey in the same manner for the skip table search.

Jim Dempsey

jimdempseyatthecove · ‎03-15-2016

Here is an actual implementation of the Index Table as described above. First the results:

 Allocate arrays to    100000000
 Initialize Binary Search Lookup Table
 Initialize Skip Table (Pre-Binary Search helper)
 Initialize Index Table (Pre-Binary Search helper)
 initialize Input Array (key list for search test)
 Binary Search
 Average Number Of Probes    26.65782     time    17.1107615227811
 Skip Table with Binary Search
 Average Number Of Probes    19.87214     time    11.9470917559229
 Index Table with Binary Search
 Average Number Of Probes    11.99948     time    7.49961695726961

Then the code:

module globals
    integer, parameter :: r8 = selected_real_kind(12)
    integer, parameter :: iTableSize = 100000000

    type lookupTable                                             ! Lookup table containing the input and output values for numbers in a given range
        real(kind=8), allocatable :: input_table(:)
        real(kind=8), allocatable :: output_table(:)
    end type lookupTable
    real(kind=8), allocatable :: input_arr(:)                             ! Array containing the test cases
    real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case
    real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case
    real(kind=8) :: interval, input_interval                     ! Calculate the values of the intervals for the lookup table elements and input array elements
    integer :: index_min, index_max, index_mid                   ! Index values for binary search
    real(kind=8) :: key                                          ! Temporarily storing the key
    type(lookupTable) :: lookup_table
    real(kind=8), allocatable :: final_Outputs(:)                       ! Output values based on the test case inputs
    real(kind=8), allocatable :: random_mat(:)                          ! Value using intrinsic logarithms, make sure that the computations are actually done
    integer(kind=8) :: numberOfProbes, totalNumberOfProbes
    integer, parameter :: vecLanes = 128
    integer, parameter :: SkipStride = iTableSize / (vecLanes + 1)
    real(kind=4) :: SkipTable(vecLanes), SkipKey
    !dir$ attributes align : 64 :: SkipTable
    real(kind=8) :: startTime, endTime, elapseTime
    integer, parameter :: IndexTableSize = 65536
    type IndexTable_t
        integer :: index_min, index_max
    end type IndexTable_t
    type(IndexTable_t) :: IndexTable(IndexTableSize)
    real(kind=8) :: IndexTableDelta, IndexTableNextCut
    end module globals

subroutine allocate_arrays()
    use globals
    implicit none
    print *,"Allocate arrays to ", iTableSize
    allocate(lookup_table%input_table(iTableSize), lookup_table%output_table(iTableSize), input_arr(iTableSize))
    allocate(final_Outputs(iTableSize), random_mat(iTableSize))
end subroutine allocate_arrays
    
subroutine init_lookup_table()
    use globals
    implicit none
    integer :: i
    print *, "Initialize Binary Search Lookup Table"
    interval = (range_max - range_min)/10000.0_r8
    input_interval = (range_max - range_min)/1000000.0_r8

    !            Lookup Table Creation             !

    lookup_table%input_table(1) = range_min
    lookup_table%output_table(1) = log(range_min)

    do i = 2,iTableSize-1

        lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval
        lookup_table%output_table(i) = log(lookup_table%input_table(i))

    end do

    lookup_table%input_table(iTableSize) = range_max
    lookup_table%output_table(iTableSize) = log(range_max)
end subroutine init_lookup_table

subroutine init_skip_table()
    use globals
    implicit none
    integer :: i, k
    print *,"Initialize Skip Table (Pre-Binary Search helper)"
    do i = 1,vecLanes
        k = i*SkipStride
        SkipTable(i) = lookup_table%input_table(k)
    end do
end subroutine init_skip_table

subroutine init_IndexTable()
    use globals
    implicit none
    integer :: i, j
    real(kind=8) :: the_max, the_min
    print *,"Initialize Index Table (Pre-Binary Search helper)"
    the_min = minval(lookup_table%input_table)
    the_max = maxval(lookup_table%input_table)
!   IndexTableDelta = (range_max - range_min) / (IndexTableSize + 1)
    IndexTableDelta = (the_max - the_min) / (IndexTableSize + 1)
!   IndexTableNextCut = range_min
    IndexTableNextCut = the_min
    index_min = 1
    index_max = 1
    do i=1,IndexTableSize
        IndexTableNextCut = IndexTableNextCut + IndexTableDelta
        if(index_max <= iTableSize) then
            do
                if(lookup_table%input_table(index_max) > IndexTableNextCut) then
                    indexTable(i)%index_min = index_min
                    IndexTable(i)%index_max = index_max
                    index_min = index_max
                    index_max = index_max+1
                    exit
                endif
                index_max = index_max+1
                if(index_max > iTableSize) then
                    indexTable(i)%index_min = index_min
                    IndexTable(i)%index_max = iTableSize
                    exit
                endif
            end do
        else
            IndexTable(i)%index_min = iTableSize
            IndexTable(i)%index_max = iTableSize
        endif
    end do
end subroutine init_IndexTable
    
subroutine init_input_array()
    use globals
    implicit none
    integer :: i
    print *,"initialize Input Array (key list for search test)"
    input_arr(1) = range_min
    do i = 2,iTableSize-1
        input_arr(i) = input_arr(i-1) + input_interval
    end do
    input_arr(iTableSize) = range_max
end subroutine init_input_array
    
subroutine TestBinarySearch()
    use globals
    use omp_lib
    !        Do binary search over input values and then a linear interpolation           !
    print *,"Binary Search"
    startTime = omp_get_wtime()
    totalNumberOfProbes = 0
    do i = 1,iTableSize
        index_max = iTableSize
        index_min = 1
        key = input_arr(i)
        numberOfProbes = 0
        do while(index_min <= index_max)
            numberOfProbes = numberOfProbes + 1 
            index_mid = (index_max + index_min)/2
            if(lookup_table%input_table(index_mid) == key) then
                final_Outputs(i) = lookup_table%output_table(index_mid)
                exit
            else if(lookup_table%input_table(index_mid) > key) then
                index_max = index_mid - 1
            else
                index_min = index_mid + 1
            end if
        end do
        totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
        final_Outputs(i) = lookup_table%output_table(index_mid)
    end do
    endTime = omp_get_wtime()
    elapseTime = endTime - startTime
    print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
    random_mat = final_Outputs
end subroutine TestBinarySearch

subroutine SkipTableTestBinarySearch()
    use globals
    use omp_lib
    implicit none
    integer :: i,j
    print *,"Skip Table with Binary Search"
    startTime = omp_get_wtime()
    totalNumberOfProbes = 0
    do i = 1,iTableSize
        numberOfProbes = 0
        key = input_arr(i)
        ! use SkipTable to reduce the number of binary search memory probes
        SkipKey = key
        index_min = 1
        do j=1,vecLanes
            if(SkipTable(j) > SkipKey) exit
            index_min = index_min + SkipStride
            numberOfProbes = numberOfProbes + 1 
        end do
        index_max = min(index_min + SkipStride, iTableSize)
        ! now binary search reduced range
        do while(index_min <= index_max)
            numberOfProbes = numberOfProbes + 1 
            index_mid = (index_max + index_min)/2
            if(lookup_table%input_table(index_mid) == key) then
                final_Outputs(i) = lookup_table%output_table(index_mid)
                exit
            else if(lookup_table%input_table(index_mid) > key) then
                index_max = index_mid - 1
            else
                index_min = index_mid + 1
            end if
        end do
        totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
        final_Outputs(i) = lookup_table%output_table(index_mid)
    end do
    endTime = omp_get_wtime()
    elapseTime = endTime - startTime
    print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
    random_mat = final_Outputs
end subroutine SkipTableTestBinarySearch

subroutine IndexTableBinarySearch()
    use globals
    use omp_lib
    implicit none
    integer :: i,j,k
    print *,"Index Table with Binary Search"
    startTime = omp_get_wtime()
    totalNumberOfProbes = 0
    do i = 1,iTableSize
        numberOfProbes = 0
        key = input_arr(i)
        ! use Index Table to reduce range of Binary Search
        j = min(max(1,int((key - range_min) / IndexTableDelta)),IndexTableSize)
        numberOfProbes = numberOfProbes + 1 
        index_min = IndexTable(j)%index_min
        index_max = IndexTable(j)%index_max
        j = 0
        ! now perform binary search on remaining range
        do while(index_min <= index_max)
            numberOfProbes = numberOfProbes + 1 
            index_mid = (index_max + index_min)/2
            if(lookup_table%input_table(index_mid) == key) then
                final_Outputs(i) = lookup_table%output_table(index_mid)
                exit
            else if(lookup_table%input_table(index_mid) > key) then
                index_max = index_mid - 1
            else
                index_min = index_mid + 1
            end if
        end do
        totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
        final_Outputs(i) = lookup_table%output_table(index_mid)
    end do
    endTime = omp_get_wtime()
    elapseTime = endTime - startTime
    print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
    random_mat = final_Outputs
end subroutine IndexTableBinarySearch
    
program fifthDraftBinary
    use omp_lib ! for omp_get_wtime()
    use globals
    implicit none
    integer :: i,j,k
    ! Allocate working arrays
    call allocate_arrays()
    
    ! Initialize tables used by different algorithms
    ! Assumed to occur infrequently (when database changes)

    ! Main database and binary search table
    call init_lookup_table()
    
    ! Modified binary search with skip table
    call init_skip_table()
    
    ! Modified binary search with index table
    call init_IndexTable()

    ! initialize the keys to search for in the various tests
    call init_input_array()

    ! Traditional Binary Search
    call TestBinarySearch()
    
    ! SkipTable Binary Search
    call SkipTableTestBinarySearch()
    
    ! IndexTable Binary Search
    call IndexTableBinarySearch()

    end program fifthDraftBinary

You can experiment with adjusting the table sizes. Doubling the Index Table size took one probe off.

Jim Dempsey

Amlesh_K_ · ‎03-15-2016

Hi Jim,

Thanks a lot for all the helpful answers and code. I have integrated your skip table based code. (I've to experiment with various sizes of skipStride). I get the point that vectorization is not possible with binary search.

I will work with the above code also since the improvement seems to be much more than traditional binary search, and even skip table based search.

Thanks,
Amlesh

jimdempseyatthecove · ‎03-15-2016

Amlesh, et al

The Skip Table should be insensitive to the distribution of the keys within the database. Meaning it is not effected by clustering of key values. Whereas the Index table works best with relatively even distribution of keys. When you construct the index table, it may be advantageous to size the table such that each entry presents a range that tends to reside within a page, and thus reducing the number of TLB loads while performing the binary search. This would require some experimentation to confirm this assumption.

Side note:

The index table could possibly be condensed into one entry, but this would not provide for the means to distinguish between a node using the same index for index_min and index_max, or using {index-1,index} where the index_min is assumed to be at least 1 less than index_max (with exception for index==1).

Jim Dempsey

Binary Search and Other Alternatives