Community
cancel
Showing results for
Did you mean:
Highlighted
Beginner
6 Views

## Binary Search and Other Alternatives

Hey,

I am trying to implement a lookup table based on binary search of the incoming element, and if the element doesn't exist, then pick the nearest element.

The problem I am facing is that the loop in which these lookup table computations are performed doesn't get vectorized because of the binary search implementation (binary search contains do while loop where loop count is not fixed, and hence that while loop is not vectorized. Even if I convert it to a do loop, there are exit statements which disable vectorization.)

Is it even possible to vectorize these loops? Maybe some other better search technique which can enable vectorization?

I am attaching the code as well as the opt-report. Compiler used is ifort.

Thanks.
Amlesh

Tags (1)
16 Replies
Highlighted
Black Belt
6 Views

## It's almost trivial to change

It's almost trivial to change do while to a counted DO loop using the original values of index_min and index_max.  For a couple of years, ifort has had the ability to vectorize a linear search with a single simple exit from a simple counted DO (one step beyond the rule the vectorizer quotes about not accepting multiple exits).  It may be possible to perform binary search to narrow the limits to where the vectorized linear search is faster than continued binary search.

Highlighted
Black Belt
6 Views

## Vectorizing a binary search,

Vectorizing a binary search, assuming keys are sorted, is counterproductive, due to the fact that "vectorization" in this sense would require the use of a vector gather instruction. IOW read a vector worth of elements distributed within the array of keys. This results in vector width number of reads (2, 4, 8, 16), followed by 1 compare against vector, then locating the split point within the vector. Whereas the scalar binary search performs 1 scalar read followed by 1 compare against scalar, then locating the split point above or below the split point. In this case, the scalar code will be approximately vector width x faster, possibly more (because you will not need the code to find the split point within the vector).

Vectorization can be done in a convoluted way, at the expense of memory, and in situations where searches overwhelm the number of insertions.

What you do is take your sorted key table and reorganize it as a vector. Assume for example you have a vector width of 8 elements.

The first 1/8th of the key table is written into locations 0, 8, 16, ...
The second 1/8th into 1, 9, 17, ...
The 8th 1/8th into 7, 15, 23, ...

Next produce 8 sub tables, rearranged the same way, and produced from each of the above 8 tables.

And repeat this for each sub table.

The total extra memory (assuming original linear table preserved), and for 8 lanes in vector, is approximately  (bit-width of key) / 3. One million keys is ~2**20 and the resultant vector tables would approximate 7x that of the original table...
But the advantage is the key can be found in 7-8 reads as opposed to 20 reads.

Jim Dempsey

Highlighted
Beginner
6 Views

## Hi Tim,

Hi Tim,

I wrote a simple linear search with a single comparison statement followed by an exit statement. The compiler is still not able to vectorize it.

A binary search with fixed iterations to narrow down the limits for vectorized linear search is feasible only when their is a vectorized linear search.  I am posting the code snippet for the above mentioned linear search.

``` do i = 1,1000000

do j = 1,100

if((input_arr(i) == lookup_table%input_table(j)) .or. (input_arr(i) > lookup_table%input_table(j) .and. input_arr(i) < lookup_table%input_table(j+1))) exit

end do

final_Outputs(i) = lookup_table%output_table(j)

end do```

Please suggest any changes that may help vectorize it.

Thanks.

Highlighted
Black Belt
6 Views

## It was quite an effort to

It was quite an effort to paste and edit stuff to where I could get the compiler to run, and I get:

LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(24,21)
remark #15536: loop was not vectorized: inner loop throttling prevents vector
ization of this outer loop. Refer to inner loop message for more details.

LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(16,60)
remark #15523: loop was not vectorized: loop control variable J was found,
but loop iteration count cannot be computed before executing the loop
LOOP END
LOOP END

I suppose the reference to multiple array elements depending on j and j+1 may be a problem for optimization.   Evidently, this is not simple enough to pass under the compiler's limbo stick.  The message about " iteration count cannot be computed" seems to be a bug in the compiler's diagnosis.

It's able to vectorize when I cut it down to just the first part of your compound conditional.  By trying the part with j and j+1 references I get the message

remark #15524: loop was not vectorized: search loop cannot be vectorized unless all memory references can be aligned vector load.

which seems to verify that the compiler wants to shift the loop so that the initial array references are aligned, but can't do so on account of referring to 2 consecutive elements.

If you are trying to allow for the table being sorted in either ascending or descending order, I guess you will need to test which case it is and choose a simpler loop accordingly.

Highlighted
Beginner
6 Views

## Hi Jim,

Hi Jim,

I don't properly understand what you're suggesting.

Actually I forgot to mention that in the application I am working on, the keys won't be sorted, the code I have posted is a test code before integrating it in our actual application. The keys would come in any random order. So, rearranging the keys may not be useful.

Thanks

Highlighted
Beginner
6 Views

## Hi Tim,

Hi Tim,

Apologies for not pasting the full code last time.

Yeah, just for the first part of the compound conditional statement, loop is getting vectorized. But the thing is, in case the key value is not present in the table, I need the values which are on the left and right of the key value which will allow me to perform some interpolation.

Even if I try just the following - (here I'll atleast know the value to the right of the number, and can easily calculate the left value later)

`if( input_arr(i) == lookup_table%input_table(j) .or. (input_arr(i) < lookup_table%input_table(j))) exit`

I get the same following message as in the very first case (even if no j+1).

{

LOOP BEGIN at linearSearchImplementation.f90(83,3)
remark #15536: loop was not vectorized: inner loop throttling prevents vectorization of this outer loop. Refer to inner loop message for more details.

LOOP BEGIN at linearSearchImplementation.f90(71,53)
remark #15523: loop was not vectorized: cannot compute loop iteration count before executing the loop.
LOOP END
LOOP END

}

What do you suggest?

Highlighted
Black Belt
6 Views

## I don't see how you could do

I don't see how you could do a binary search if your table isn't sorted.  Still, you could vectorize the equivalent of the loop by searching the conditions separately, or possibly by making an array lookup_table%input - input_array(i) and searching for the desired value there.

Highlighted
Beginner
6 Views

## Hi Tim,

Hi Tim,

The lookup_table%input_table is a sorted table.

Highlighted
Black Belt
6 Views

## You could search for the j

You could search for the j value inside the loop and check the j+1 condition after exit.  I've wanted to use an if..then..exit..end if block but the compiler doesn't count it as a single exit.

Highlighted
Black Belt
6 Views

## An alternative is to use a

An alternative is to use a skip table. Taking your program, adding a skip table and varying the skip table size yields:

```Skip table size 8
Average Number Of Probes    19.95140     time   0.125806761439890 (original)
Average Number Of Probes    17.00000     time   0.101816151291132 (skip)

Skip table size 16
Average Number Of Probes    19.95140     time   0.125341766513884 (original)
Average Number Of Probes    16.00000     time   9.339010948315263E-002 (skip)

Skip table size 32
Average Number Of Probes    19.95140     time   0.129878542851657 (original)
Average Number Of Probes    15.00000     time   8.756401576101780E-002 (skip)

Skip table size 64
Average Number Of Probes    19.95140     time   0.124911293853074 (original)
Average Number Of Probes    14.00000     time   8.084575645625591E-002 (skip)

Skip table size 128
Average Number Of Probes    19.95140     time   0.123953686561435 (original)
Average Number Of Probes    13.22490     time   7.466363674029708E-002 (skip)
```

The technique used was to take skip table size number of probes into key table, convert from real(8) to real(4) and place into the skip table. The skip table was linearly searched, though at a certain size, it would be beneficial to skip table the skip table.

```program fifthDraftBinary
use omp_lib ! for omp_get_wtime()
integer, parameter :: r8 = selected_real_kind(12)
integer, parameter :: iTableSize = 1000000

type lookupTable                                             ! Lookup table containing the input and output values for numbers in a given range
real(kind=8) :: input_table(iTableSize)
real(kind=8) :: output_table(iTableSize)
end type lookupTable
real(kind=8) :: input_arr(iTableSize)                             ! Array containing the test cases
integer :: i,j,k
real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case
real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case
real(kind=8) :: interval, input_interval                     ! Calculate the values of the intervals for the lookup table elements and input array elements
integer :: index_min, index_max, index_mid                   ! Index values for binary search
real(kind=8) :: key                                          ! Temporarily storing the key
type(lookupTable) :: lookup_table
real(kind=8) :: final_Outputs(iTableSize)                       ! Output values based on the test case inputs
real(kind=8) :: random_mat(iTableSize)                          ! Value using intrinsic logarithms, make sure that the computations are actually done
integer :: numberOfProbes, totalNumberOfProbes
integer, parameter :: vecLanes = 128
integer, parameter :: SkipStride = iTableSize / (vecLanes + 1)
real(kind=4) :: SkipTable(vecLanes), SkipKey
!dir\$ attributes align : 64 :: SkipTable
real(kind=8) :: startTime, endTime, elapseTime

interval = (range_max - range_min)/10000.0_r8
input_interval = (range_max - range_min)/1000000.0_r8

!            Lookup Table Creation             !

lookup_table%input_table(1) = range_min
lookup_table%output_table(1) = log(range_min)

do i = 2,iTableSize-1

lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval
lookup_table%output_table(i) = log(lookup_table%input_table(i))

end do

lookup_table%input_table(iTableSize) = range_max
lookup_table%output_table(iTableSize) = log(range_max)

!         Lookup Table Creation End         !

! build SkipTable
do i = 1,vecLanes
k = (i-1)*SkipStride+1
SkipTable(i) = lookup_table%input_table(k)
end do

!     Input Array Creation       !

input_arr(1) = range_min

do i = 2,iTableSize-1

input_arr(i) = input_arr(i-1) + input_interval

end do

input_arr(iTableSize) = range_max

!      Input Array Creation Finished      !

!        Do binary search over input values and then a linear interpolation           !
startTime = omp_get_wtime()
totalNumberOfProbes = 0
do i = 1,iTableSize

index_max = iTableSize
index_min = 1
key = input_arr(i)
numberOfProbes = 0

do while(index_min <= index_max)
numberOfProbes = numberOfProbes + 1
index_mid = (index_max + index_min)/2
if(lookup_table%input_table(index_mid) == key) then
final_Outputs(i) = lookup_table%output_table(index_mid)
exit
else if(lookup_table%input_table(index_mid) > key) then
index_max = index_mid - 1
else
index_min = index_mid + 1
end if
end do
totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
final_Outputs(i) = lookup_table%output_table(index_mid)

end do
endTime = omp_get_wtime()
elapseTime = endTime - startTime
print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
!          Approximations end          !

random_mat = final_Outputs

startTime = omp_get_wtime()
totalNumberOfProbes = 0
do i = 1,iTableSize
key = input_arr(i)
SkipKey = key
numberOfProbes = 0
index_min = 1
do j=1,vecLanes
if(SkipTable(j) > SkipKey) exit
index_min = index_min + SkipStride
numberOfProbes = numberOfProbes + 1
end do
index_max = min(index_min + SkipStride, iTableSize)

do while(index_min <= index_max)
numberOfProbes = numberOfProbes + 1
index_mid = (index_max + index_min)/2
if(lookup_table%input_table(index_mid) == key) then
final_Outputs(i) = lookup_table%output_table(index_mid)
exit
else if(lookup_table%input_table(index_mid) > key) then
index_max = index_mid - 1
else
index_min = index_mid + 1
end if
end do
totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
final_Outputs(i) = lookup_table%output_table(index_mid)

end do
endTime = omp_get_wtime()
elapseTime = endTime - startTime
print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime

end program fifthDraftBinary
```

Jim Dempsey

Highlighted
Black Belt
6 Views

## Oops,

Oops,

saw omission.

Change line 49 to

k = i * SkipStride

The 128 SkipTable size yields:

Average Number Of Probes    19.95140     time   0.124189035966992
Average Number Of Probes    13.16830     time   7.945077167823911E-002

Note, with the correction and this run, it is slightly slower.

Jim Dempsey

Highlighted
Beginner
6 Views

## Hi Jim,

Hi Jim,

Thanks.

Yeah the performance is definitely improving after adding this skip table, but still the original problem still persists, ie, the outer loop containing these searches (ie on lines 71  and 104) are yet not vectorized. I'll try integrating the code with this skip table in the application I am working on and check the performance.

Highlighted
Black Belt
6 Views

## The outer loop, in the above

The outer loop, in the above code, cannot be vectorized, as this would require the inner loops (now 2 loops) to be vectorized in consistency with the vectorization of the outer loop (difficult under the best of circumstances). This may be the case that you do not fully understand the requirements to have candidate code for vectorization.

While mathematically a vector can be construed as a list of things, from the CPU perspective a vector is a "small vector" that can be placed into a vector register, and to carry further, is a list of such small vectors (preferably where a small vector does not cross a cache line). Assuming the preconditions are met, the CPU processes small vectors as: Single Instruction Multiple Data. Meaning the same operation is to be performed on all elements of the small vector. Later SIMD instruction sets permit the instruction to use a mask, thus permitting the operation to be performed on selected elements of the small vector. The CPU is incapable of processing small vectors as MIMD (Multiple Instruction Multiple Data) without the use of threading.

In a non-productive way, such as to satisfy a challenge as to "this cannot be vectorized", and not for the purpose of getting the best performing code, the later vector instructions include scatter/gather vector instructions. With this vector instruction, you could construct a set of indicies at intervals of SkipStride, then issue the vector instruction gather, then test all cells in the vector against the key. On AVX (256-bit), the gather instruction (if available) would cost you 4 memory (or L3, or L2, or L1) memory fetches. While you would be performing 4 compares in your key test (yes this part is faster), a memory fetch from RAM could be well over 200x that of the compare, from L3 this would be 20x-40x. The binary search would cost you 2 or 3 memory fetches. IOW the non-vector verses vector gather uses half to three fourths  the memory/cache bandwidth of that of a vector gather.

The reason the skip table works, is should the skip table have used REAL(8), the same four entry fetch is performed in one fetch from RAM. But since I compressed the lookup values to REAL(4), which is suitable for the skip table two four entry fetches is performed in one fetch from RAM. *** Note, you will/may have to test the code to handle the case where the compressed table has duplicate entries, and/or where the lesser precision number does not undershoot the range of the larger precision number. The precision issue can be handled by adding TINY(0.0) to the value inserted into the skip table (but this will not fix the duplicates issue).

A second reason why the skip table works, is during a period of a high number of searches, the skip table entries will tend to locate in L1 cache.

Keep in mind that the skip table appears to become less effective as it grew to 128 entries. You may have to experiment to find the point of diminishing returns. My guess is it may not be productive to multi-level the skip table for a 1M entry table, 1G entry might be effective.

With all the above said, depending on the range and distribution of numbers, you might be able to reduce the skip table to 2 bytes or 1 byte per entry. Knowing:

```real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case
real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case
```

An array of INTEGER(2) holding something like:

```real(kind=8) :: range_span ! range_max - range_min
real(kind=8) :: range_convert ! HUGE(SkipTable(1)) / range_span
...
range_span = range_max - range_min
range_convert = real(HUGE(SkipTable(1))) / range_span
...
SkipTable(i) = (lookup_table%input_table(k) - range_min) * range_convert
```

And you would have to modify the SkipKey in the same manner for the skip table search.

Jim Dempsey

Highlighted
Black Belt
6 Views

## Here is an actual

Here is an actual implementation of the Index Table as described above. First the results:

``` Allocate arrays to    100000000
Initialize Binary Search Lookup Table
Initialize Skip Table (Pre-Binary Search helper)
Initialize Index Table (Pre-Binary Search helper)
initialize Input Array (key list for search test)
Binary Search
Average Number Of Probes    26.65782     time    17.1107615227811
Skip Table with Binary Search
Average Number Of Probes    19.87214     time    11.9470917559229
Index Table with Binary Search
Average Number Of Probes    11.99948     time    7.49961695726961
```

Then the code:

```module globals
integer, parameter :: r8 = selected_real_kind(12)
integer, parameter :: iTableSize = 100000000

type lookupTable                                             ! Lookup table containing the input and output values for numbers in a given range
real(kind=8), allocatable :: input_table(:)
real(kind=8), allocatable :: output_table(:)
end type lookupTable
real(kind=8), allocatable :: input_arr(:)                             ! Array containing the test cases
real(kind=8) :: range_max = 2.13001463673842_r8              ! Maximum possible value in the test case
real(kind=8) :: range_min = 1.00558425189593_r8              ! Minimum possible value in the test case
real(kind=8) :: interval, input_interval                     ! Calculate the values of the intervals for the lookup table elements and input array elements
integer :: index_min, index_max, index_mid                   ! Index values for binary search
real(kind=8) :: key                                          ! Temporarily storing the key
type(lookupTable) :: lookup_table
real(kind=8), allocatable :: final_Outputs(:)                       ! Output values based on the test case inputs
real(kind=8), allocatable :: random_mat(:)                          ! Value using intrinsic logarithms, make sure that the computations are actually done
integer(kind=8) :: numberOfProbes, totalNumberOfProbes
integer, parameter :: vecLanes = 128
integer, parameter :: SkipStride = iTableSize / (vecLanes + 1)
real(kind=4) :: SkipTable(vecLanes), SkipKey
!dir\$ attributes align : 64 :: SkipTable
real(kind=8) :: startTime, endTime, elapseTime
integer, parameter :: IndexTableSize = 65536
type IndexTable_t
integer :: index_min, index_max
end type IndexTable_t
type(IndexTable_t) :: IndexTable(IndexTableSize)
real(kind=8) :: IndexTableDelta, IndexTableNextCut
end module globals

subroutine allocate_arrays()
use globals
implicit none
print *,"Allocate arrays to ", iTableSize
allocate(lookup_table%input_table(iTableSize), lookup_table%output_table(iTableSize), input_arr(iTableSize))
allocate(final_Outputs(iTableSize), random_mat(iTableSize))
end subroutine allocate_arrays

subroutine init_lookup_table()
use globals
implicit none
integer :: i
print *, "Initialize Binary Search Lookup Table"
interval = (range_max - range_min)/10000.0_r8
input_interval = (range_max - range_min)/1000000.0_r8

!            Lookup Table Creation             !

lookup_table%input_table(1) = range_min
lookup_table%output_table(1) = log(range_min)

do i = 2,iTableSize-1

lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval
lookup_table%output_table(i) = log(lookup_table%input_table(i))

end do

lookup_table%input_table(iTableSize) = range_max
lookup_table%output_table(iTableSize) = log(range_max)
end subroutine init_lookup_table

subroutine init_skip_table()
use globals
implicit none
integer :: i, k
print *,"Initialize Skip Table (Pre-Binary Search helper)"
do i = 1,vecLanes
k = i*SkipStride
SkipTable(i) = lookup_table%input_table(k)
end do
end subroutine init_skip_table

subroutine init_IndexTable()
use globals
implicit none
integer :: i, j
real(kind=8) :: the_max, the_min
print *,"Initialize Index Table (Pre-Binary Search helper)"
the_min = minval(lookup_table%input_table)
the_max = maxval(lookup_table%input_table)
!   IndexTableDelta = (range_max - range_min) / (IndexTableSize + 1)
IndexTableDelta = (the_max - the_min) / (IndexTableSize + 1)
!   IndexTableNextCut = range_min
IndexTableNextCut = the_min
index_min = 1
index_max = 1
do i=1,IndexTableSize
IndexTableNextCut = IndexTableNextCut + IndexTableDelta
if(index_max <= iTableSize) then
do
if(lookup_table%input_table(index_max) > IndexTableNextCut) then
indexTable(i)%index_min = index_min
IndexTable(i)%index_max = index_max
index_min = index_max
index_max = index_max+1
exit
endif
index_max = index_max+1
if(index_max > iTableSize) then
indexTable(i)%index_min = index_min
IndexTable(i)%index_max = iTableSize
exit
endif
end do
else
IndexTable(i)%index_min = iTableSize
IndexTable(i)%index_max = iTableSize
endif
end do
end subroutine init_IndexTable

subroutine init_input_array()
use globals
implicit none
integer :: i
print *,"initialize Input Array (key list for search test)"
input_arr(1) = range_min
do i = 2,iTableSize-1
input_arr(i) = input_arr(i-1) + input_interval
end do
input_arr(iTableSize) = range_max
end subroutine init_input_array

subroutine TestBinarySearch()
use globals
use omp_lib
!        Do binary search over input values and then a linear interpolation           !
print *,"Binary Search"
startTime = omp_get_wtime()
totalNumberOfProbes = 0
do i = 1,iTableSize
index_max = iTableSize
index_min = 1
key = input_arr(i)
numberOfProbes = 0
do while(index_min <= index_max)
numberOfProbes = numberOfProbes + 1
index_mid = (index_max + index_min)/2
if(lookup_table%input_table(index_mid) == key) then
final_Outputs(i) = lookup_table%output_table(index_mid)
exit
else if(lookup_table%input_table(index_mid) > key) then
index_max = index_mid - 1
else
index_min = index_mid + 1
end if
end do
totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
final_Outputs(i) = lookup_table%output_table(index_mid)
end do
endTime = omp_get_wtime()
elapseTime = endTime - startTime
print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
random_mat = final_Outputs
end subroutine TestBinarySearch

subroutine SkipTableTestBinarySearch()
use globals
use omp_lib
implicit none
integer :: i,j
print *,"Skip Table with Binary Search"
startTime = omp_get_wtime()
totalNumberOfProbes = 0
do i = 1,iTableSize
numberOfProbes = 0
key = input_arr(i)
! use SkipTable to reduce the number of binary search memory probes
SkipKey = key
index_min = 1
do j=1,vecLanes
if(SkipTable(j) > SkipKey) exit
index_min = index_min + SkipStride
numberOfProbes = numberOfProbes + 1
end do
index_max = min(index_min + SkipStride, iTableSize)
! now binary search reduced range
do while(index_min <= index_max)
numberOfProbes = numberOfProbes + 1
index_mid = (index_max + index_min)/2
if(lookup_table%input_table(index_mid) == key) then
final_Outputs(i) = lookup_table%output_table(index_mid)
exit
else if(lookup_table%input_table(index_mid) > key) then
index_max = index_mid - 1
else
index_min = index_mid + 1
end if
end do
totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
final_Outputs(i) = lookup_table%output_table(index_mid)
end do
endTime = omp_get_wtime()
elapseTime = endTime - startTime
print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
random_mat = final_Outputs
end subroutine SkipTableTestBinarySearch

subroutine IndexTableBinarySearch()
use globals
use omp_lib
implicit none
integer :: i,j,k
print *,"Index Table with Binary Search"
startTime = omp_get_wtime()
totalNumberOfProbes = 0
do i = 1,iTableSize
numberOfProbes = 0
key = input_arr(i)
! use Index Table to reduce range of Binary Search
j = min(max(1,int((key - range_min) / IndexTableDelta)),IndexTableSize)
numberOfProbes = numberOfProbes + 1
index_min = IndexTable(j)%index_min
index_max = IndexTable(j)%index_max
j = 0
! now perform binary search on remaining range
do while(index_min <= index_max)
numberOfProbes = numberOfProbes + 1
index_mid = (index_max + index_min)/2
if(lookup_table%input_table(index_mid) == key) then
final_Outputs(i) = lookup_table%output_table(index_mid)
exit
else if(lookup_table%input_table(index_mid) > key) then
index_max = index_mid - 1
else
index_min = index_mid + 1
end if
end do
totalNumberOfProbes = totalNumberOfProbes + numberOfProbes
final_Outputs(i) = lookup_table%output_table(index_mid)
end do
endTime = omp_get_wtime()
elapseTime = endTime - startTime
print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime
random_mat = final_Outputs
end subroutine IndexTableBinarySearch

program fifthDraftBinary
use omp_lib ! for omp_get_wtime()
use globals
implicit none
integer :: i,j,k
! Allocate working arrays
call allocate_arrays()

! Initialize tables used by different algorithms
! Assumed to occur infrequently (when database changes)

! Main database and binary search table
call init_lookup_table()

! Modified binary search with skip table
call init_skip_table()

! Modified binary search with index table
call init_IndexTable()

! initialize the keys to search for in the various tests
call init_input_array()

call TestBinarySearch()

! SkipTable Binary Search
call SkipTableTestBinarySearch()

! IndexTable Binary Search
call IndexTableBinarySearch()

end program fifthDraftBinary
```

You can experiment with adjusting the table sizes. Doubling the Index Table size took one probe off.

Jim Dempsey

Highlighted
Beginner
6 Views

## Hi Jim,

Hi Jim,

Thanks a lot for all the helpful answers and code. I have integrated your skip table based code. (I've to experiment with various sizes of skipStride). I get the point that vectorization is not possible with binary search.

I will work with the above code also since the improvement seems to be much more than traditional binary search, and even skip table based search.

Thanks,
Amlesh

Highlighted
Black Belt
6 Views

## Amlesh, et al

Amlesh, et al

The Skip Table should be insensitive to the distribution of the keys within the database. Meaning it is not effected by clustering of key values. Whereas the Index table works best with relatively even distribution of keys. When you construct the index table, it may be advantageous to size the table such that each entry presents a range that tends to reside within a page, and thus reducing the number of TLB loads while performing the binary search. This would require some experimentation to confirm this assumption.

Side note:

The index table could possibly be condensed into one entry, but this would not provide for the means to distinguish between a node using the same index for index_min and index_max, or using {index-1,index} where the index_min is assumed to be at least 1 less than index_max (with exception for index==1).

Jim Dempsey