- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hey,
I am trying to implement a lookup table based on binary search of the incoming element, and if the element doesn't exist, then pick the nearest element.
The problem I am facing is that the loop in which these lookup table computations are performed doesn't get vectorized because of the binary search implementation (binary search contains do while loop where loop count is not fixed, and hence that while loop is not vectorized. Even if I convert it to a do loop, there are exit statements which disable vectorization.)
Is it even possible to vectorize these loops? Maybe some other better search technique which can enable vectorization?
I am attaching the code as well as the opt-report. Compiler used is ifort.
Thanks.
Amlesh
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It's almost trivial to change do while to a counted DO loop using the original values of index_min and index_max. For a couple of years, ifort has had the ability to vectorize a linear search with a single simple exit from a simple counted DO (one step beyond the rule the vectorizer quotes about not accepting multiple exits). It may be possible to perform binary search to narrow the limits to where the vectorized linear search is faster than continued binary search.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vectorizing a binary search, assuming keys are sorted, is counterproductive, due to the fact that "vectorization" in this sense would require the use of a vector gather instruction. IOW read a vector worth of elements distributed within the array of keys. This results in vector width number of reads (2, 4, 8, 16), followed by 1 compare against vector, then locating the split point within the vector. Whereas the scalar binary search performs 1 scalar read followed by 1 compare against scalar, then locating the split point above or below the split point. In this case, the scalar code will be approximately vector width x faster, possibly more (because you will not need the code to find the split point within the vector).
Vectorization can be done in a convoluted way, at the expense of memory, and in situations where searches overwhelm the number of insertions.
What you do is take your sorted key table and reorganize it as a vector. Assume for example you have a vector width of 8 elements.
The first 1/8th of the key table is written into locations 0, 8, 16, ...
The second 1/8th into 1, 9, 17, ...
The 8th 1/8th into 7, 15, 23, ...
Next produce 8 sub tables, rearranged the same way, and produced from each of the above 8 tables.
And repeat this for each sub table.
The total extra memory (assuming original linear table preserved), and for 8 lanes in vector, is approximately (bit-width of key) / 3. One million keys is ~2**20 and the resultant vector tables would approximate 7x that of the original table...
But the advantage is the key can be found in 7-8 reads as opposed to 20 reads.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tim,
I wrote a simple linear search with a single comparison statement followed by an exit statement. The compiler is still not able to vectorize it.
A binary search with fixed iterations to narrow down the limits for vectorized linear search is feasible only when their is a vectorized linear search. I am posting the code snippet for the above mentioned linear search.
do i = 1,1000000 do j = 1,100 if((input_arr(i) == lookup_table%input_table(j)) .or. (input_arr(i) > lookup_table%input_table(j) .and. input_arr(i) < lookup_table%input_table(j+1))) exit end do final_Outputs(i) = lookup_table%output_table(j) end do
Please suggest any changes that may help vectorize it.
Thanks.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It was quite an effort to paste and edit stuff to where I could get the compiler to run, and I get:
LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(24,21)
remark #15536: loop was not vectorized: inner loop throttling prevents vector
ization of this outer loop. Refer to inner loop message for more details.
LOOP BEGIN at C:\users\tim\tim\tim\src\net\ak.f90(16,60)
remark #15523: loop was not vectorized: loop control variable J was found,
but loop iteration count cannot be computed before executing the loop
LOOP END
LOOP END
I suppose the reference to multiple array elements depending on j and j+1 may be a problem for optimization. Evidently, this is not simple enough to pass under the compiler's limbo stick. The message about " iteration count cannot be computed" seems to be a bug in the compiler's diagnosis.
It's able to vectorize when I cut it down to just the first part of your compound conditional. By trying the part with j and j+1 references I get the message
remark #15524: loop was not vectorized: search loop cannot be vectorized unless all memory references can be aligned vector load.
which seems to verify that the compiler wants to shift the loop so that the initial array references are aligned, but can't do so on account of referring to 2 consecutive elements.
If you are trying to allow for the table being sorted in either ascending or descending order, I guess you will need to test which case it is and choose a simpler loop accordingly.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
I don't properly understand what you're suggesting.
Actually I forgot to mention that in the application I am working on, the keys won't be sorted, the code I have posted is a test code before integrating it in our actual application. The keys would come in any random order. So, rearranging the keys may not be useful.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tim,
Apologies for not pasting the full code last time.
Yeah, just for the first part of the compound conditional statement, loop is getting vectorized. But the thing is, in case the key value is not present in the table, I need the values which are on the left and right of the key value which will allow me to perform some interpolation.
Even if I try just the following - (here I'll atleast know the value to the right of the number, and can easily calculate the left value later)
if( input_arr(i) == lookup_table%input_table(j) .or. (input_arr(i) < lookup_table%input_table(j))) exit
I get the same following message as in the very first case (even if no j+1).
{
LOOP BEGIN at linearSearchImplementation.f90(83,3)
remark #15536: loop was not vectorized: inner loop throttling prevents vectorization of this outer loop. Refer to inner loop message for more details.
LOOP BEGIN at linearSearchImplementation.f90(71,53)
remark #15523: loop was not vectorized: cannot compute loop iteration count before executing the loop.
LOOP END
LOOP END
}
What do you suggest?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I don't see how you could do a binary search if your table isn't sorted. Still, you could vectorize the equivalent of the loop by searching the conditions separately, or possibly by making an array lookup_table%input - input_array(i) and searching for the desired value there.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Tim,
The lookup_table%input_table is a sorted table.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You could search for the j value inside the loop and check the j+1 condition after exit. I've wanted to use an if..then..exit..end if block but the compiler doesn't count it as a single exit.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
An alternative is to use a skip table. Taking your program, adding a skip table and varying the skip table size yields:
Skip table size 8 Average Number Of Probes 19.95140 time 0.125806761439890 (original) Average Number Of Probes 17.00000 time 0.101816151291132 (skip) Skip table size 16 Average Number Of Probes 19.95140 time 0.125341766513884 (original) Average Number Of Probes 16.00000 time 9.339010948315263E-002 (skip) Skip table size 32 Average Number Of Probes 19.95140 time 0.129878542851657 (original) Average Number Of Probes 15.00000 time 8.756401576101780E-002 (skip) Skip table size 64 Average Number Of Probes 19.95140 time 0.124911293853074 (original) Average Number Of Probes 14.00000 time 8.084575645625591E-002 (skip) Skip table size 128 Average Number Of Probes 19.95140 time 0.123953686561435 (original) Average Number Of Probes 13.22490 time 7.466363674029708E-002 (skip)
The technique used was to take skip table size number of probes into key table, convert from real(8) to real(4) and place into the skip table. The skip table was linearly searched, though at a certain size, it would be beneficial to skip table the skip table.
program fifthDraftBinary use omp_lib ! for omp_get_wtime() integer, parameter :: r8 = selected_real_kind(12) integer, parameter :: iTableSize = 1000000 type lookupTable ! Lookup table containing the input and output values for numbers in a given range real(kind=8) :: input_table(iTableSize) real(kind=8) :: output_table(iTableSize) end type lookupTable real(kind=8) :: input_arr(iTableSize) ! Array containing the test cases integer :: i,j,k real(kind=8) :: range_max = 2.13001463673842_r8 ! Maximum possible value in the test case real(kind=8) :: range_min = 1.00558425189593_r8 ! Minimum possible value in the test case real(kind=8) :: interval, input_interval ! Calculate the values of the intervals for the lookup table elements and input array elements integer :: index_min, index_max, index_mid ! Index values for binary search real(kind=8) :: key ! Temporarily storing the key type(lookupTable) :: lookup_table real(kind=8) :: final_Outputs(iTableSize) ! Output values based on the test case inputs real(kind=8) :: random_mat(iTableSize) ! Value using intrinsic logarithms, make sure that the computations are actually done integer :: numberOfProbes, totalNumberOfProbes integer, parameter :: vecLanes = 128 integer, parameter :: SkipStride = iTableSize / (vecLanes + 1) real(kind=4) :: SkipTable(vecLanes), SkipKey !dir$ attributes align : 64 :: SkipTable real(kind=8) :: startTime, endTime, elapseTime interval = (range_max - range_min)/10000.0_r8 input_interval = (range_max - range_min)/1000000.0_r8 ! Lookup Table Creation ! lookup_table%input_table(1) = range_min lookup_table%output_table(1) = log(range_min) do i = 2,iTableSize-1 lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval lookup_table%output_table(i) = log(lookup_table%input_table(i)) end do lookup_table%input_table(iTableSize) = range_max lookup_table%output_table(iTableSize) = log(range_max) ! Lookup Table Creation End ! ! build SkipTable do i = 1,vecLanes k = (i-1)*SkipStride+1 SkipTable(i) = lookup_table%input_table(k) end do ! Input Array Creation ! input_arr(1) = range_min do i = 2,iTableSize-1 input_arr(i) = input_arr(i-1) + input_interval end do input_arr(iTableSize) = range_max ! Input Array Creation Finished ! ! Do binary search over input values and then a linear interpolation ! startTime = omp_get_wtime() totalNumberOfProbes = 0 do i = 1,iTableSize index_max = iTableSize index_min = 1 key = input_arr(i) numberOfProbes = 0 do while(index_min <= index_max) numberOfProbes = numberOfProbes + 1 index_mid = (index_max + index_min)/2 if(lookup_table%input_table(index_mid) == key) then final_Outputs(i) = lookup_table%output_table(index_mid) exit else if(lookup_table%input_table(index_mid) > key) then index_max = index_mid - 1 else index_min = index_mid + 1 end if end do totalNumberOfProbes = totalNumberOfProbes + numberOfProbes final_Outputs(i) = lookup_table%output_table(index_mid) end do endTime = omp_get_wtime() elapseTime = endTime - startTime print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime ! Approximations end ! random_mat = final_Outputs startTime = omp_get_wtime() totalNumberOfProbes = 0 do i = 1,iTableSize key = input_arr(i) SkipKey = key numberOfProbes = 0 index_min = 1 do j=1,vecLanes if(SkipTable(j) > SkipKey) exit index_min = index_min + SkipStride numberOfProbes = numberOfProbes + 1 end do index_max = min(index_min + SkipStride, iTableSize) do while(index_min <= index_max) numberOfProbes = numberOfProbes + 1 index_mid = (index_max + index_min)/2 if(lookup_table%input_table(index_mid) == key) then final_Outputs(i) = lookup_table%output_table(index_mid) exit else if(lookup_table%input_table(index_mid) > key) then index_max = index_mid - 1 else index_min = index_mid + 1 end if end do totalNumberOfProbes = totalNumberOfProbes + numberOfProbes final_Outputs(i) = lookup_table%output_table(index_mid) end do endTime = omp_get_wtime() elapseTime = endTime - startTime print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime end program fifthDraftBinary
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Oops,
saw omission.
Change line 49 to
k = i * SkipStride
The 128 SkipTable size yields:
Average Number Of Probes 19.95140 time 0.124189035966992
Average Number Of Probes 13.16830 time 7.945077167823911E-002
Note, with the correction and this run, it is slightly slower.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Thanks.
Yeah the performance is definitely improving after adding this skip table, but still the original problem still persists, ie, the outer loop containing these searches (ie on lines 71 and 104) are yet not vectorized. I'll try integrating the code with this skip table in the application I am working on and check the performance.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The outer loop, in the above code, cannot be vectorized, as this would require the inner loops (now 2 loops) to be vectorized in consistency with the vectorization of the outer loop (difficult under the best of circumstances). This may be the case that you do not fully understand the requirements to have candidate code for vectorization.
While mathematically a vector can be construed as a list of things, from the CPU perspective a vector is a "small vector" that can be placed into a vector register, and to carry further, is a list of such small vectors (preferably where a small vector does not cross a cache line). Assuming the preconditions are met, the CPU processes small vectors as: Single Instruction Multiple Data. Meaning the same operation is to be performed on all elements of the small vector. Later SIMD instruction sets permit the instruction to use a mask, thus permitting the operation to be performed on selected elements of the small vector. The CPU is incapable of processing small vectors as MIMD (Multiple Instruction Multiple Data) without the use of threading.
In a non-productive way, such as to satisfy a challenge as to "this cannot be vectorized", and not for the purpose of getting the best performing code, the later vector instructions include scatter/gather vector instructions. With this vector instruction, you could construct a set of indicies at intervals of SkipStride, then issue the vector instruction gather, then test all cells in the vector against the key. On AVX (256-bit), the gather instruction (if available) would cost you 4 memory (or L3, or L2, or L1) memory fetches. While you would be performing 4 compares in your key test (yes this part is faster), a memory fetch from RAM could be well over 200x that of the compare, from L3 this would be 20x-40x. The binary search would cost you 2 or 3 memory fetches. IOW the non-vector verses vector gather uses half to three fourths the memory/cache bandwidth of that of a vector gather.
The reason the skip table works, is should the skip table have used REAL(8), the same four entry fetch is performed in one fetch from RAM. But since I compressed the lookup values to REAL(4), which is suitable for the skip table two four entry fetches is performed in one fetch from RAM. *** Note, you will/may have to test the code to handle the case where the compressed table has duplicate entries, and/or where the lesser precision number does not undershoot the range of the larger precision number. The precision issue can be handled by adding TINY(0.0) to the value inserted into the skip table (but this will not fix the duplicates issue).
A second reason why the skip table works, is during a period of a high number of searches, the skip table entries will tend to locate in L1 cache.
Keep in mind that the skip table appears to become less effective as it grew to 128 entries. You may have to experiment to find the point of diminishing returns. My guess is it may not be productive to multi-level the skip table for a 1M entry table, 1G entry might be effective.
With all the above said, depending on the range and distribution of numbers, you might be able to reduce the skip table to 2 bytes or 1 byte per entry. Knowing:
real(kind=8) :: range_max = 2.13001463673842_r8 ! Maximum possible value in the test case real(kind=8) :: range_min = 1.00558425189593_r8 ! Minimum possible value in the test case
An array of INTEGER(2) holding something like:
real(kind=8) :: range_span ! range_max - range_min real(kind=8) :: range_convert ! HUGE(SkipTable(1)) / range_span ... range_span = range_max - range_min range_convert = real(HUGE(SkipTable(1))) / range_span ... SkipTable(i) = (lookup_table%input_table(k) - range_min) * range_convert
And you would have to modify the SkipKey in the same manner for the skip table search.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here is an actual implementation of the Index Table as described above. First the results:
Allocate arrays to 100000000 Initialize Binary Search Lookup Table Initialize Skip Table (Pre-Binary Search helper) Initialize Index Table (Pre-Binary Search helper) initialize Input Array (key list for search test) Binary Search Average Number Of Probes 26.65782 time 17.1107615227811 Skip Table with Binary Search Average Number Of Probes 19.87214 time 11.9470917559229 Index Table with Binary Search Average Number Of Probes 11.99948 time 7.49961695726961
Then the code:
module globals integer, parameter :: r8 = selected_real_kind(12) integer, parameter :: iTableSize = 100000000 type lookupTable ! Lookup table containing the input and output values for numbers in a given range real(kind=8), allocatable :: input_table(:) real(kind=8), allocatable :: output_table(:) end type lookupTable real(kind=8), allocatable :: input_arr(:) ! Array containing the test cases real(kind=8) :: range_max = 2.13001463673842_r8 ! Maximum possible value in the test case real(kind=8) :: range_min = 1.00558425189593_r8 ! Minimum possible value in the test case real(kind=8) :: interval, input_interval ! Calculate the values of the intervals for the lookup table elements and input array elements integer :: index_min, index_max, index_mid ! Index values for binary search real(kind=8) :: key ! Temporarily storing the key type(lookupTable) :: lookup_table real(kind=8), allocatable :: final_Outputs(:) ! Output values based on the test case inputs real(kind=8), allocatable :: random_mat(:) ! Value using intrinsic logarithms, make sure that the computations are actually done integer(kind=8) :: numberOfProbes, totalNumberOfProbes integer, parameter :: vecLanes = 128 integer, parameter :: SkipStride = iTableSize / (vecLanes + 1) real(kind=4) :: SkipTable(vecLanes), SkipKey !dir$ attributes align : 64 :: SkipTable real(kind=8) :: startTime, endTime, elapseTime integer, parameter :: IndexTableSize = 65536 type IndexTable_t integer :: index_min, index_max end type IndexTable_t type(IndexTable_t) :: IndexTable(IndexTableSize) real(kind=8) :: IndexTableDelta, IndexTableNextCut end module globals subroutine allocate_arrays() use globals implicit none print *,"Allocate arrays to ", iTableSize allocate(lookup_table%input_table(iTableSize), lookup_table%output_table(iTableSize), input_arr(iTableSize)) allocate(final_Outputs(iTableSize), random_mat(iTableSize)) end subroutine allocate_arrays subroutine init_lookup_table() use globals implicit none integer :: i print *, "Initialize Binary Search Lookup Table" interval = (range_max - range_min)/10000.0_r8 input_interval = (range_max - range_min)/1000000.0_r8 ! Lookup Table Creation ! lookup_table%input_table(1) = range_min lookup_table%output_table(1) = log(range_min) do i = 2,iTableSize-1 lookup_table%input_table(i) = lookup_table%input_table(i-1) + interval lookup_table%output_table(i) = log(lookup_table%input_table(i)) end do lookup_table%input_table(iTableSize) = range_max lookup_table%output_table(iTableSize) = log(range_max) end subroutine init_lookup_table subroutine init_skip_table() use globals implicit none integer :: i, k print *,"Initialize Skip Table (Pre-Binary Search helper)" do i = 1,vecLanes k = i*SkipStride SkipTable(i) = lookup_table%input_table(k) end do end subroutine init_skip_table subroutine init_IndexTable() use globals implicit none integer :: i, j real(kind=8) :: the_max, the_min print *,"Initialize Index Table (Pre-Binary Search helper)" the_min = minval(lookup_table%input_table) the_max = maxval(lookup_table%input_table) ! IndexTableDelta = (range_max - range_min) / (IndexTableSize + 1) IndexTableDelta = (the_max - the_min) / (IndexTableSize + 1) ! IndexTableNextCut = range_min IndexTableNextCut = the_min index_min = 1 index_max = 1 do i=1,IndexTableSize IndexTableNextCut = IndexTableNextCut + IndexTableDelta if(index_max <= iTableSize) then do if(lookup_table%input_table(index_max) > IndexTableNextCut) then indexTable(i)%index_min = index_min IndexTable(i)%index_max = index_max index_min = index_max index_max = index_max+1 exit endif index_max = index_max+1 if(index_max > iTableSize) then indexTable(i)%index_min = index_min IndexTable(i)%index_max = iTableSize exit endif end do else IndexTable(i)%index_min = iTableSize IndexTable(i)%index_max = iTableSize endif end do end subroutine init_IndexTable subroutine init_input_array() use globals implicit none integer :: i print *,"initialize Input Array (key list for search test)" input_arr(1) = range_min do i = 2,iTableSize-1 input_arr(i) = input_arr(i-1) + input_interval end do input_arr(iTableSize) = range_max end subroutine init_input_array subroutine TestBinarySearch() use globals use omp_lib ! Do binary search over input values and then a linear interpolation ! print *,"Binary Search" startTime = omp_get_wtime() totalNumberOfProbes = 0 do i = 1,iTableSize index_max = iTableSize index_min = 1 key = input_arr(i) numberOfProbes = 0 do while(index_min <= index_max) numberOfProbes = numberOfProbes + 1 index_mid = (index_max + index_min)/2 if(lookup_table%input_table(index_mid) == key) then final_Outputs(i) = lookup_table%output_table(index_mid) exit else if(lookup_table%input_table(index_mid) > key) then index_max = index_mid - 1 else index_min = index_mid + 1 end if end do totalNumberOfProbes = totalNumberOfProbes + numberOfProbes final_Outputs(i) = lookup_table%output_table(index_mid) end do endTime = omp_get_wtime() elapseTime = endTime - startTime print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime random_mat = final_Outputs end subroutine TestBinarySearch subroutine SkipTableTestBinarySearch() use globals use omp_lib implicit none integer :: i,j print *,"Skip Table with Binary Search" startTime = omp_get_wtime() totalNumberOfProbes = 0 do i = 1,iTableSize numberOfProbes = 0 key = input_arr(i) ! use SkipTable to reduce the number of binary search memory probes SkipKey = key index_min = 1 do j=1,vecLanes if(SkipTable(j) > SkipKey) exit index_min = index_min + SkipStride numberOfProbes = numberOfProbes + 1 end do index_max = min(index_min + SkipStride, iTableSize) ! now binary search reduced range do while(index_min <= index_max) numberOfProbes = numberOfProbes + 1 index_mid = (index_max + index_min)/2 if(lookup_table%input_table(index_mid) == key) then final_Outputs(i) = lookup_table%output_table(index_mid) exit else if(lookup_table%input_table(index_mid) > key) then index_max = index_mid - 1 else index_min = index_mid + 1 end if end do totalNumberOfProbes = totalNumberOfProbes + numberOfProbes final_Outputs(i) = lookup_table%output_table(index_mid) end do endTime = omp_get_wtime() elapseTime = endTime - startTime print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime random_mat = final_Outputs end subroutine SkipTableTestBinarySearch subroutine IndexTableBinarySearch() use globals use omp_lib implicit none integer :: i,j,k print *,"Index Table with Binary Search" startTime = omp_get_wtime() totalNumberOfProbes = 0 do i = 1,iTableSize numberOfProbes = 0 key = input_arr(i) ! use Index Table to reduce range of Binary Search j = min(max(1,int((key - range_min) / IndexTableDelta)),IndexTableSize) numberOfProbes = numberOfProbes + 1 index_min = IndexTable(j)%index_min index_max = IndexTable(j)%index_max j = 0 ! now perform binary search on remaining range do while(index_min <= index_max) numberOfProbes = numberOfProbes + 1 index_mid = (index_max + index_min)/2 if(lookup_table%input_table(index_mid) == key) then final_Outputs(i) = lookup_table%output_table(index_mid) exit else if(lookup_table%input_table(index_mid) > key) then index_max = index_mid - 1 else index_min = index_mid + 1 end if end do totalNumberOfProbes = totalNumberOfProbes + numberOfProbes final_Outputs(i) = lookup_table%output_table(index_mid) end do endTime = omp_get_wtime() elapseTime = endTime - startTime print *,'Average Number Of Probes ', real(totalNumberOfProbes) / real(iTableSize),"time ",elapseTime random_mat = final_Outputs end subroutine IndexTableBinarySearch program fifthDraftBinary use omp_lib ! for omp_get_wtime() use globals implicit none integer :: i,j,k ! Allocate working arrays call allocate_arrays() ! Initialize tables used by different algorithms ! Assumed to occur infrequently (when database changes) ! Main database and binary search table call init_lookup_table() ! Modified binary search with skip table call init_skip_table() ! Modified binary search with index table call init_IndexTable() ! initialize the keys to search for in the various tests call init_input_array() ! Traditional Binary Search call TestBinarySearch() ! SkipTable Binary Search call SkipTableTestBinarySearch() ! IndexTable Binary Search call IndexTableBinarySearch() end program fifthDraftBinary
You can experiment with adjusting the table sizes. Doubling the Index Table size took one probe off.
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Jim,
Thanks a lot for all the helpful answers and code. I have integrated your skip table based code. (I've to experiment with various sizes of skipStride). I get the point that vectorization is not possible with binary search.
I will work with the above code also since the improvement seems to be much more than traditional binary search, and even skip table based search.
Thanks,
Amlesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Amlesh, et al
The Skip Table should be insensitive to the distribution of the keys within the database. Meaning it is not effected by clustering of key values. Whereas the Index table works best with relatively even distribution of keys. When you construct the index table, it may be advantageous to size the table such that each entry presents a range that tends to reside within a page, and thus reducing the number of TLB loads while performing the binary search. This would require some experimentation to confirm this assumption.
Side note:
The index table could possibly be condensed into one entry, but this would not provide for the means to distinguish between a node using the same index for index_min and index_max, or using {index-1,index} where the index_min is assumed to be at least 1 less than index_max (with exception for index==1).
Jim Dempsey
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page