- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Dear All, Help I have a subroutine search value for array below PART1
Can I use openmp derivative increase search speed? like PART2
PART1
subroutine p_search(array, length, value, location) implicit none integer length integer,dimension(length) :: array logical,dimension(length) :: ref integer value integer location ref = .false. where( array .eq. value) ref = .true. if( count( ref ) .ne. 1 )then write(*,*)"p_search: error count(ref)=",count(ref) write(*,*)"array=",array write(*,*)"value=",value stop else location = maxloc(array,1,ref) endif return end subroutine p_search
PART2
subroutine p_search(array, length, value, location) implicit none integer length integer,dimension(length) :: array logical,dimension(length) :: ref integer value integer location ref = .false. !$OMP PARALLEL where( array .eq. value) ref = .true. !$OMP PARALLEL if( count( ref ) .ne. 1 )then write(*,*)"p_search: error count(ref)=",count(ref) write(*,*)"array=",array write(*,*)"value=",value stop else location = maxloc(array,1,ref) endif return end subroutine p_search
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It strikes me that you have limited yourself to using an unsorted array, and I question why. If, on the one hand, the array is small and/or the number of times that you search for a value in the array is small, it is fine to search within an unsorted array but, in that case, it would not make sense to attempt parallelization. On the other hand, sorting a large array is CPU intensive, and can be justified only if a large number of searches need to be made after sorting.
To quantify my remarks, let N be the size of the array, and let us suppose that you will make M searches, and let us assume that we use a single thread. In that case, sorting and searching takes c.N.log2N + d.M.log2N operations, where c and d are O(1). Compared to this, searching through an unsorted array takes O(M.N) operations. Your proposal to use parallelize the search would still take O(M.N) operations, but lower the constant multiplier. If we name this multiplier 'b', you would lower the operation count from b.M.N to b.M.N/8 if you use eight threads and the algorithm is 100 percent parallelizable (which it would not be, in reality).
Use values of M and N suitable for your situation, and decide whether sorting before searching would be worthwhile.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If you are hoping to see OpenMP parallelization of WHERE, you will need the !$omp workshare directive.
The PARALLEL alone simply causes all the threads to execute the code, as I understand it, which is probably slower than 1 thread.
Although recent Intel compilers have begun to implement multi-threading for OMP WORKSHARE, in my examples the performance is less than 50% of a counted OMP DO loop.
I have doubts that you will come out ahead by creating another array as big as the one you are searching and replaciing an == comparison by a logical one in an eventual reduction.
In my webinar draft, I show an example of parallel-vector linear search. Ideally, you might hope that omp workshare simd might do such a thing. In cases I have worked with, explicit inner simd and outer threaded reduction variables are needed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Your example code is a case of where code brevity induces a larger degree of processing than necessary. The use of where processes array array once, count process array ref once, and maxloc processes arrays array and ref once more. This is four array scans. Something like the following may be better (you test)
subroutine p_search(array, length, value, location) use omp_lib implicit none integer,dimension(length) :: array integer length integer value integer location integer, parameter :: cutoff=10000 ! *** you determine what this is integer i location = 0 if((length .lt. cutoff) .or. omp_in_parallel()) then do i=1,length if(array(i) .eq. value) location = location + length+i end do else !$omp parallel do reduction(+:location) num_threads(length / cutoff + 1) do i=1,length if(array(i) .eq. value) location = location + length+i end do !$omp end parallel do endif if((location .eq. 0) .or. (location .ge. length)) then write(*,*)"p_search: error -", location / length, "duplicates found" do i=1,length if(array(i) .eq. value) write(*,*) "array(",i,") =,", value end do stop endif end subroutine p_search
You will have to determine the value for cutoff
Jim Dempsey
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
You may also want to add to the above:
integer, parameter :: limitThreads = 8 ! *** system dependent, proportional to # memory channels ... !$omp parallel do reduction(+:location) num_threads(min(length / cutoff + 1, limitThreads))
Jim Dempsey

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page