Intel® Fortran Compiler

fortran where statement VS openmp?

Hwa-Yang_S_
Beginner

Dear All, I need help. I have a subroutine that searches for a value in an array; see PART1 below.

Can I use an OpenMP directive to increase the search speed, as in PART2?

PART1

 subroutine p_search(array, length, value, location)
   implicit none
   integer length
   integer,dimension(length) :: array
   logical,dimension(length) :: ref
   integer value
   integer location

   ref = .false.
   where( array .eq. value) ref = .true.
   if( count( ref ) .ne. 1 )then
      write(*,*)"p_search: error count(ref)=",count(ref)
      write(*,*)"array=",array
      write(*,*)"value=",value
      stop
   else
      location = maxloc(array,1,ref)
   endif
   return
 end subroutine p_search

PART2

 subroutine p_search(array, length, value, location)
   implicit none
   integer length
   integer,dimension(length) :: array
   logical,dimension(length) :: ref
   integer value
   integer location

   ref = .false.
!$OMP PARALLEL
   where( array .eq. value) ref = .true.
!$OMP PARALLEL
   if( count( ref ) .ne. 1 )then
      write(*,*)"p_search: error count(ref)=",count(ref)
      write(*,*)"array=",array
      write(*,*)"value=",value
      stop
   else
      location = maxloc(array,1,ref)
   endif
   return
 end subroutine p_search

 

4 Replies
mecej4
Honored Contributor III

It strikes me that you have limited yourself to using an unsorted array, and I question why. If, on the one hand, the array is small and/or the number of times that you search for a value in the array is small, it is fine to search within an unsorted array but, in that case, it would not make sense to attempt parallelization. On the other hand, sorting a large array is CPU intensive, and can be justified only if a large number of searches need to be made after sorting.

To quantify my remarks, let N be the size of the array, let us suppose that you will make M searches, and let us assume that we use a single thread. In that case, sorting and then searching takes c.N.log2(N) + d.M.log2(N) operations, where c and d are O(1). Compared to this, searching through an unsorted array takes O(M.N) operations. Your proposal to parallelize the search would still take O(M.N) operations, but with a lower constant multiplier. If we name this multiplier 'b', you would lower the operation count from b.M.N to b.M.N/8 if you use eight threads and the algorithm is 100 percent parallelizable (which it would not be, in reality).

Use values of M and N suitable for your situation, and decide whether sorting before searching would be worthwhile.
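
To give a concrete picture of the d.M.log2(N) term, a binary search over an already-sorted array would look something like the sketch below (untested; it assumes array is sorted in ascending order and returns location = 0 when the value is absent):

 subroutine b_search(array, length, value, location)
   implicit none
   integer length
   integer,dimension(length) :: array   ! assumed sorted in ascending order
   integer value
   integer location
   integer lo, hi, mid

   location = 0                         ! 0 means "not found"
   lo = 1
   hi = length
   do while (lo .le. hi)
      mid = (lo + hi) / 2
      if (array(mid) .eq. value) then
         location = mid
         return
      else if (array(mid) .lt. value) then
         lo = mid + 1
      else
         hi = mid - 1
      endif
   end do
 end subroutine b_search

Each call costs O(log2(N)) comparisons, which is where the d.M.log2(N) term comes from.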

TimP
Honored Contributor III

If you are hoping to see OpenMP parallelization of WHERE, you will need the !$omp workshare directive.

The PARALLEL alone simply causes all the threads to execute the code, as I understand it, which is probably slower than 1 thread.
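
That is, the directive pair in PART2 would need to become something along these lines (a sketch only; whether it is actually faster is another matter):

!$OMP PARALLEL WORKSHARE
   where( array .eq. value ) ref = .true.
!$OMP END PARALLEL WORKSHARE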

Although recent Intel compilers have begun to implement multi-threading for OMP WORKSHARE, in my examples the performance is less than 50% of a counted OMP DO loop.

I have doubts that you will come out ahead by creating another array as big as the one you are searching and replacing an == comparison by a logical one in an eventual reduction.

In my webinar draft, I show an example of parallel-vector linear search. Ideally, you might hope that omp workshare simd would do such a thing. In cases I have worked with, explicit inner simd and outer threaded reduction variables are needed.
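
Roughly, the explicit form I mean looks like the following (a hypothetical sketch, not the webinar code; it assumes the value occurs at most once in the array, and the max reduction is just one way to combine the per-thread results):

 subroutine p_search_simd(array, length, value, location)
   implicit none
   integer length
   integer,dimension(length) :: array
   integer value
   integer location
   integer i

   location = 0
!$OMP PARALLEL DO SIMD REDUCTION(max:location)
   do i = 1, length
      if (array(i) .eq. value) location = max(location, i)
   end do
!$OMP END PARALLEL DO SIMD
 end subroutine p_search_simd

If the value is not present, location comes back as 0; detecting duplicates would need a separate count, as in the original.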

jimdempseyatthecove
Honored Contributor III

Your example code is a case where code brevity induces more processing than necessary. The WHERE processes array once, COUNT processes ref once, and MAXLOC processes both array and ref once more. That is four array scans. Something like the following may be better (you test):

subroutine p_search(array, length, value, location)
  use omp_lib
  implicit none
  integer length
  integer,dimension(length) :: array
  integer value
  integer location
  integer, parameter :: cutoff=10000 ! *** you determine what this is
  integer i
  ! encode the result as length+i so that "no match" (0), exactly one match
  ! (length < location <= 2*length) and duplicates (> 2*length) can be told apart
  location = 0
  if((length .lt. cutoff) .or. omp_in_parallel()) then
      do i=1,length
          if(array(i) .eq. value) location = location + length + i
      end do
  else
      !$omp parallel do reduction(+:location) num_threads(length / cutoff + 1)
      do i=1,length
          if(array(i) .eq. value) location = location + length + i
      end do
      !$omp end parallel do
  endif
  if((location .le. length) .or. (location .gt. 2*length)) then
     write(*,*)"p_search: error -", location / length, "matches found"
     do i=1,length
         if(array(i) .eq. value) write(*,*) "array(",i,") = ", value
     end do
     stop
  endif
  location = location - length ! recover the index of the single match
end subroutine p_search

You will have to determine the value for cutoff

Jim Dempsey

jimdempseyatthecove
Honored Contributor III

You may also want to add to the above:

integer, parameter :: limitThreads = 8 ! *** system dependent, proportional to # memory channels
...
!$omp parallel do reduction(+:location) num_threads(min(length / cutoff + 1, limitThreads))

Jim Dempsey
