Solved: Understanding OpenMP optimizations

gmadison · 12-23-2020

Just curious about OpenMP optimizations for Fortran.

Consider this block:

do i = 1, count
   ! ... do a bunch of stuff, all related to sorting for a tree, with no aliasing
end do

Will the compiler break this loop into contiguous blocks of about count / num_cores iterations, one block per thread? I don't want it dispatching the iterations one at a time; I want it to divide them into larger blocks for each thread to work on.

If I were doing this directly with pthreads, I would compute a block size of count / num_threads and then dispatch each block to a thread with a start/end index.

As always, thanks ahead of time.

jimdempseyatthecove · 12-23-2020

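The default OpenMP behavior is static scheduling (strictly, the default schedule is implementation defined, but it is STATIC in Intel's implementation): each thread in the OpenMP team receives one contiguous block of iterations, with the blocks as equal in size as possible.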

When the iteration count is greater than the thread count, thread iThread (numbered from 0) receives, in C-speak:

count / nThreads + (count % nThreads > iThread ? 1 : 0)

or, in Fortran-speak:

myIterations = count / nThreads
if (mod(count, nThreads) > iThread) myIterations = myIterations + 1

Each thread's starting index is offset from the first iteration by the sum of the preceding threads' slice counts.

When count == nThreads, each thread gets one index.

When count < nThreads, only the first count threads get one index each; the remaining threads get none.

To control the distribution yourself (for example a chunk size, or dynamic or guided scheduling), check the SCHEDULE clause in:

DO Directive (intel.com)

Jim Dempsey


jimdempseyatthecove · 12-23-2020