Intel® Fortran Compiler

Understanding OpenMP optimizations

gmadison
Beginner

Just curious about OpenMP optimizations in Fortran.

 

Consider this block:

do i = 1, count
   ! ... do a bunch of stuff, all related to sorting for a tree with no aliasing
end do

Will the compiler break this into one block per thread, each of size count / num_cores? I don't want it dispatching each of the count items individually; I want it to divide the range into larger blocks, one for each thread to work on.

If I were to do this directly in pthreads, I would calculate the block size as count / num_threads, then dispatch these blocks to the threads, each with a start / end index.

As always thanks ahead of time.

 

 

jimdempseyatthecove
Honored Contributor III

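With the Intel compiler, the default OpenMP loop schedule is static: each thread in the OpenMP team receives, as nearly as possible, an equal share of the indices. (The OpenMP specification itself leaves the default schedule implementation-defined.)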

When the iteration count is greater than the thread count, thread iThread (numbered from 0) receives, in C-speak,

count / nThreads + (count % nThreads > iThread ? 1 : 0)

or Fortran speak

myIterations = count / nThreads
if (mod(count, nThreads) > iThread) myIterations = myIterations + 1

Each thread's starting offset into the iteration space is the sum of the preceding threads' slice counts.

When count == nThreads, each thread gets 1 index.

When count < nThreads, only the first count threads get 1 index each; the remaining threads get none.
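
To make the arithmetic concrete, here is a minimal sketch in plain Fortran (no OpenMP calls) that tabulates the slice each thread would own under the formula above, for a loop running from 1 to count. The names slice_mod, static_slice, first, and last are made up for this illustration, and count = 10 with nThreads = 4 are arbitrary example values. The OpenMP specification leaves the exact chunk sizes of a static schedule without a chunk size unspecified, so treat this as a model of the behavior described here, not a guarantee.

```fortran
module slice_mod
   implicit none
contains
   ! Hypothetical helper: compute the 1-based index range [first, last] that
   ! thread iThread (0-based) would own under the formula above.
   ! A thread that gets no iterations ends up with last = first - 1.
   subroutine static_slice(count, nThreads, iThread, first, last)
      integer, intent(in)  :: count, nThreads, iThread
      integer, intent(out) :: first, last
      integer :: base, extra, myIterations

      base  = count / nThreads        ! iterations that every thread gets
      extra = mod(count, nThreads)    ! leftovers, one each to the first 'extra' threads

      myIterations = base
      if (extra > iThread) myIterations = myIterations + 1

      ! Start = 1 + sum of the preceding threads' slice counts.
      first = 1 + iThread * base + min(iThread, extra)
      last  = first + myIterations - 1
   end subroutine static_slice
end module slice_mod

program show_slices
   use slice_mod
   implicit none
   integer, parameter :: count = 10, nThreads = 4   ! arbitrary example values
   integer :: iThread, first, last

   do iThread = 0, nThreads - 1
      call static_slice(count, nThreads, iThread, first, last)
      print '(a,i0,a,i0,a,i0,a,i0)', 'thread ', iThread, ': ', first, '..', last, &
            '  n=', last - first + 1
   end do
end program show_slices
```

With count = 10 and nThreads = 4 the slices come out as 1..3, 4..6, 7..8, and 9..10, which is the same start / end bookkeeping one would do by hand with pthreads.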

Check the SCHEDULE clause in:

DO Directive (intel.com)
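
As a hedged illustration of that clause, the sketch below puts an explicit SCHEDULE(STATIC) on a parallel loop and records which thread ran each iteration. The loop needs an OpenMP directive to be threaded this way. The loop body is only a placeholder for the real work, the array owner and the value count = 10 are invented for this example, and it assumes the code is compiled with OpenMP enabled (for example -qopenmp with ifx/ifort, or -fopenmp with gfortran).

```fortran
program static_schedule_demo
   use omp_lib
   implicit none
   integer, parameter :: count = 10     ! arbitrary example size
   integer :: owner(count)              ! owner(i) = thread that executed iteration i
   integer :: i

   ! schedule(static) with no chunk size asks for at most one contiguous
   ! block of iterations per thread, rather than handing iterations out
   ! one at a time.
   !$omp parallel do schedule(static)
   do i = 1, count
      owner(i) = omp_get_thread_num()   ! placeholder for the real work
   end do
   !$omp end parallel do

   print '(a,i0)', 'max threads: ', omp_get_max_threads()
   print '(a,*(1x,i0))', 'owner of each iteration:', owner
end program static_schedule_demo
```

By contrast, schedule(dynamic) with no chunk size hands out iterations one at a time as threads become free, which is the per-item dispatch the original question wanted to avoid.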

Jim Dempsey

Ron_Green
Moderator

To add to this: the default is not #threads = #cores, it is #threads = #"processors", meaning logical processors.

The processor count is 2x the core count if Hyper-Threading is enabled, and 4x on MIC processors, which have 4 hyperthreads per core.

Set the environment variable KMP_AFFINITY=verbose to get a list of where threads are bound.
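
A small sketch for checking these numbers on a given machine, using the standard OpenMP runtime routines omp_get_num_procs and omp_get_max_threads. It assumes OpenMP is enabled at compile time; the program name is arbitrary.

```fortran
program show_thread_defaults
   use omp_lib
   implicit none

   ! Number of processors available to the OpenMP runtime
   ! (logical processors, so hyperthreads are counted).
   print '(a,i0)', 'logical processors:  ', omp_get_num_procs()

   ! Upper bound on the team size the next parallel region would use
   ! by default (can be overridden with OMP_NUM_THREADS).
   print '(a,i0)', 'default max threads: ', omp_get_max_threads()
end program show_thread_defaults
```

If the two numbers match and are twice the physical core count, the default team is being sized by hyperthreads rather than by cores.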
