Just curious about OpenMP optimizations for Fortran.
Consider this block:
do i=1,count
... do a bunch of stuff all related to sorting for a tree with no aliasing
end do
Will the compiler split this into contiguous blocks of roughly count / num_cores iterations, one block per thread? I don't want it to dispatch each of the count items individually; I want each thread to work on one larger block.
If I were doing this directly in pthreads, I would calculate a block size of count / num_threads and then dispatch the blocks to the threads, each with a start and end index.
As always thanks ahead of time.
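The default schedule is formally implementation-defined, but in practice (including with Intel's runtime) it is static scheduling: each thread in the OpenMP team receives, as nearly as possible, an equal share of the indices as one contiguous block.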
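To make the block-per-thread behavior explicit rather than relying on the default, the schedule can be spelled out on the directive. The following is a minimal sketch, not the original poster's code: the program name, `n_items` (standing in for `count`), and the `owner` array are hypothetical names used only to record which thread ran which iteration.

```fortran
program static_schedule_demo
  use omp_lib
  implicit none
  integer, parameter :: n_items = 10   ! hypothetical iteration count ("count" in the question)
  integer :: owner(n_items)            ! owner(i) = number of the thread that ran iteration i
  integer :: i

  owner = -1

  ! schedule(static) with no chunk size: at most one block of iterations per thread
  !$omp parallel do schedule(static)
  do i = 1, n_items
     owner(i) = omp_get_thread_num()
  end do
  !$omp end parallel do

  ! Report the mapping serially so the output is not interleaved
  do i = 1, n_items
     print '(a,i0,a,i0)', 'iteration ', i, ' ran on thread ', owner(i)
  end do
end program static_schedule_demo
```

Build with OpenMP enabled (for example `gfortran -fopenmp` or `ifort -qopenmp`). The printed mapping depends on how many threads the runtime starts, so no fixed output is shown here.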
When the iteration count is greater than the thread count, thread iThread (numbered from 0) receives, in C-speak,
count / nThreads + (count % nThreads > iThread ? 1 : 0)
iterations, or in Fortran-speak,
myIterations = count / nThreads
if (mod(count, nThreads) > iThread) myIterations = myIterations + 1
Each thread's starting index is the first loop index plus the sum of the preceding threads' slice counts.
When count == nThreads, each thread gets 1 index.
When count < nThreads, only the first count threads get 1 index each; the rest get none.
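The slice arithmetic above can be checked without OpenMP at all. This is a small serial sketch that applies the same formula to every thread number in turn; `n` (standing in for `count`), the team size of 4, and the program name are made-up values for illustration. It mirrors the formula in this post, not a guarantee about any particular runtime.

```fortran
program static_slices
  implicit none
  integer, parameter :: n = 10         ! hypothetical iteration count ("count" above)
  integer, parameter :: nThreads = 4   ! hypothetical team size
  integer :: iThread, myIterations, first, last

  first = 1
  do iThread = 0, nThreads - 1
     ! Base share, plus one extra iteration for the first mod(n, nThreads) threads
     myIterations = n / nThreads
     if (mod(n, nThreads) > iThread) myIterations = myIterations + 1
     last = first + myIterations - 1
     if (myIterations > 0) then
        print '(a,i0,a,i0,a,i0)', 'thread ', iThread, ': ', first, '..', last
     else
        print '(a,i0,a)', 'thread ', iThread, ': no iterations'
     end if
     ! The next thread starts right after this thread's slice
     first = last + 1
  end do
end program static_slices
```

With n = 10 and 4 threads, the slices come out as 1..3, 4..6, 7..8 and 9..10: the two leftover iterations go to threads 0 and 1.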
Check the SCHEDULE clause in:
Jim Dempsey
To add to this: by default the number of threads is not the number of cores, it is the number of logical "processors".
That is 2x the core count when Hyper-Threading is enabled, and 4x on MIC processors, which have 4 hardware threads per core.
Set the environment variable KMP_AFFINITY=verbose
to get a list of where threads are bound.
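The thread count the runtime will actually use can also be queried from inside the program. A minimal sketch using the standard `omp_lib` routines (the program name is hypothetical):

```fortran
program thread_defaults
  use omp_lib
  implicit none

  ! Number of processors the OpenMP runtime reports as available
  print '(a,i0)', 'processors seen by OpenMP: ', omp_get_num_procs()

  ! Upper bound on the team size of the next parallel region
  print '(a,i0)', 'max threads for a parallel region: ', omp_get_max_threads()
end program thread_defaults
```

Setting the standard OMP_NUM_THREADS environment variable before running changes the second number, which is one way to get one thread per core when Hyper-Threading is on.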