Intel® Fortran Compiler

omp do schedule

antfu
Beginner
990 Views
Dear all,
I have 12 threads and N > 12 iterations (for example, 48), with no dependence across iterations of iloop. Here is my code:
!$OMP parallel default(shared) private(i)
!$omp do schedule(dynamic)
! here I also tried schedule(static)
iloop: do i=1,N
   jloop: do j=1,K
      ...
   enddo jloop
enddo iloop
!$omp end do nowait
! here I also tried without nowait
!$omp end parallel

The first 12 iterations run in parallel. After that, however, the remaining iterations appear to run serially. Is anything wrong with my code?
5 Replies
TimP
Honored Contributor III
schedule(dynamic), with its default chunk size of 1, assigns each of the first 12 iterations to a separate thread. As each thread finishes an iteration, it is assigned the next iteration that has not yet been started. This is likely to be somewhat inefficient, because of the lack of locality and the extra run-time processing, but all threads should remain active until there are no fresh iterations left to start.
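
One way to check this behavior is to record which thread ran each iteration. The following is only a minimal sketch, not the original poster's code: the program name, the `owner` and `per_thread` arrays, and the artificial workload in the inner loop are all assumptions made for illustration. It needs OpenMP enabled at compile time (for example `-qopenmp` with the Intel compiler or `-fopenmp` with gfortran).

```fortran
program dynamic_demo
  use omp_lib
  implicit none
  integer, parameter :: n = 48            ! iteration count from the question
  integer :: i, j, tid
  integer :: owner(n)                     ! thread that ran each iteration
  integer, allocatable :: per_thread(:)   ! iterations handled by each thread
  real :: work(n)

  owner = -1
  allocate(per_thread(0:omp_get_max_threads() - 1))
  per_thread = 0

  !$omp parallel default(shared) private(i, j, tid)
  tid = omp_get_thread_num()
  !$omp do schedule(dynamic)
  do i = 1, n
     work(i) = 0.0
     do j = 1, 1000 * i                   ! hypothetical uneven workload
        work(i) = work(i) + sqrt(real(j))
     end do
     owner(i) = tid                       ! each iteration writes its own slot
     per_thread(tid) = per_thread(tid) + 1 ! each thread writes its own slot
  end do
  !$omp end do
  !$omp end parallel

  print '(a, *(i4))', 'iterations per thread:', per_thread

  ! Every iteration must have been executed exactly once.
  if (any(owner < 0) .or. sum(per_thread) /= n) error stop 'iteration missed'
end program dynamic_demo
```

If the scheduling works as described, the per-thread counts add up to 48 and several threads show nonzero counts. If nearly all iterations after the first 12 land on one thread, or the run time suggests serial execution, something in the loop body is probably forcing the threads to wait.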
antfu
Beginner
Judging from the speed of the iterations after the first 12, I think only one thread is working at a time.
jimdempseyatthecove
Honored Contributor III
Is loop i an iterative loop for making repeated test runs, or a distribution (slicing) loop?

If it is only there to repeat test runs, then the do j loop would be the one to parallelize.

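! omp_get_thread_num() requires "use omp_lib" (or include 'omp_lib.h')
!$OMP parallel default(shared) private(i)
!$omp do schedule(dynamic)
iloop: do i=1,N
   ! print the iteration number and the thread that runs it
   !$OMP CRITICAL
   write(*,*) i, omp_get_thread_num()
   !$OMP END CRITICAL
   jloop: do j=1,K
      ...
   enddo jloop
enddo iloop
!$omp end do
!$omp end parallel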

Which thread runs which iteration?

Does the amount of computation vary with i?

Jim Dempsey
antfu
Beginner
I am so sorry for the mistake I made. One variable that should have been threadprivate was shared instead, and some threads were made to wait when they read a wrong value of that variable. Now it works.
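
The variable itself was not posted, so the following is only a sketch of the kind of fix described, using a hypothetical module variable `scratch` and hypothetical module and program names. Declaring the variable `threadprivate` gives each thread its own copy, so no thread sees a value written by another.

```fortran
module scratch_mod
  implicit none
  integer :: scratch = 0          ! hypothetical per-thread work variable
  !$omp threadprivate(scratch)    ! without this line, all threads share one copy
end module scratch_mod

program threadprivate_demo
  use scratch_mod
  implicit none
  integer, parameter :: n = 48
  integer :: i, total

  total = 0
  !$omp parallel default(shared) private(i) reduction(+:total)
  scratch = 0                     ! each thread resets its own copy
  !$omp do schedule(static)
  do i = 1, n
     scratch = scratch + i        ! accumulates only this thread's iterations
  end do
  !$omp end do
  total = total + scratch         ! combine the per-thread partial sums
  !$omp end parallel

  print *, 'total =', total
  ! The partial sums must add up to 1 + 2 + ... + n.
  if (total /= n * (n + 1) / 2) error stop 'wrong total'
end program threadprivate_demo
```

With the `threadprivate` directive removed, the updates to `scratch` become a data race and the final check is likely to fail intermittently.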

But as a general question, when should I use dynamic, static, or guided in the schedule clause? I have read some documentation on this, but it was too abstract.

For example, if the choice of i does not affect the amount of computation, is it true that the schedule does not matter? And if the computational burden does change across iterations, is dynamic better?

Thanks a lot.
TimP
Honored Contributor III
static is usually best when each loop iteration has a similar amount of work, so that dividing the iterations evenly among the threads balances the load; combine it with correct affinity settings.
dynamic often works best with a chunk size greater than 1 (set experimentally).
guided is a compromise: it starts out with a large chunk size (in effect scheduling part of the work statically), then works with progressively smaller chunks so as to keep all threads busy until the end.
In some cases (e.g. working on triangular matrices), it is worthwhile to add an outer loop that iterates over the number of threads, using static scheduling, while explicitly balancing the work given to each thread.
dynamic falls down with shared arrays on a NUMA platform, as there is no way to keep data local to the thread that uses it.
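
To try the three schedules on the same loop, one option is `schedule(runtime)` together with `omp_set_schedule`. This is only a rough sketch under assumptions: the triangular workload, the problem size `n`, the chunk size of 8 for dynamic, and all names in the program are made up for illustration, and the timings depend entirely on the machine and thread count.

```fortran
program schedule_compare
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000          ! hypothetical problem size
  integer(omp_sched_kind) :: kinds(3)
  character(len=7) :: names(3)
  integer :: chunks(3), k
  real(dp) :: t0, elapsed, res

  kinds  = [omp_sched_static, omp_sched_dynamic, omp_sched_guided]
  names  = ['static ', 'dynamic', 'guided ']
  chunks = [0, 8, 0]   ! values below 1 request the default chunk; 8 is an arbitrary guess

  do k = 1, 3
     call omp_set_schedule(kinds(k), chunks(k))   ! picked up by schedule(runtime)
     t0 = omp_get_wtime()
     res = triangular_sum(n)
     elapsed = omp_get_wtime() - t0
     print '(a, a, f10.4, a, es14.6)', names(k), ' time(s)=', elapsed, ' result=', res
  end do

contains

  function triangular_sum(m) result(s)
    integer, intent(in) :: m
    real(dp) :: s
    integer :: i, j
    s = 0.0_dp
    !$omp parallel do schedule(runtime) default(shared) private(i, j) reduction(+:s)
    do i = 1, m
       do j = 1, i                ! work grows with i, as for a triangular matrix
          s = s + 1.0_dp / real(i + j, dp)
       end do
    end do
    !$omp end parallel do
  end function triangular_sum

end program schedule_compare
```

Because the work per iteration grows with i here, plain static with the default chunk would be expected to leave the threads holding the low values of i idle near the end, while dynamic and guided should balance better. The printed results may differ in the last digits between schedules, since the reduction adds the terms in a different order.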