Dear all,
I have 12 threads and N > 12 iterations (for example, N = 48), and there is no dependence across iterations of iloop. Here is my code:
!$OMP parallel default(shared) private(i)
!$omp do schedule(dynamic)
! here I also tried schedule(static)
iloop: do i = 1, N
   jloop: do j = 1, K
      ...
   enddo jloop
enddo iloop
!$omp end do nowait
! here I also tried without nowait
!$omp end parallel
For the first 12 iterations, the work is done in parallel. However, after the first 12 iterations, the work apparently proceeds serially. Is anything wrong with my code?
5 Replies
schedule(dynamic) assigns each of the first 12 iterations to a separate thread. As each thread finishes an iteration, it is assigned the next unstarted iteration. This is likely to be somewhat inefficient, due to the lack of locality as well as the extra run-time processing, but all threads should remain active until there are no fresh iterations left to start.
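That behaviour can be illustrated with a small simulation. This is a hypothetical Python analogy (not the Fortran from the question): each worker thread repeatedly claims the next unstarted iteration from a shared counter, which is essentially what schedule(dynamic) with chunk size 1 does, so no worker goes idle while iterations remain.

```python
# Python analogy of OpenMP schedule(dynamic) with chunk size 1:
# each worker grabs the next unclaimed iteration until none remain.
import threading

def run_dynamic(n_iters, n_threads):
    next_iter = 0
    lock = threading.Lock()
    assignment = {}  # iteration index -> worker id that executed it

    def worker(tid):
        nonlocal next_iter
        while True:
            with lock:                 # atomic "grab the next iteration"
                if next_iter >= n_iters:
                    return             # no fresh iterations: worker retires
                i = next_iter
                next_iter += 1
            assignment[i] = tid        # stand-in for the real work on iteration i

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return assignment

# 48 iterations on 12 workers: every iteration is executed exactly once.
a = run_dynamic(48, 12)
```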
Judging from the speed after the first 12 iterations, I think only one thread is working at a time.
Is loop i an iterative loop for making test runs, or a distribution (slicing) loop?
If it is used to increase test runs, then the do j loop would be the one you parallelize.
!$OMP parallel default(shared) private(i)
!$omp do schedule(dynamic)
iloop: do i = 1, N
!$OMP CRITICAL
   write(*,*) i, omp_get_thread_num()   ! requires USE OMP_LIB
!$OMP END CRITICAL
   jloop: do j = 1, K
      ...
   enddo jloop
enddo iloop
!$omp end do
!$omp end parallel
Which threads ran which iterations?
Does the choice of i vary the amount of computation?
Jim Dempsey
I am so sorry for the mistake I made. There was a variable that should have been threadprivate but was made shared, so some threads waited when they got a wrong value of that variable. Now it works.
But as a general question, when should I use dynamic, static, or guided in the schedule clause? I have read some instructions on this issue, but they were too abstract.
For example, if the choice of i does not change the amount of computation, is it true that the schedule does not matter? If the computational burden does change across iterations, is dynamic better?
Thanks a lot.
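The bug described here (a scratch variable left shared when it should be per-thread) is the classic data race. As an illustrative sketch that assumes nothing about the original Fortran, here is the same idea in Python: threading.local plays the role of THREADPRIVATE, giving each thread its own copy of the variable so no thread sees another's value.

```python
# Analogy of the shared-vs-threadprivate bug: a scratch variable written by
# every thread must be per-thread (like OpenMP THREADPRIVATE), not shared.
import threading

scratch = threading.local()        # per-thread storage; each thread gets its own copy
results = {}
lock = threading.Lock()

def work(i):
    scratch.value = i * i          # this write is invisible to other threads
    with lock:                     # the shared dict still needs synchronization
        results[i] = scratch.value

threads = [threading.Thread(target=work, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# results[i] == i*i for every i, with no cross-thread interference
```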
static is usually best when each loop iteration has a similar amount of work: it divides the iterations evenly among the threads, in combination with correct affinity settings.
dynamic often works best with a chunk size greater than one (set experimentally).
guided is a compromise: it starts with a large chunk size (in effect scheduling part of the work statically), then works with progressively decreasing chunk sizes, so as to keep threads busy to the end.
In some cases (e.g. working on triangular matrices), it is worthwhile to add an outer loop which iterates over the number of threads, using static scheduling but explicitly balancing the work given to each thread.
dynamic falls down with shared arrays on a NUMA platform, as there is no way to keep data local to a thread.
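The three schedules can also be sketched numerically. The Python functions below are an illustrative approximation, not the exact algorithms (the OpenMP specification leaves guided's precise chunk formula implementation-defined): static_blocks precomputes near-equal contiguous blocks, and guided_chunks hands out chunks of roughly remaining/threads, shrinking toward a minimum chunk size.

```python
# Sketch of how OpenMP carves N iterations among T threads under two schedules.

def static_blocks(n, t):
    """Contiguous near-equal blocks, one per thread (like schedule(static))."""
    base, extra = divmod(n, t)
    blocks, start = [], 0
    for k in range(t):
        size = base + (1 if k < extra else 0)  # first 'extra' threads get one more
        blocks.append(range(start, start + size))
        start += size
    return blocks

def guided_chunks(n, t, min_chunk=1):
    """Decreasing chunk sizes, roughly remaining/T each time (like schedule(guided))."""
    chunks, remaining = [], n
    while remaining > 0:
        c = max(min_chunk, remaining // t)  # large chunks first, shrinking over time
        c = min(c, remaining)
        chunks.append(c)
        remaining -= c
    return chunks

# 48 iterations on 12 threads:
# static gives twelve blocks of 4; guided's chunks shrink from 4 toward 1.
```

For the original question (N = 48 with uniform iterations), static hands each of the 12 threads one block of 4 consecutive iterations up front, with no run-time scheduling cost.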
