I have three nested loops. The two innermost loops can be worked on in parallel, so I have placed an OpenMP directive between the first and the second. It looks as follows:
do ix=1,nx
!$OMP PARALELL DO ....
   do iy=1,ny
      do iz=1,nz
         ...
      end do
   end do
!$OMP END PARALELL DO
end do
Unfortunately, the code only seems to reach a load of ny (on Linux I check this with pbstop +l), while I would like it to reach a load of ny*nx. There is a substantial amount of work to be done, so I was expecting a load far greater than ny. Among the compilation flags I have -unroll. What am I missing? Do I need to change the scheduling, or is there something simpler I can do?
Thanks.
Roine
Quoting - roine.vestman@nyu.edu
I have three nested loops. The two innermost loops can be worked on in parallel, so I have placed an OpenMP directive between the first and the second. It looks as follows:
do ix=1,nx
!$OMP PARALELL DO ....
   do iy=1,ny
      do iz=1,nz
         ...
      end do
   end do
!$OMP END PARALELL DO
end do
Unfortunately, the code only seems to reach a load of ny (on Linux I check this with pbstop +l), while I would like it to reach a load of ny*nx. There is a substantial amount of work to be done, so I was expecting a load far greater than ny. Among the compilation flags I have -unroll. What am I missing? Do I need to change the scheduling, or is there something simpler I can do?
Thanks.
Roine
I do not know what you are asking. From your code, I would expect the iterations of iy to be split amongst the available cores. Is this a performance problem you are seeing? Are you expecting a better speedup? What exactly are you seeing, and what does your code look like?
ron
Hello,
While it is also not entirely clear to me what your question is, here are some observations:
(1) Did you also misspell the OpenMP directive in your actual code? It should be !$OMP PARALLEL DO .... If so, the compiler may have done nothing to actually parallelize your code.
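For reference, applying point (1) to the posted loop nest only means correcting the directive spelling; the clause list (shown as .... in the post) stays whatever it originally was:
do ix=1,nx
!$OMP PARALLEL DO ....
   do iy=1,ny
      do iz=1,nz
         ...
      end do
   end do
!$OMP END PARALLEL DO
end do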
(2) If by load you actually mean the system load: that is not determined by your loop parameters, but by the number of threads you allow the executable to use. So, by setting
export OMP_NUM_THREADS=3
before running, you could expect to reach a system load of 3 provided all threads compute in parallel at least most of the time. It is rarely useful to set the variable to a number larger than the number of cores available in the system.
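As a quick sanity check (a minimal sketch of my own, not part of the original reply), you can compile with OpenMP enabled (e.g. -openmp with ifort) and print the team size from inside a parallel region; with OMP_NUM_THREADS=3 it should report 3, while a report of 1 suggests the directives are not being honored:
! minimal standalone sketch: report how many threads a parallel region gets
program check_threads
  use omp_lib                        ! standard OpenMP runtime module
  implicit none
!$OMP PARALLEL
!$OMP SINGLE
  print *, 'team size = ', omp_get_num_threads()
!$OMP END SINGLE
!$OMP END PARALLEL
end program check_threads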
(3) By default, the !$OMP PARALLEL DO will only parallelize the directly enclosed loop, in your case dividing up the iterations of index IY among the available threads. If NY is very small, you might indeed run into performance problems. One of these may simply be thread startup times, so using a structure like
!$OMP PARALLEL
do ix=1,nx
!$OMP DO ....
   do iy=1,ny
      do iz=1,nz
         ...
      end do
   end do
!$OMP END DO
end do
!$OMP END PARALLEL
may be helpful, since the thread team is then created only once rather than nx times. If you actually wish to workshare both the IY and IZ loop levels, you could add a COLLAPSE(2) clause to the OMP DO directive; however, this is an OpenMP 3.0 feature, so you will need a recent compiler release for it to be accepted.
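For completeness, a sketch of how that COLLAPSE(2) variant might look on the same structure (clause list again elided, and an OpenMP 3.0 compiler assumed):
!$OMP PARALLEL
do ix=1,nx
! COLLAPSE(2) workshares the combined iy*iz iteration space among the threads
!$OMP DO COLLAPSE(2) ....
   do iy=1,ny
      do iz=1,nz
         ...
      end do
   end do
!$OMP END DO
end do
!$OMP END PARALLEL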
Regards