Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Basic OpenMP question

roine_vestman
Beginner
I have three nested loops. The two innermost loops can be worked on in parallel, so I have placed an OpenMP directive between the first and the second loop. It looks as follows:

do ix=1,nx
!$OMP PARALELL DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END PARALELL DO
end do

Unfortunately, the code only seems to reach a load of ny (on Linux I check that with pbstop +l ), while I would like it to reach a load of ny*nx. There is a substantial amount of work to be done, so I was expecting a load far greater than ny. Among the compilation flags I have -unroll. What am I missing? Do I need to change the scheduling, or is there something easier that can be done?

Thanks.

Roine
Ron_Green
Moderator
I do not know what you are asking. From your code, I would expect the iterations of iy to be split among the available cores. Is this a performance problem, that is, are you expecting a better speedup? What are you seeing, and what does your code look like?

ron
reinhold-bader
New Contributor II
Hello,

While it is not entirely clear to me either what your question is, here are some observations:

(1) Did you also misspell the OpenMP directive in your code? It should be !$OMP PARALLEL DO ... If so, the compiler may have done nothing to actually parallelize your code.

(2) If by load you actually mean the system load: that is not determined by your loop parameters, but by the amount of resources you provide to the executable. So, by setting
export OMP_NUM_THREADS=3
before running, you could expect to reach a system load of 3, provided all threads compute in parallel at least most of the time. It is rarely useful to set the variable to a number larger than the number of cores available in the system.

(3) By default, !$OMP PARALLEL DO only parallelizes the loop that immediately follows it, in your case dividing up the iterations of index IY among the available threads. If NY is very small, you might indeed run into performance problems. One of these may simply be thread startup time, since your parallel region is entered and left once per IX iteration, so using a structure like

!$OMP PARALLEL
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL

may be helpful, because the parallel region is then entered only once. If you actually wish to workshare both the IY and IZ loop levels, you could add a COLLAPSE(2) clause to the OMP DO directive; however, this is an OpenMP 3.0 feature, so you'll need a recent compiler release for it to be accepted.
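As a minimal sketch of the COLLAPSE(2) variant: the program name, the loop bounds and the array a below are placeholders I made up for illustration, not taken from your code, and the assignment in the loop body merely stands in for your real work.

```fortran
program collapse_sketch
  use omp_lib                          ! standard OpenMP module (omp_get_max_threads)
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Placeholder sizes, chosen only for illustration
  integer, parameter :: nx = 4, ny = 3, nz = 1000
  real(dp) :: a(nz, ny, nx)
  integer :: ix, iy, iz

  ! Reflects OMP_NUM_THREADS (or the runtime default if it is unset)
  print *, 'threads available: ', omp_get_max_threads()

  ! The parallel region is entered once, outside the ix loop
  !$omp parallel private(ix) shared(a)
  do ix = 1, nx
     ! Every thread runs the ix loop; for each ix, the ny*nz combined
     ! iterations of the two loops below are divided among the threads
     !$omp do collapse(2)
     do iy = 1, ny
        do iz = 1, nz
           a(iz, iy, ix) = real(ix + iy + iz, dp)   ! stand-in for the real work
        end do
     end do
     !$omp end do
  end do
  !$omp end parallel

  print *, 'checksum: ', sum(a)
end program collapse_sketch
```

To try it, compile with OpenMP enabled (for example -qopenmp with current Intel compilers or -fopenmp with gfortran) and set OMP_NUM_THREADS before running. COLLAPSE(2) requires the two loops to be perfectly nested, with no statements between the two do lines or between the two end do lines. Even then, the number of busy threads is bounded by OMP_NUM_THREADS, not by ny*nz.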

Regards