topic Re: Basic OpenMP question in Intel® Moderncode for Parallel Architectures

Basic OpenMP question

roine_vestman — Thu, 24 Sep 2009 19:18:12 GMT

I am having difficulty utilizing all 8 cores that I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second. It looks as follows:

do ix=1,nx
!$OMP PARALLEL DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END PARALLEL DO
end do

Unfortunately, the code only seems to reach a load of ny (I check that with pbstop +l ), while I would like it to reach a load of up to ny*nx. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz
Thank you for any hints and help.

Re: Basic OpenMP question

robert-reed — Fri, 25 Sep 2009 00:06:08 GMT

Quoting - roine.vestman@nyu.edu

I think what you're missing is a collapse clause. Perhaps the following will give you what you want:

do ix=1,nx
!$omp parallel do private(iy,iz) collapse(2)
do iy=1,ny
do iz=1,nz
...
end do
end do
!$omp end parallel do
end do

The description from the OpenMP 3.0 specification reads as follows:

The collapse clause may be used to specify how many loops are associated with the loop construct. ... If more than one loop is associated with the loop construct, then the iterations of all associated loops are collapsed into one larger iteration space which is then divided according to the schedule clause.

That should give you the span of tasks you are trying to achieve.

Re: Basic OpenMP question

roine_vestman — Tue, 29 Sep 2009 05:46:15 GMT

Thank you for the suggestion. I tried adding collapse(2) but can't get it to work. The error reads:

fortcom: Error: vfi_common_model_main.f90, line 360: Syntax error, found IDENTIFIER 'COLLAPSE' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION SHARED IF DEFAULT COPYIN NUM_THREADS LASTPRIVATE ...
!$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) &

From which version of Intel Fortran is 'collapse' supported (i.e. from which version is OpenMP 3.0 supported)? I have Intel Fortran version 10.0.023 - do I need to upgrade?

Re: Basic OpenMP question

Michael_K_Intel2 — Tue, 29 Sep 2009 08:18:06 GMT

Quoting - roine.vestman@nyu.edu

AFAIK you need to update to 11.x to gain support for OpenMP 3.0 with Intel's compilers.
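One way to check what a given compiler reports, rather than going by release number, is the `openmp_version` constant that the `omp_lib` module provides (a yyyymm integer; OpenMP 3.0 is 200805). The sketch below is only an illustration: the module and function names are made up for this example, and the `!$` sentinel lines are compiled only when OpenMP is enabled.

```fortran
module omp_version_check
  !$ use omp_lib
  implicit none
contains
  ! OpenMP version the compiler reports, as yyyymm.
  ! Returns 0 when the code is built without OpenMP enabled.
  function reported_openmp_version() result(version)
    integer :: version
    version = 0
    !$ version = openmp_version
  end function reported_openmp_version

  ! The collapse clause was introduced in OpenMP 3.0 (200805).
  pure function collapse_supported(version) result(ok)
    integer, intent(in) :: version
    logical :: ok
    ok = version >= 200805
  end function collapse_supported
end module omp_version_check
```

Calling `collapse_supported(reported_openmp_version())` from a small test program should then tell you whether `collapse(2)` is expected to parse with that compiler.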

I would also suggest rearranging your code as follows:

!$OMP PARALLEL
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL

This avoids repeatedly creating and tearing down the parallel region, as would be done with your code pattern. Creating and tearing down the parallel region can have a serious impact on performance because of administrative and barrier overhead.

Cheers,
-michael

Re: Basic OpenMP question

maria — Wed, 21 Oct 2009 16:58:43 GMT

Quoting - Michael Klemm, Intel

I am learning OpenMP. This is an interesting point. If I have nested loops, I can apply '!$omp parallel' and '!$omp end parallel' to the outermost do loop, and then apply '!$OMP do' and '!$OMP end do' at the loop level inside the parallel region. I just need to worry about the data environment for the parallel region. Is this correct?

Re: Basic OpenMP question

jimdempseyatthecove — Wed, 21 Oct 2009 17:37:10 GMT

Michael,

add private(ix) clause

!$OMP PARALLEL PRIVATE(ix)
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL

Roine,

Please note that all threads execute each ix from 1 to nx; however, each has its own copy of ix.
Be cautious when adding code between the do ix and do iy loops. You can have code there, but you cannot assume each thread is working on a different slice of the ix iteration space.
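To illustrate the caution above: if some per-ix setup should run only once rather than once per thread, one option (my assumption here, not something prescribed in this thread) is to wrap it in `!$OMP SINGLE`. In the sketch below the names `count_setup_calls`, `scale` and `scratch` are hypothetical, and the arithmetic is placeholder work.

```fortran
module single_demo
  implicit none
contains
  ! Returns how many times the per-ix setup step ran.
  ! With SINGLE this is nx, regardless of the number of threads.
  function count_setup_calls(nx, ny, nz) result(ncalls)
    integer, intent(in) :: nx, ny, nz
    integer :: ncalls
    integer :: ix, iy, iz, scale
    real :: scratch(ny, nz)

    ncalls = 0
    scale = 0
    !$omp parallel private(ix, iy, iz) shared(ncalls, scale, scratch)
    do ix = 1, nx              ! every thread walks the full ix range
      !$omp single
      ncalls = ncalls + 1      ! executed by exactly one thread per ix
      scale = ix               ! shared setup value for this ix
      !$omp end single
      ! implicit barrier at END SINGLE: all threads now see scale
      !$omp do
      do iy = 1, ny
        do iz = 1, nz
          scratch(iy, iz) = real(scale*iy*iz)   ! placeholder work
        end do
      end do
      !$omp end do
    end do
    !$omp end parallel
  end function count_setup_calls
end module single_demo
```

Without the SINGLE block, every thread would execute the setup lines for every ix, which for a shared counter like `ncalls` would also be a data race.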

Michael's suggestion is good in that it decreases the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler you can then consider adding Robert's suggestion of collapse(2). Until then you could use omp_get_num_threads and omp_get_thread_num in a single loop construct to iterate over the combined iy and iz space.
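A minimal sketch of how such a hand-flattened loop might look, assuming a cyclic distribution of the ny*nz index pairs over the threads (other distributions are equally possible). The function name `flat_sum` and the product `ix*iy*iz` are placeholders for the real work, and the `!$` sentinel lines are compiled only when OpenMP is enabled.

```fortran
module flat_loop_demo
  !$ use omp_lib
  implicit none
contains
  ! Sums placeholder work over all (ix,iy,iz), dividing the flattened
  ! (iy,iz) space among threads by hand instead of using collapse(2).
  function flat_sum(nx, ny, nz) result(total)
    integer, intent(in) :: nx, ny, nz
    integer :: total
    integer :: acc, ix, iy, iz, i, tid, nth

    acc = 0
    !$omp parallel private(ix, iy, iz, i, tid, nth) reduction(+:acc)
    tid = 0                    ! serial fallback when built without OpenMP
    nth = 1
    !$ tid = omp_get_thread_num()
    !$ nth = omp_get_num_threads()
    do ix = 1, nx
      ! thread tid takes flattened indices tid+1, tid+1+nth, ...
      do i = tid + 1, ny*nz, nth
        iy = (i - 1)/nz + 1        ! recover iy from the flat index
        iz = mod(i - 1, nz) + 1    ! recover iz from the flat index
        acc = acc + ix*iy*iz       ! placeholder work
      end do
      ! stands in for the implicit barrier of END DO; only needed
      ! when step ix+1 depends on results from step ix
      !$omp barrier
    end do
    !$omp end parallel
    total = acc
  end function flat_sum
end module flat_loop_demo
```

With this layout up to ny*nz threads can be kept busy for each ix, instead of at most ny.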

Jim Dempsey

Re: Basic OpenMP question

mahmoudgalal1985 — Thu, 26 Nov 2009 14:25:16 GMT

Quoting - jimdempseyatthecove

Thank you

Re: Basic OpenMP question

mahmoudgalal1985 — Thu, 26 Nov 2009 14:25:47 GMT

Quoting - jimdempseyatthecove

Thanks