I seem to be having difficulty utilizing all 8 cores I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second. It looks as follows:
do ix=1,nx
!$OMP PARALLEL DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END PARALLEL DO
end do
Unfortunately, the code only seems to reach a load of ny (I check that with pbstop +l), while I would like it to reach a load of up to ny*nx. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz
Thank you for any hints and help.
7 Replies
Quoting - roine.vestman@nyu.edu
I seem to be having difficulty utilizing all 8 cores I have access to on a cluster. Essentially, I have three nested loops. The two innermost loops can be worked on in parallel, so I have put an OpenMP directive between the first and the second. It looks as follows:
do ix=1,nx
!$OMP PARALLEL DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END PARALLEL DO
end do
Unfortunately, the code only seems to reach a load of ny (I check that with pbstop +l), while I would like it to reach a load of up to ny*nx. There is a substantial amount of work to be done for each (iy,iz), so I was expecting a load far greater than ny. I have also noticed that the load is lower if I put the iz loop outside of the iy loop and if nz
I think what you're missing is a collapse clause. Perhaps the following will give you what you want:
do ix=1,nx
!$omp parallel do private(iy,iz) collapse(2)
do iy=1,ny
do iz=1,nz
...
end do
end do
!$omp end parallel do
end do
The description from the OpenMP 3.0 specification reads as follows:
The collapse clause may be used to specify how many loops are associated with the loop construct. ... If more than one loop is associated with the loop construct, then the iterations of all associated loops are collapsed into one larger iteration space which is then divided according to the schedule clause.
That should give you the span of tasks you are trying to achieve.
Thank you for the suggestion. I tried adding collapse(2) but can't get it to work. The error reads:
fortcom: Error: vfi_common_model_main.f90, line 360: Syntax error, found IDENTIFIER 'COLLAPSE' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION SHARED IF DEFAULT COPYIN NUM_THREADS LASTPRIVATE ...
!$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) &
From which version of Intel Fortran is 'collapse' supported (i.e. from which version is OpenMP 3.0 supported)? I have Intel Fortran version 10.0.023 - do I need to upgrade?
Quoting - roine.vestman@nyu.edu
Thank you for the suggestion. I tried adding collapse(2) but can't get it to work. The error reads:
fortcom: Error: vfi_common_model_main.f90, line 360: Syntax error, found IDENTIFIER 'COLLAPSE' when expecting one of: PRIVATE FIRSTPRIVATE REDUCTION SHARED IF DEFAULT COPYIN NUM_THREADS LASTPRIVATE ...
!$OMP PARALLEL DO DEFAULT(NONE) COLLAPSE(2) &
From which version of Intel Fortran is 'collapse' supported (i.e. from which version is OpenMP 3.0 supported)? I have Intel Fortran version 10.0.023 - do I need to upgrade?
AFAIK you need to update to 11.x to gain support for OpenMP 3.0 with Intel's compilers.
I would also suggest rearranging your code the following way:
!$OMP PARALLEL
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL
This avoids repeatedly creating and tearing down the parallel region, as would happen with your code pattern. Creating/tearing down the parallel region can have a serious impact on performance because of administrative and barrier overhead.
Cheers,
-michael
Quoting - Michael Klemm, Intel
AFAIK you need to update to 11.x to gain support for OpenMP 3.0 with Intel's compilers.
I would also suggest rearranging your code the following way:
!$OMP PARALLEL
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL
This avoids repeatedly creating and tearing down the parallel region, as would happen with your code pattern. Creating/tearing down the parallel region can have a serious impact on performance because of administrative and barrier overhead.
Cheers,
-michael
Michael,
add a private(ix) clause:
!$OMP PARALLEL PRIVATE(ix)
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL
Roine,
Please note that all threads execute each ix from 1 to nx; however, each has its own copy of ix.
Be cautious when adding code between the do ix and do iy loops. You can have code there, but you cannot assume each thread is working on a different slice of the ix iteration space.
Michael's suggestion is good in that it decreases the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler you can then consider adding Robert's suggestion of collapse(2). Until then you could use omp_get_num_threads and omp_get_thread_num in a single-loop construction to iterate over the combined iy and iz space.
Jim Dempsey
Quoting - jimdempseyatthecove
Michael,
add private(ix) clause
!$OMP PARALLEL PRIVATE(ix)
do ix=1,nx
!$OMP DO ....
do iy=1,ny
do iz=1,nz
...
end do
end do
!$OMP END DO
end do
!$OMP END PARALLEL
Roine,
Please note that all threads execute each ix from 1 to nx; however, each has its own copy of ix.
Be cautious when adding code between the do ix and do iy loops. You can have code there, but you cannot assume each thread is working on a different slice of the ix iteration space.
Michael's suggestion is good in that it decreases the number of times a thread pool is formed from nx times to 1 time. When you get the newer version of the compiler you can then consider adding Robert's suggestion of collapse(2). Until then you could use omp_get_num_threads and omp_get_thread_num in a single-loop construction to iterate over the combined iy and iz space.
Jim Dempsey
Thank you
Quoting - jimdempseyatthecove
Thanks