As the next step after a "hello world" program in learning OpenMP, I wrote the following code so that I can later apply the same approach to the larger code I work on. Unfortunately, I see no difference between the default build and the /Qopenmp and /Qparallel builds. All of them use only one processor out of 4, for any value of n. Of course, beyond a certain value of n, a virtual memory error is thrown.
Compilation was done from the command line with the following options, using Fortran Compiler 12.1.5.344 on an Intel i5-3320M processor (Win 7, 32-bit).
ifort test.f90
ifort /Qopenmp test.f90 (I see from the report that the do loop was parallelized)
ifort /Qparallel test.f90
Can someone help me see what I am missing?
[fortran]
module mod1
contains
  subroutine sub1(a,b,c)
    implicit none
    real*4, intent(in) :: a,b
    real*4, intent(out):: c
    c=a*b
  end subroutine sub1
end module mod1

program test
  use omp_lib
  use mod1
  implicit none
  real*4, allocatable :: a(:), b(:), c(:)
  integer :: i,n
  read(*,*) n
  allocate(a(n),b(n),c(n))
  do i=1,n
    call random_number(a(i))
    call random_number(b(i))
  end do
  !$omp do
  do i=1,n
    call sub1(a(i),b(i),c(i))
  end do
  !$omp end do
  write(*,*) a(1),b(1),c(1)
end program test
[/fortran]
This is a useful OpenMP example: even once the !$omp directives are working, it will demonstrate the limitations of this test approach. The test has a number of stages:
1: Allocate the arrays with ALLOCATE.
2: Initialise the arrays (which also obtains memory from the memory pool).
3: Perform the calculation using sub1 under !$omp.
4: You could also add a stage that does the calculation with array syntax, " c=a*b ".
5: DEALLOCATE the arrays.
I would suggest you measure the elapsed time of each of these stages. SYSTEM_CLOCK might work for large n.
This should demonstrate that, for this test:
90% of the time is taken in stage 2, allocating memory.
The alternative approach in stage 4 works best for this calculation.
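The staged timing described above could be sketched roughly as follows. This is a minimal sketch, not the attached version: the program name `time_stages`, the helper `report`, and the fixed value of n are assumptions made for illustration, and the timings will depend on the machine and on whether OpenMP is enabled at compile time (for example with /Qopenmp).

```fortran
module mod1
contains
  subroutine sub1(a, b, c)
    implicit none
    real, intent(in)  :: a, b
    real, intent(out) :: c
    c = a * b
  end subroutine sub1
end module mod1

program time_stages
  use mod1
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! wide integer for clock counts
  integer, parameter :: dp = kind(1.0d0)
  real, allocatable :: a(:), b(:), c(:)
  integer :: i, n
  integer(i8) :: t0, t1, rate

  n = 10000000                 ! assumed size; reduce it if memory is tight
  call system_clock(count_rate=rate)

  ! Stage 1: allocate the arrays
  call system_clock(t0)
  allocate(a(n), b(n), c(n))
  call system_clock(t1)
  call report('allocate   ', t0, t1)

  ! Stage 2: initialise (first touch of the memory happens here)
  call system_clock(t0)
  call random_number(a)
  call random_number(b)
  call system_clock(t1)
  call report('initialise ', t0, t1)

  ! Stage 3: the sub1 loop, parallel only if OpenMP is enabled
  call system_clock(t0)
  !$omp parallel do
  do i = 1, n
     call sub1(a(i), b(i), c(i))
  end do
  !$omp end parallel do
  call system_clock(t1)
  call report('omp loop   ', t0, t1)

  ! Stage 4: the same calculation with array syntax
  call system_clock(t0)
  c = a * b
  call system_clock(t1)
  call report('array c=a*b', t0, t1)

  ! Use a result so the compiler cannot discard the work
  write (*,*) a(1), b(1), c(1)

  ! Stage 5: deallocate
  call system_clock(t0)
  deallocate(a, b, c)
  call system_clock(t1)
  call report('deallocate ', t0, t1)

contains

  ! Print the elapsed seconds between two clock counts
  subroutine report(label, t_start, t_end)
    character(*), intent(in) :: label
    integer(i8), intent(in)  :: t_start, t_end
    write (*,'(a,a,f10.4,a)') label, ': ', &
         real(t_end - t_start, dp) / real(rate, dp), ' s'
  end subroutine report

end program time_stages
```

Comparing the "omp loop" and "array c=a*b" lines for a few values of n should show whether the parallel loop gains anything over plain array syntax for such a small amount of work per element.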
You could add " write (*,*) 'loop',i,' thread', omp_get_thread_num ()" inside the loop that calls sub1.
This would show how !$omp distributes the iterations across the threads.
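A minimal sketch of that diagnostic, assuming the program is compiled with OpenMP enabled (for example with /Qopenmp) so that `omp_lib` is available. The program name `show_threads` and the small fixed n are illustrative choices, not part of the original test.

```fortran
program show_threads
  use omp_lib
  implicit none
  integer, parameter :: n = 8   ! kept tiny so the output stays readable
  integer :: i

  !$omp parallel do
  do i = 1, n
     ! Each iteration reports which thread executed it
     write (*,*) 'loop', i, ' thread', omp_get_thread_num()
  end do
  !$omp end parallel do
end program show_threads
```

The order of the output lines is not deterministic. If every line reports thread 0, the loop is not actually running inside a parallel region.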
!$omp works best where sub1 does a more significant calculation and each call is independent of the others, which is what suits parallel computation.
John
PS: Attached is a version of the test that records the times of the various stages for different values of n. I ran this test on a Xeon that supports 4 threads. The strength of OpenMP shows where "sub1" does a lot of diverse calculation, such as running many independent models. The diversity of the calculation helps, because processing a single vector can run into a memory cache bottleneck. I hope this example helps.
As your parallel loop presumably auto-vectorizes when you don't set /Qopenmp, it would not be surprising if the change in performance there were small relative to the rest of your program.
Would declaring a, b, c shared help?
[fortran]
!$omp parallel do shared (n,a,b,c), private (i)
do i=1,n
  call sub1(a(i),b(i),c(i))
end do
!$omp end parallel do
[/fortran]
"!$omp do ..." issued outside a parallel region is roughly equivalent to a comment: with no enclosing parallel region, the loop is executed by a single thread. Use either:
!$omp parallel do ... (as illustrated by John)
do ...
...
end do
!$omp end parallel do
or
!$omp parallel
{optional parallel code here}
!$omp do ...
do ...
...
end do
!$omp end do
{ optional parallel code here}
!$omp end parallel
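Applied to the test program from the original question, the first form might look like the sketch below. It assumes the only change needed is replacing "!$omp do" with "!$omp parallel do"; the array-form calls to random_number and the use of default real instead of real*4 are small simplifications, not part of the original code.

```fortran
module mod1
contains
  subroutine sub1(a, b, c)
    implicit none
    real, intent(in)  :: a, b
    real, intent(out) :: c
    c = a * b
  end subroutine sub1
end module mod1

program test
  use mod1
  implicit none
  real, allocatable :: a(:), b(:), c(:)
  integer :: i, n

  read (*,*) n
  allocate(a(n), b(n), c(n))
  call random_number(a)
  call random_number(b)

  ! "parallel do" both creates the team of threads and shares out the loop
  !$omp parallel do
  do i = 1, n
     call sub1(a(i), b(i), c(i))
  end do
  !$omp end parallel do

  write (*,*) a(1), b(1), c(1)
end program test
```

Built without OpenMP enabled, the directives are treated as comments and the loop simply runs serially, so the same source serves both cases.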
Jim Dempsey
John Campbell wrote:
Would declaring a,b,c shared help ?
!$omp parallel do shared (n,a,b,c), private (i)
do i=1,n
  call sub1(a(i),b(i),c(i))
end do
!$omp end parallel do
Those clauses all just restate the defaults. Of course, as you show and Jim mentioned, the OP's omission of parallel would explain why the omp do had no effect. Examining the opt-report output would certainly alert us.
Thank you Jim and John. !$omp parallel was what I had missed; with it included, I see all 4 processors involved in my test code.
John - your example helped me understand many things about using OpenMP in one go (allocation, calculation, what to parallelize and what not to). Special thanks to you!!!
Just before completion, the EXE crashed with an insufficient virtual memory error (#41). May I ask for a suggestion on how to fix it?
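The thread does not establish the cause of error #41, so the following is only a diagnostic sketch under an assumption: that the failure comes from an ALLOCATE request too large for the address space of a 32-bit process. It estimates the memory the three arrays need (assuming 4-byte default reals) and uses the standard stat= specifier so that a failed allocation is reported instead of aborting the program. The program name `check_alloc` is hypothetical.

```fortran
program check_alloc
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real, allocatable :: a(:), b(:), c(:)
  integer :: n, ierr
  real(dp) :: mbytes

  read (*,*) n

  ! Three arrays of n default reals, assumed to be 4 bytes each
  mbytes = 3.0_dp * 4.0_dp * real(n, dp) / 1024.0_dp**2
  write (*,'(a,f12.1,a)') 'requesting about ', mbytes, ' MB'

  ! stat= returns a nonzero code on failure instead of stopping the program
  allocate(a(n), b(n), c(n), stat=ierr)
  if (ierr /= 0) then
     write (*,*) 'allocation failed, stat =', ierr
     stop 1
  end if

  write (*,*) 'allocation succeeded'
  deallocate(a, b, c)
end program check_alloc
```

If the reported size is close to what a 32-bit process can address, reducing n or moving to a 64-bit build would be the obvious things to try.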
