topic Re: OpenMP no speedup in Intel® Fortran Compiler

OpenMP no speedup

misty12 — Mon, 02 Jun 2008 20:31:34 GMT

Hi,
I'm trying to parallelize the following loop:
...
converged = .false.
r3=gam(2)*2.0
do while (.not. converged)
!===========================================
! This is parallel block #1
!$omp parallel private (t1,t2,s1,s2,t3,t4,t5,t6,q1,q2,r1,r2,r11,r22)
!$omp do
do i=0,nr-1
do j=0,nt-1

t1=ru(j,i,1)+run(j,i,1)
t5=ru(j,i,1)*ru(j,i,1)+run(j,i,1)*run(j,i,1)
t2=iu(j,i,1)+iun(j,i,1)
t6=iu(j,i,1)*iu(j,i,1)+iun(j,i,1)*iun(j,i,1)
q1=-gam(2)*(t5-t6)
r1=t5+t6
q2=r3*(iu(j,i,1)*ru(j,i,1)+iun(j,i,1)*run(j,i,1))
s1=ru(j,i,2)+run(j,i,2)
r2=ru(j,i,2)*ru(j,i,2)+run(j,i,2)*run(j,i,2)
s2=iu(j,i,2)+iun(j,i,2)
r2=r2+iu(j,i,2)*iu(j,i,2)+iun(j,i,2)*iun(j,i,2)
t3=gam(1)*(t1*s2-t2*s1)
t4=-gam(1)*(t1*s1+t2*s2)
r11=(r1+beta1*r2)*alf(1)
r22=(beta1*r1+r2)*alf(2)

fru(j,i,1)=t3+r11*t2
fiu(j,i,1)=t4-r11*t1
fru(j,i,2)=q2+r22*s2
fiu(j,i,2)=q1-r22*s1

end do
end do
!$omp end do nowait
!$omp end parallel
! End of parallel block #1
!===========================================
!
....
!
....
end do
...
Arrays are declared as follows:
double precision, allocatable, dimension(:,:,:):: ru,run,fru
double precision, allocatable, dimension(:,:,:):: iu,iun,fiu
allocate(ru(0:nt,0:nr,2),run(0:nt,0:nr,2),fru(0:nt,0:nr,2))
allocate(iu(0:nt,0:nr,2),iun(0:nt,0:nr,2),fiu(0:nt,0:nr,2))

nt=2048, nr=250

To estimate the speedup, I created two Thread Profiler activities, one with 1 thread and one with 2. The runs show absolutely no speedup for parallel block #1: 21 sec with 1 thread and 20.9 sec with 2 threads, while for parallel block #2 the speedup is more than 1.6. Am I doing something wrong in the first parallel block?

Thanks in advance

Re: OpenMP no speedup

jimdempseyatthecove — Wed, 04 Jun 2008 12:04:36 GMT

Try

!$omp parallel do private (t1,t2,s1,s2,t3,t4,t5,t6,q1,q2,r1,r2,r11,r22) schedule(static,1)
 do i=0,nr-1
 do j=0,nt-1
 ...
 end do
 end do
!$omp end parallel do

Jim Dempsey

Re: OpenMP no speedup

Steve_Nuchia — Mon, 09 Jun 2008 15:17:09 GMT

How many times does the first loop construct execute? What is the average wall time per pass? There is substantial overhead in thread management, and the convergence loop will incur that overhead on every pass.

It's more work but you might try creating your thread pool outside the convergence loop.
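
A minimal sketch of that idea, assuming a toy update in place of the real kernel: the parallel region is opened once, the convergence loop runs inside it, and each pass uses only worksharing constructs. The arrays a and b, the tolerance tol, the cap max_pass and the convergence test are all hypothetical placeholders, not taken from the original code. Many OpenMP runtimes already reuse their worker threads between regions, so the gain may be modest.

```fortran
program persistent_region
  implicit none
  integer, parameter :: n = 1000
  integer, parameter :: max_pass = 100           ! hypothetical safety cap
  double precision, parameter :: tol = 1.0d-6    ! hypothetical tolerance
  double precision :: a(n), b(n)
  integer :: i, pass
  logical :: converged

  a = 1.0d0
  b = 0.0d0
  pass = 0
  converged = .false.

  ! The thread team is created once, outside the convergence loop.
  ! Every thread executes the do while; only the inner loops are split.
  !$omp parallel default(shared) private(i)
  do while (.not. converged)

     !$omp do
     do i = 1, n
        b(i) = 0.5d0*a(i)          ! stand-in for parallel block #1
     end do
     !$omp end do                  ! implicit barrier: b is complete here

     !$omp do
     do i = 1, n
        a(i) = b(i)                ! stand-in for parallel block #2
     end do
     !$omp end do                  ! implicit barrier: a is complete here

     ! One thread updates the shared flag. The implicit barrier at
     ! end single makes the new value visible to the whole team
     ! before anyone re-evaluates the do while condition.
     !$omp single
     pass = pass + 1
     converged = (maxval(a) < tol) .or. (pass >= max_pass)
     !$omp end single

  end do
  !$omp end parallel

  print *, 'passes executed:', pass
end program persistent_region
```

Note that the barriers matter here: adding nowait to the loops or to the single would let one thread change converged while another is still testing it.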

Re: OpenMP no speedup

Steve_Nuchia — Tue, 10 Jun 2008 14:31:13 GMT

Another point: calculate the total memory bandwidth the calculation needs and compare it to the memory bandwidth of your system. If it is saturating the memory controller and/or the cache <-> register data paths with one thread, it will run in pretty much the same time with more threads.

Re: OpenMP no speedup

Steve_Nuchia — Tue, 10 Jun 2008 14:32:39 GMT

Make that shared cache <-> private cache data paths.
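
A rough way to do the bandwidth comparison suggested above, as a sketch only. It counts the array elements touched per (j,i) point in parallel block #1 (ru, run, iu, iun read for both components; fru, fiu written for both components) and uses nt and nr from the original post. The sustained bandwidth figure assumed_bw is a made-up placeholder and should be replaced with a value measured on the actual machine.

```fortran
program bandwidth_estimate
  implicit none
  integer, parameter :: nt = 2048, nr = 250      ! sizes from the original post
  integer, parameter :: bytes_per_double = 8
  ! Per (j,i) point: 4 input arrays x 2 components are read,
  ! 2 output arrays x 2 components are written.
  integer, parameter :: loads = 8, stores = 4
  double precision :: points, bytes_per_pass, working_set
  double precision :: assumed_bw, t_min

  points = dble(nt)*dble(nr)
  bytes_per_pass = points*dble((loads + stores)*bytes_per_double)

  ! Six arrays, each allocated as (0:nt,0:nr,2).
  working_set = 6.0d0*dble(nt + 1)*dble(nr + 1)*2.0d0*dble(bytes_per_double)

  ! ASSUMPTION: 5 GB/s sustained memory bandwidth. Placeholder only.
  assumed_bw = 5.0d9
  t_min = bytes_per_pass/assumed_bw

  print '(a,f8.2,a)', 'traffic per pass: ', bytes_per_pass/1.0d6, ' MB'
  print '(a,f8.2,a)', 'working set:      ', working_set/1.0d6, ' MB'
  print '(a,f8.2,a)', 'min time per pass:', t_min*1.0d3, ' ms (at assumed bandwidth)'
end program bandwidth_estimate
```

With these sizes the loop touches roughly 49 MB per pass, and the six arrays together occupy about the same amount, so the data cannot stay in cache between passes. Stores that first pull the target cache line into cache would push the real traffic higher than this count. If the measured time per pass is close to the bandwidth-limited minimum, a second thread has little room to help.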