OMP Takes Longer than Serial Code!

ash1 · ‎10-14-2009

Hello everyone,

I have a DO loop, shown below, that takes 2 seconds to execute. Surprisingly, when I compile with -openmp, it takes 3 seconds! My job script requests 6 processors and 6 threads.

Does anyone know the reason? And how long should the run take in the ideal case for different numbers of processors and threads?

Thank you.

!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1)
DO i=2,NZETA+1
  DO k=1,(N/2)+1
    DO l=1,LE
      TU1=-(0,1)*(2*(k-1)*GAMMA*(2*V(i,k,l)-VO(i,k,l))/r(i)**2)
      TV1=(0,1)*(2*(k-1)*GAMMA*(2*U(i,k,l)-UO(i,k,l))/r(i)**2)
      IF (t.EQ.1) THEN
        TU1=-(0,1)*(2*(k-1)*GAMMA*V(i,k,l)/r(i)**2)
        TV1=(0,1)*(2*(k-1)*GAMMA*U(i,k,l)/r(i)**2)
      END IF
      au(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cu(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      du(i,k,l)=U1(i,k,l)+TU1
      av(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cv(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dv(i,k,l)=V1(i,k,l)+TV1
      aw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dw(i,k,l)=W1(i,k,l)
      UO(i,k,l)=U(i,k,l)
      VO(i,k,l)=V(i,k,l)
      WO(i,k,l)=W(i,k,l)
      IF (i.EQ.NZETA+1) THEN
        au(i,k,l)=0
        bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cu(i,k,l)=0
        du(i,k,l)=U1(i,k,l)+TU1
        av(i,k,l)=0
        bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(1/r(i)))
        cv(i,k,l)=0
        dv(i,k,l)=V1(i,k,l)+TV1
        aw(i,k,l)=0
        bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cw(i,k,l)=0
        UO(i,k,l)=U(i,k,l)
        VO(i,k,l)=V(i,k,l)
        WO(i,k,l)=W(i,k,l)
      END IF
      IF (i.EQ.2) THEN
        cu(i,k,l)=0
        cv(i,k,l)=0
        cw(i,k,l)=0
      END IF
    END DO
  END DO
END DO
!$OMP END PARALLEL DO

jimdempseyatthecove · ‎10-15-2009

What are the values of NZETA, N and LE?

If NZETA = 1, the outer loop has a single iteration and only one thread will do any work; if NZETA < 6, not all six threads will have work.
If LE is large, consider swapping the order of DO i and DO l. Fortran stores arrays in column-major order, so you will get better performance when the outer loop varies the rightmost index and the innermost loop varies the leftmost one... provided the outer loop can be split up across the number of threads available.
After swapping the loop order, consider adding the COLLAPSE clause (introduced in OpenMP 3.0, available in newer versions of IVF):

!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1) COLLAPSE(3)
DO l=1,LE ! note change in order to go right to left on indexes
DO k=1,(N/2)+1
DO i=2,NZETA+1

Also, PHIW and PHIO are not used inside the loop, so they can be dropped from the PRIVATE clause.
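
For anyone who wants to try the reordered nest in isolation, here is a minimal, self-contained sketch. It is not the original program: the sizes, the values of gam, dzeta and s, and the contents of r are made up for illustration (the thread does not give them), and only the au and bu assignments are kept. It needs to be built with OpenMP enabled (for example the -openmp flag mentioned above).

```fortran
program collapse_sketch
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes and constants; NZETA, N, LE are not stated in the thread.
  integer, parameter :: nzeta = 64, n = 64, le = 32
  real(dp), parameter :: gam = 0.01_dp, dzeta = 0.1_dp, s = 1.0_dp
  real(dp), allocatable :: r(:), au(:,:,:), bu(:,:,:)
  integer :: i, k, l

  allocate(r(nzeta+1), au(nzeta+1, n/2+1, le), bu(nzeta+1, n/2+1, le))
  do i = 1, nzeta+1
     r(i) = 1.0_dp + real(i, dp) * dzeta   ! made-up radial grid
  end do
  au = 0.0_dp
  bu = 0.0_dp

  ! Rightmost index (l) outermost, leftmost index (i) innermost, so each
  ! thread walks memory contiguously. COLLAPSE(3) merges the three loops
  ! into one iteration space that is then divided among the threads.
  !$omp parallel do collapse(3) private(i, k, l)
  do l = 1, le
     do k = 1, n/2+1
        do i = 2, nzeta+1
           au(i,k,l) = -gam / (r(i)**2 * dzeta**2)
           bu(i,k,l) = 1 + gam*((1 + (k-1)**2)/r(i)**2 + s**2*(l-1)**2 &
                       + 2/(r(i)**2 * dzeta**2))
        end do
     end do
  end do
  !$omp end parallel do

  print *, 'max threads:', omp_get_max_threads()
  print *, 'au(2,1,1) =', au(2,1,1), '  bu(2,1,1) =', bu(2,1,1)
end program collapse_sketch
```

The loop body only reads r and writes each array element once, so no iteration depends on another and the three loops can be collapsed safely. The same has to hold for the full loop body before COLLAPSE is applied there.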

Jim Dempsey

TimP · ‎10-15-2009

In addition to what Jim said, the compiler may attempt some optimization of your loop nesting only when -openmp isn't set. When -openmp is set, your specified order of loop nesting is taken literally, in spite of the potential for false sharing.
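
One way to check whether loop order is what hurts in a particular build is to time both orderings with omp_get_wtime. The sketch below is an assumption-laden stand-in, not the original code: the array a, its dimensions, and the trivial loop body are hypothetical, chosen only so that the memory access pattern resembles the one discussed above.

```fortran
program loop_order_timing
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical dimensions; adjust to match the real NZETA, N/2+1 and LE.
  integer, parameter :: ni = 200, nk = 100, nl = 100
  real(dp), allocatable :: a(:,:,:)
  real(dp) :: t0, t_ikl, t_lki
  integer :: i, k, l

  allocate(a(ni, nk, nl))
  a = 0.0_dp   ! touch the memory once before timing

  ! Ordering from the original post: i outermost, l innermost,
  ! so consecutive inner iterations are far apart in memory.
  t0 = omp_get_wtime()
  !$omp parallel do private(i, k, l)
  do i = 1, ni
     do k = 1, nk
        do l = 1, nl
           a(i,k,l) = real(i + k + l, dp)
        end do
     end do
  end do
  !$omp end parallel do
  t_ikl = omp_get_wtime() - t0

  ! Suggested ordering: l outermost, i innermost (contiguous in memory).
  t0 = omp_get_wtime()
  !$omp parallel do private(i, k, l)
  do l = 1, nl
     do k = 1, nk
        do i = 1, ni
           a(i,k,l) = real(i + k + l, dp)
        end do
     end do
  end do
  !$omp end parallel do
  t_lki = omp_get_wtime() - t0

  print *, 'max threads     :', omp_get_max_threads()
  print *, 'i outermost (s) :', t_ikl
  print *, 'l outermost (s) :', t_lki
  print *, 'checksum        :', sum(a)   ! keeps the stores from being optimized away
end program loop_order_timing
```

The first timed region may also include the one-time cost of starting the thread team, so it is worth running the program several times, or swapping the two blocks, before drawing conclusions. As for the ideal case asked about in the original post: perfect scaling would bring a 2-second loop down to about 2/6, or roughly 0.33 seconds, on 6 threads, but a loop that mostly stores into many large arrays is likely limited by memory bandwidth and may fall well short of that.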