Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

OMP Takes Longer than Serial Code!

ash1
Hello everyone,

I have a DO loop, shown below, that takes 2 seconds to execute. Surprisingly, when I compile with -openmp, it takes 3 seconds! My job script uses 6 processors and 6 threads.

Does anyone know the reason? Also, how long should the run take in the ideal case for different numbers of processors and threads?

Thank you.

!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1)
DO i=2,NZETA+1
  DO k=1,(N/2)+1
    DO l=1,LE
      TU1=-(0,1)*(2*(k-1)*GAMMA*(2*V(i,k,l)-VO(i,k,l))/r(i)**2)
      TV1=(0,1)*(2*(k-1)*GAMMA*(2*U(i,k,l)-UO(i,k,l))/r(i)**2)
      IF (t.EQ.1) THEN
        TU1=-(0,1)*(2*(k-1)*GAMMA*V(i,k,l)/r(i)**2)
        TV1=(0,1)*(2*(k-1)*GAMMA*U(i,k,l)/r(i)**2)
      END IF
      au(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cu(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      du(i,k,l)=U1(i,k,l)+TU1
      av(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cv(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dv(i,k,l)=V1(i,k,l)+TV1
      aw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dw(i,k,l)=W1(i,k,l)
      UO(i,k,l)=U(i,k,l)
      VO(i,k,l)=V(i,k,l)
      WO(i,k,l)=W(i,k,l)
      IF (i.eq.NZETA+1) THEN
        au(i,k,l)=0
        bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cu(i,k,l)=0
        du(i,k,l)=U1(i,k,l)+TU1
        av(i,k,l)=0
        bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(1/r(i)))
        cv(i,k,l)=0
        dv(i,k,l)=V1(i,k,l)+TV1
        aw(i,k,l)=0
        bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cw(i,k,l)=0
        UO(i,k,l)=U(i,k,l)
        VO(i,k,l)=V(i,k,l)
        WO(i,k,l)=W(i,k,l)
      END IF
      IF (i.EQ.2) Then
        cu(i,k,l)=0
        cv(i,k,l)=0
        cw(i,k,l)=0
      END IF
    END DO
  END DO
END DO
!$OMP END PARALLEL DO
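For the "ideal case" part of the question, Amdahl's law gives a rough estimate. Assume a fraction f of the serial run time T_1 is work that splits perfectly across p threads, and ignore thread start-up, scheduling and memory effects. Then:

```latex
T_p = T_1\left[(1 - f) + \frac{f}{p}\right],
\qquad
f = 1:\; T_6 = \frac{2\,\mathrm{s}}{6} \approx 0.33\,\mathrm{s},
\qquad
f = 0.9:\; T_6 = 2\,\mathrm{s}\times\left(0.1 + \frac{0.9}{6}\right) = 0.5\,\mathrm{s}.
```

The value f = 0.9 is only an illustration, since the real fraction is not known from the post. A measured 3 s is slower than the serial run, so overheads such as too few outer iterations, a poor memory access order or false sharing probably dominate. This formula does not capture any of those effects.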

jimdempseyatthecove

What are the values of NZETA, N and LE?

If NZETA = 1, only one thread will do any work; if NZETA < 6, some of the 6 threads will sit idle.
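A small sketch can quantify the NZETA point. It assumes a block-style static schedule, in which each thread gets at most one contiguous chunk of near-equal size. OpenMP leaves the exact split to the implementation, so treat the numbers as estimates. The module and function names are hypothetical.

```fortran
! Sketch only: assumes a block-style static schedule (at most one contiguous,
! near-equal chunk per thread). The exact split is implementation-defined,
! and all names here are hypothetical.
module schedule_demo
  implicit none
contains
  ! Number of threads that receive at least one iteration.
  pure integer function busy_threads(niter, nthreads)
    integer, intent(in) :: niter, nthreads
    busy_threads = max(0, min(niter, nthreads))
  end function busy_threads

  ! Largest number of iterations any single thread receives,
  ! i.e. ceiling(niter / nthreads) for niter >= 0 and nthreads > 0.
  pure integer function max_chunk(niter, nthreads)
    integer, intent(in) :: niter, nthreads
    max_chunk = (max(niter, 0) + nthreads - 1) / nthreads
  end function max_chunk
end module schedule_demo
```

The outer loop DO i=2,NZETA+1 has NZETA iterations. On 6 threads, NZETA = 4 leaves two threads idle. NZETA = 7 gives one thread two iterations while the others get one. If iterations cost the same, that loop can then run at most 7/2 = 3.5 times faster than serially, not 6 times.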
If LE is large, consider swapping the order of DO i and DO l. Fortran stores arrays in column-major order, so you get better performance when the outer loop varies the rightmost index and the inner loop the leftmost one, provided the outer loop can be split across the available threads.
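Jim's loop-order advice follows from Fortran's column-major storage, where the leftmost index varies fastest in memory. The hypothetical function below fills a small array in the recommended order (l outer, i inner). It then uses the PACK intrinsic to list the elements in array element order. For a contiguous array, that is their order in memory.

```fortran
module storage_order_demo
  implicit none
contains
  ! Returns the labels of an n1 x n2 x n3 array in array element order,
  ! i.e. memory order for a contiguous Fortran array. Element (i,k,l) is
  ! labelled 100*i + 10*k + l, which is unambiguous for extents up to 9.
  function storage_sequence(n1, n2, n3) result(seq)
    integer, intent(in) :: n1, n2, n3
    integer :: seq(n1 * n2 * n3)
    integer :: a(n1, n2, n3), i, k, l
    ! Recommended nesting: rightmost index outermost, leftmost innermost.
    do l = 1, n3
      do k = 1, n2
        do i = 1, n1
          a(i, k, l) = 100 * i + 10 * k + l
        end do
      end do
    end do
    ! PACK with a scalar .TRUE. mask returns every element in array element order.
    seq = pack(a, .true.)
  end function storage_sequence
end module storage_order_demo
```

For extents (2,2,2), the sequence is 111, 211, 121, 221, 112, 212, 122, 222, so the labels change in i first, then k, then l. In the original nesting l is the innermost loop, so consecutive iterations jump n1*n2 elements apart. With i innermost they touch neighboring elements.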
After swapping the loop order, consider also adding the COLLAPSE clause (available in newer versions of IVF):

!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1) COLLAPSE(3)
DO l=1,LE ! note change in order to go right to left on indexes
  DO k=1,(N/2)+1
    DO i=2,NZETA+1
      ! ... loop body unchanged ...
    END DO
  END DO
END DO
!$OMP END PARALLEL DO

Also, PHIW and PHIO are not used in the loop, so they need not be listed as PRIVATE.
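Here is a minimal, self-contained sketch of the reordered nest with COLLAPSE(3). The arrays a, x and r and the one-line body are hypothetical stand-ins for the real loop body. The !$omp lines are ordinary comments unless OpenMP is enabled (e.g. with -openmp), so the code also runs serially. The indices of the collapsed loops are private automatically.

```fortran
module collapse_demo
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
contains
  ! Hypothetical stand-in body: a(i,k,l) = x(i,k,l) / r(i)**2
  subroutine fill(a, x, r, nzeta, nk, le)
    integer, intent(in) :: nzeta, nk, le
    real(real64), intent(in) :: x(:,:,:), r(:)
    real(real64), intent(inout) :: a(:,:,:)
    integer :: i, k, l
    ! Rightmost index outermost, leftmost index innermost (unit stride).
    ! COLLAPSE(3) merges the three loops into one space of nzeta*nk*le
    ! iterations, which is then divided among the threads.
    !$omp parallel do collapse(3)
    do l = 1, le
      do k = 1, nk
        do i = 2, nzeta + 1
          a(i, k, l) = x(i, k, l) / r(i)**2
        end do
      end do
    end do
    !$omp end parallel do
  end subroutine fill
end module collapse_demo
```

COLLAPSE requires loop bounds that do not depend on the other collapsed loops' indices, which holds here. With hypothetical values NZETA = 4, nk = 3 and LE = 5, the collapsed space has 60 iterations. All 6 threads therefore get work even though NZETA < 6.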

Jim Dempsey

TimP
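In addition to what Jim said: the compiler may attempt to optimize your loop nesting (for example, by interchanging the loops) only when -openmp isn't set. When -openmp is set, the nesting order you specified is taken literally, in spite of the potential for false sharing. False sharing occurs when different threads write to neighboring array elements that sit in the same cache line.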
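The sketch below illustrates the false sharing Tim mentions. It checks whether elements written by two threads at a chunk boundary land in the same cache line. It assumes 64-byte cache lines, 16-byte elements (for instance double-precision COMPLEX, although the question does not show the declarations) and an array that starts on a line boundary. All names and extents are hypothetical.

```fortran
module false_sharing_demo
  implicit none
  ! Illustrative assumptions, not properties of a particular CPU or of the
  ! arrays in the question.
  integer, parameter :: line_bytes = 64   ! cache-line size
  integer, parameter :: elem_bytes = 16   ! e.g. one double-precision COMPLEX
contains
  ! Zero-based column-major offset of (i,k,l); lower bounds of 1 assumed.
  pure function linear_offset(i, k, l, n1, n2) result(off)
    integer, intent(in) :: i, k, l, n1, n2
    integer :: off
    off = (i - 1) + n1 * ((k - 1) + n2 * (l - 1))
  end function linear_offset

  ! Index of the cache line holding the element at offset off, assuming
  ! the array starts on a line boundary.
  pure function cache_line(off) result(line)
    integer, intent(in) :: off
    integer :: line
    line = (off * elem_bytes) / line_bytes
  end function cache_line

  ! True if the elements at offsets off1 and off2 share a cache line.
  pure logical function same_line(off1, off2)
    integer, intent(in) :: off1, off2
    same_line = cache_line(off1) == cache_line(off2)
  end function same_line
end module false_sharing_demo
```

Take extents (12,4,6), and suppose i is outermost and one thread owns i = 2..3 while the next owns i = 4..5. The two threads then write elements (3,k,l) and (4,k,l), which share a line in every one of the 24 (k,l) columns. With l outermost, threads only meet at the boundaries between their l slabs, so far fewer lines are shared.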