I have a DO loop, shown below, that takes 2 seconds to execute. Surprisingly, when I compile with -openmp, it takes 3 seconds! My job script requests 6 processors and 6 threads.
Does anyone know the reason? Also, how long should the run take in the ideal case for different numbers of processors and threads?
Thank you.
!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1)
DO i=2,NZETA+1
  DO k=1,(N/2)+1
    DO l=1,LE
      TU1=-(0,1)*(2*(k-1)*GAMMA*(2*V(i,k,l)-VO(i,k,l))/r(i)**2)
      TV1=(0,1)*(2*(k-1)*GAMMA*(2*U(i,k,l)-UO(i,k,l))/r(i)**2)
      IF (t.EQ.1) THEN
        TU1=-(0,1)*(2*(k-1)*GAMMA*V(i,k,l)/r(i)**2)
        TV1=(0,1)*(2*(k-1)*GAMMA*U(i,k,l)/r(i)**2)
      END IF
      au(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cu(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      du(i,k,l)=U1(i,k,l)+TU1
      av(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cv(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dv(i,k,l)=V1(i,k,l)+TV1
      aw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2+(2/(r(i)**2*DZETA**2)))
      cw(i,k,l)=-GAMMA/(r(i)**2*DZETA**2)
      dw(i,k,l)=W1(i,k,l)
      UO(i,k,l)=U(i,k,l)
      VO(i,k,l)=V(i,k,l)
      WO(i,k,l)=W(i,k,l)
      IF (i.eq.NZETA+1) THEN
        au(i,k,l)=0
        bu(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cu(i,k,l)=0
        du(i,k,l)=U1(i,k,l)+TU1
        av(i,k,l)=0
        bv(i,k,l)=1+GAMMA*(((1+(k-1)**2)/r(i)**2)+S**2*(l-1)**2+(1/r(i)))
        cv(i,k,l)=0
        dv(i,k,l)=V1(i,k,l)+TV1
        aw(i,k,l)=0
        bw(i,k,l)=1+GAMMA*((((k-1)**2)/r(i)**2)+S**2*(l-1)**2)
        cw(i,k,l)=0
        UO(i,k,l)=U(i,k,l)
        VO(i,k,l)=V(i,k,l)
        WO(i,k,l)=W(i,k,l)
      END IF
      IF (i.EQ.2) Then
        cu(i,k,l)=0
        cv(i,k,l)=0
        cw(i,k,l)=0
      END IF
    END DO
  END DO
END DO
!$OMP END PARALLEL DO
What are the values of NZETA, N and LE?
The parallelized outer loop runs NZETA iterations, so if NZETA = 1 only one thread will do any work, and if NZETA < 6 not all of your 6 threads will have work.
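As a rough illustration of the point about idle threads, and of the ideal-case question, here is a minimal sketch. It assumes the outer-loop iterations are split as evenly as possible across threads, and it ignores threading overhead and memory-bandwidth limits, so measured times will be worse than these figures. The value of `nzeta` is hypothetical; only the 2-second serial time comes from the question.

```fortran
program ideal_time
  implicit none
  integer, parameter :: nzeta = 4        ! hypothetical outer-loop trip count
  real, parameter    :: t_serial = 2.0   ! serial time from the question, in seconds
  integer :: p, chunk

  do p = 1, 6
    ! Most iterations any single thread receives: ceiling(nzeta / p)
    chunk = (nzeta + p - 1) / p
    ! The slowest thread sets the lower bound on the elapsed time
    print '(a,i2,a,f6.3,a)', 'threads =', p, '  ideal time ~', &
          t_serial * real(chunk) / real(nzeta), ' s'
  end do
end program ideal_time
```

With a trip count smaller than the thread count, the estimate stops improving once every iteration has its own thread, which is why the actual values of NZETA, N and LE matter.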
If LE is large, consider swapping the order of DO i and DO l. You will get better performance when the outer loop varies the rightmost index and the inner loop varies the leftmost one, provided the outer loop can be split up across the number of threads available.
Then, after swapping the loop order, consider adding the COLLAPSE clause (available in newer versions of IVF):
!$OMP PARALLEL DO PRIVATE(i,k,l,PHIW,PHIO,TU1,TV1) COLLAPSE(3)
DO l=1,LE ! note change in order to go right to left on indexes
  DO k=1,(N/2)+1
    DO i=2,NZETA+1
Also, PHIW and PHIO are not used in the loop, so they do not need to be in the PRIVATE list.
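Putting the reordering together, a self-contained sketch might look like the following. It is not the original program: the array sizes, the constant values, and the reduced loop body (only `au` and `UO` are kept) are assumptions chosen to make the example short and compilable. The loop order and the COLLAPSE(3) clause follow the suggestion above, and `omp_get_wtime` is used to time the region.

```fortran
program reorder_sketch
  use omp_lib
  implicit none
  ! Hypothetical sizes and constants, for illustration only
  integer, parameter :: NZETA = 64, N = 64, LE = 64
  real(8), parameter :: GAMMA = 0.01d0, DZETA = 0.1d0
  real(8), allocatable :: r(:), au(:,:,:), U(:,:,:), UO(:,:,:)
  integer :: i, k, l
  real(8) :: t0, t1

  allocate(r(NZETA+1))
  allocate(au(NZETA+1,N/2+1,LE), U(NZETA+1,N/2+1,LE), UO(NZETA+1,N/2+1,LE))
  r = 1.0d0
  U = 0.0d0

  t0 = omp_get_wtime()
  ! Outer loop over the rightmost index l, inner loop over the leftmost index i,
  ! so consecutive inner iterations touch adjacent memory (column-major storage)
  !$OMP PARALLEL DO PRIVATE(i,k,l) COLLAPSE(3)
  do l = 1, LE
    do k = 1, (N/2)+1
      do i = 2, NZETA+1
        au(i,k,l) = -GAMMA/(r(i)**2*DZETA**2)
        UO(i,k,l) = U(i,k,l)
      end do
    end do
  end do
  !$OMP END PARALLEL DO
  t1 = omp_get_wtime()

  print *, 'max threads:', omp_get_max_threads()
  print *, 'elapsed (s):', t1 - t0
end program reorder_sketch
```

Build it with OpenMP enabled (for example -openmp as in the question, or -fopenmp with gfortran) and run it with different OMP_NUM_THREADS settings to compare timings against the serial build.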
Jim Dempsey