Re: openmp problem

roddur · ‎12-01-2008

This is a part of my openmp code.

400 !$ call OMP_SET_NUM_THREADS(2)
401
402 !$OMP PARALLEL DO DEFAULT(NONE) &
403 !$OMP PRIVATE(il,ie,iie,ienrp,iend,e,istart) &
404 !$OMP PRIVATE(temp,spec,xa,ya) &
405 !$OMP SHARED(ap1,ap2,ap3,ap4,ap6,ap7) &
406 !$OMP SHARED(ap8,ap9,ap10,ap11,ap12,ap13) &
407 !$OMP SHARED(MAP,SRL,seed,de,xxa,yya)
408
409 orbital: do il=1,lorbit-2,2
410 ! xa=0.0;ya=0.0
411 ! write(*,'(1x,"Starting orbital loop",1x,i1,$)') il
412 ienrp=0
413 e=emin-de
414 istart=1
415
416 do iend=9,63,lorbit
417 e=e+de
418 iie=0
419
420
421 ienrp=ienrp+1
422 call hop(il,e,ienrp,map,srl,ap1, &
423 ap6,ap7,ap8,ap9,ap10,ap11, &
424 ap12,ap13,ap2,ap3,ap4,xa,ya)
425
426 ! write(*,*) xa(2,3)
427 ! istart=iend+1
428 write(*,'(".",$)')
429 end do
430
431 call fit(xa,ya,seed,xxa,yya,temp)
432 call spectral(il,xxa,yya,temp,spec)
433 write(*,'("done")')
434
435
436 end do orbital
437 !$OMP END PARALLEL DO

The problem is there are some datarace as checked with tchecker,it gives

1 Intel Thread Checker 3.1 command line instrumentation driver (26185)
2 Copyright (c) 2007 Intel Corporation. All rights reserved.
3 _______________________________________________________________________________
4 |ID|Short Description|Severi|Co|Cont|Description |1stA|2ndA|
5 | | |ty |un|ext[| |cces|cces|
6 | | |Name |t |Best| |s[Be|s[Be|
7 | | | | |] | |st] |st] |
8 _______________________________________________________________________________
9 |1 |I/O data-race |Erro r |6 |irun|I/O operation at irun conflicts|irun|irun|
10 | | | | | |with a prior I/O operation at | | |
11 | | | | | |irun | | |
12 _______________________________________________________________________________
13 |2 |OpenMP* -- |Warnin|90|irun|OpenMP* -- undefined in the |irun|irun|
14 | |undefined in the |g |1 | |serial code (original program) | | |
15 | |serial code | | | |at irun with irun | | |
16 | |(original | | | | | | |
17 | |program) | | | | | | |
18 _______________________________________________________________________________
19 |3 |OpenMP* -- |Warnin|90|irun|OpenMP* -- undefined in the |irun|irun|
20 | |undefined in the |g |1 | |serial code (original program) | | |
21 | |serial code | | | |at irun with irun | | |
22 | |(original | | | | | | |
23 | |program) | | | | | | |
24 _______________________________________________________________________________
25 |4 |Thread |Inform|1 |Whol|Thread termination at irun - |irun|irun|
26 | |termination |ation | |e Pr|includes stack allocation of 10| | |
27 | | | | |ogra|MB and use of 5.228 MB | | |
28 | | | | |m 1 | | | |
29 _______________________________________________________________________________
30
the code runs fine with one thread, but the subroutines called are serial routine. So is there any way to make the code run like the lorbital loop is threaded, but the inner do loop is not....am i clear enough?

waiting for your reply

TimP · ‎12-01-2008

Did you compile all of your source code with -tcheck, just as the release build would require it all to be built with -openmp (or compatible subset? If you mean you have intentionally non thread-safe code called in a parallel region, you know the consequences.

roddur · ‎12-01-2008

Quoting - tim18

Did you compile all of your source code with -tcheck, just as the release build would require it all to be built with -openmp (or compatible subset? If you mean you have intentionally non thread-safe code called in a parallel region, you know the consequences.

should i compile all , even serial codes with -openmp option?

TimP · ‎12-01-2008

Quoting - roddur

should i compile all , even serial codes with -openmp option?

All functions which will be called in a parallel region should be compiled with -openmp. Other options may have the same effect, but you must avoid the default which makes local arrays and structs static. If the function is not called in a parallel region, -openmp will not hurt, except for small issues like increased stack or heap usage.

roddur · ‎12-01-2008

Is there any easier/cheaper way of making the subroutines threadsafe...eg using compile time options like reentrancy etc?

TimP · ‎12-02-2008

Quoting - roddur

Is there any easier/cheaper way of making the subroutines threadsafe...eg using compile time options like reentrancy etc?

I don't see anything easier about that. It has the same effect in subroutines with no OpenMP directives.

roddur · ‎12-05-2008

i have read from a SUN tutorial that we can use a function which is thread unsafe using lock. as all my suroutine inside the parallel region is developed for serial process, and thread unsafe, can i lock the subroutines? can anyone help me about that? is there any readme available?

TimP · ‎12-05-2008

Quoting - roddur

i have read from a SUN tutorial that we can use a function which is thread unsafe using lock. as all my suroutine inside the parallel region is developed for serial process, and thread unsafe, can i lock the subroutines? can anyone help me about that? is there any readme available?

Yes, you can place omp critical ..... end critical around a non thread-safe region, thus making a region which can be entered by only 1 thread at a time.

roddur · ‎12-05-2008

Quoting - tim18

Yes, you can place omp critical ..... end critical around a non thread-safe region, thus making a region which can be entered by only 1 thread at a time.

but can critical construction can be placed inside parallel do....end parallel do construction?

jimdempseyatthecove · ‎12-05-2008

Examine hop, fit and spectral (and lower level routines they call) to see if they can be made thread safe.

Can you post hop?

Jim Dempsey

roddur · ‎12-05-2008

Quoting - jimdempseyatthecove

Examine hop, fit and spectral (and lower level routines they call) to see if they can be made thread safe.

Can you post hop?

Jim Dempsey

this is a big routine....but still i am posting it. plz check:

!*******************************************************!
! This is the recursion subroutine !
!
!*******************************************************!
module mhop
CONTAINS

subroutine hop (il,e,ienrp,map,srl,ap1,ap6,ap7, &
ap8,ap9,ap10,ap11,ap12,ap13, &
p2,p3,p4,xa,ya)
!*******************************************************!
! The subroutine is the recursion subroutine !
! It uses the potential parameters generated !
! by *POTPAR* subroutine. !
! this module interface is not actually required. !
! it is done only to make sure that POTPAR by !
! this routine and that this routine is called !
! by main texttt{properly}.par
! Here We have done the 2nd order calculation : !
! $$E=H+hoh$$
!*******************************************************!
use kinds, only: RDP,i3
use parameters
implicit none
integer(i3):: i,il,ikl,ifind,ii,j
integer(i3):: nfill,jfill,isfill,nrec
integer(i3):: kc,k,ienrp
integer(i3):: mfill,ikk
integer(i3),dimension(nasite,ntsite),intent(in)::map
real(RDP):: s1,sum1,sum2,sum3,e
real(RDP),dimension(lorbit,lorbit,0:nrsite),intent(in):: srl
real(RDP),dimension(lorbit),intent(in)::ap1,ap6,ap7,ap8,ap9, &
ap10,ap11,ap12,ap13
real(RDP),dimension(nsite,lorbit)::psii,psij,psik,psim
real(RDP),dimension(maxrec)::alpha
real(RDP),dimension(0:maxrec)::beta
real(RDP),dimension(lorbit)::cons,ar
real(RDP),dimension(lorbit):: p2,p3,p4
real(RDP),dimension(maxrec,ienum+1):: ya,xa

sum1=0.0d0
beta(0)=0.0d0
ifind=1

psii(:,:)=0.0d0
psij(:,:)=0.0d0
psik(:,:)=0.0d0
psim(:,:)=0.0d0

psij(1,il)=1.0d0
nfill=1

open(15,file='coeff.dat',status='unknown')

! Starting Recursion loop(12 iteration)
lrec: do nrec=1,maxrec
jfill=nfill

lfill: do i=1,nfill

kc=map(i,ntsite)

!+++++++++++++++++++++++++++++++++++++++++++++++!
! HOP !
! For 2st order. hoh is included !
!+++++++++++++++++++++++++++++++++++++++++++++++!

if1: if (kc==1) then
do ikl=1,lorbit
s1=(ap6(ikl)*psij(i,ikl))
psim(i,ikl)=psim(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if(k>nsite) cycle
if(k>ifind) ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo
end do
cycle lfill
endif if1

!-----------------------------------------------!
! If real space part is = 1 then !
!-----------------------------------------------!

do ikl=1,lorbit
s1=ap7(ikl)*psij(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if(k>nsite) cycle
if(k>ifind) ifind=k
call matp(srl,cons,j,ar)
do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo
end do

k=map(i,ntsite-2)
if (k>nsite) cycle lfill
if (k>ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap8(ikl)*psij(i,ikl)
psim(k,ikl)=psim(k,ikl)+s1
end do

k=map(i,ntsite-1)
if(k==0)cycle lfill

do ikl=1,lorbit
s1=0.0d0
s1=ap9(ikl)*psij(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
end do
end do lfill
!---------H operation finished--------!

!---------O-operation Starts----------!

nfill=ifind

if(nfill>nasite) nfill=nasite

lnfill: do i=1,nfill

kc=map(i,ntsite)

if(kc==1)then

do ikl=1,lorbit
s1=0.0d0
s1=ap10(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
enddo
cycle lnfill
endif
!-------------------------------------------------
do ikl=1,lorbit
s1=ap11(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
enddo

k=map(i,ntsite-2)
if(k.gt.nsite) cycle lnfill
if(k.gt.ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap12(ikl)*psim(i,ikl)
psik(k,ikl)=psik(k,ikl)+s1
enddo

k=map(i,ntsite-1)

if(k==0)cycle lnfill
do ikl=1,lorbit
s1=0.0d0

s1=ap13(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1

enddo

enddo lnfill
!------O-operation complete------------!
psim=0.0d0

!******* again h-operation *************
isfill=ifind

if(isfill>nasite)isfill=nasite
lisfill:do i=1,isfill

kc=map(i,ntsite)

lkc: if(kc==1)then

do ikl=1,lorbit
s1=0.0d0
s1=ap6(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
enddo

do ii=1,lorbit
cons(ii)=psik(i,ii)
enddo

do j=1,nrsite
k=map(i,j)
if(k>nsite)cycle
if(k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo

enddo

cycle lisfill
endif lkc

!********************************************

do ikl=1,lorbit
s1=0.0d0
s1=ap7(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
enddo

do ii=1,lorbit
cons(ii)=psik(i,ii)
enddo

do j=1,nrsite

k=map(i,j)

if(k>nsite)cycle

if(k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo

enddo

k=map(i,ntsite-2)
if(k.gt.nsite)cycle lisfill
if(k.gt.ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap8(ikl)*psik(i,ikl)
psim(k,ikl)=psim(k,ikl)+s1
enddo

k=map(i,ntsite-1)

if(k==0)cycle lisfill
do ikl=1,lorbit
s1=0.0d0
s1=ap9(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1

enddo

enddo lisfill
!*********** hoh complete*************
psik=0.0d0
!******** 2nd h-operatipn complete************** hoh complete

ljfill: do i=1,jfill
kc=map(i,ntsite)
lifn: if (kc==1) then
do ikl=1,lorbit
s1=ap1(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

nrs: do j=1,nrsite
k=map(i,j)
if (k>nsite) cycle
if(k>ifind) ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psik(k,ii)=psik(k,ii)+ar(ii)
end do

end do nrs
cycle ljfill
endif lifn

do ikl=1,lorbit
s1=p2(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do
do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if (k>nsite) cycle
if (k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psik(k,ii)=psik(k,ii)+ar(ii)
end do

end do

k=map(i,ntsite-2)
if (k>nsite) cycle
if (k>ifind) ifind=k

do ikl=1,lorbit
s1=p4(ikl)*psij(i,ikl)
psik(k,ikl)=psik(k,ikl)+s1
end do

k=map(i,ntsite-1)
if (k==0) cycle
do ikl=1,lorbit
s1=p3(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do
end do ljfill

do ikk=1,ifind
do ikl=1,lorbit
psik(ikk,ikl)=psik(ikk,ikl)-psim(ikk,ikl)
end do
end do

!===============================================!
! calculating $alpha$ & $beta$ !
! may be done as a subroutine !
!===============================================!

mfill=ifind
sum1=0.0d0
do i=1,jfill
do j=1,lorbit
sum1=sum1+psij(i,j)*psij(i,j)
end do
end do

sum2=0.0d0
do i=1,mfill
do j=1,lorbit
sum2=sum2+psik(i,j)*psij(i,j)
end do
end do

alpha(nrec)=sum2/sum1

do i=1,mfill
do j=1,lorbit
psik(i,j)=psik(i,j)-alpha(nrec)*psij(i,j)-beta(nrec-1)*psii(i,j)
end do
end do

sum3=0.0d0
do i=1,mfill
do j=1,lorbit
sum3=sum3+psik(i,j)*psik(i,j)
end do
end do

beta(nrec)=sum3/sum1

do i=1,nsite
do j=1,lorbit
psii(i,j)=psij(i,j)
psij(i,j)=psik(i,j)
psik(i,j)=0.0d0
psim(i,j)=0.0d0
end do
end do

nfill=mfill
if (mfill>nasite)nfill=nasite
ifind=1
xa(nrec,ienrp)=alpha(nrec)
ya(nrec,ienrp)=beta(nrec)
write(15,*)nrec,alpha(nrec),beta(nrec)
end do lrec
close(15)

end subroutine hop

!-----------------------------------------------!
! Subroutine MATP !
! to multiply SRL*CONS !
subroutine matp(srl,cons,j,ar)
!-----------------------------------------------!
use kinds
use parameters
implicit double precision(a-h,o-z)

real(RDP),dimension(lorbit,lorbit,0:nrsite):: srl
real(RDP),dimension(lorbit):: cons,ar

do i=1,lorbit
su=0.0d0
do kl=1,lorbit
su=su+srl(i,kl,j)*cons(kl)
enddo
ar(i)=su
enddo

end subroutine

end module mhop

hope this will help. looking for help very eagerly.

jimdempseyatthecove · ‎12-06-2008

Roddur,

Begin with changing

[cpp]real(RDP),dimension(nsite,lorbit)::psii,psij,psik,psim
real(RDP),dimension(maxrec)::alpha
real(RDP),dimension(0:maxrec)::beta
real(RDP),dimension(lorbit)::cons,ar
real(RDP),dimension(lorbit):: p2,p3,p4
real(RDP),dimension(maxrec,ienum+1):: ya,xa

to

real(RDP), automatic ,dimension(nsite,lorbit)::psii,psij,psik,psim
real(RDP), automatic, dimension(maxrec)::alpha
real(RDP), automatic, dimension(0:maxrec)::beta
real(RDP), automatic, dimension(lorbit)::cons,ar
real(RDP), automatic, dimension(lorbit):: p2,p3,p4
real(RDP), automatic, dimension(maxrec,ienum+1):: ya,xa

[/cpp]

Without the automatic these arrays may have been "SAVE" (and shared amoungst your threads).

Jim Dempsey

roddur · ‎12-06-2008

Quoting - jimdempseyatthecove

Without the automatic these arrays may have been "SAVE" (and shared amoungst your threads).
Jim Dempsey

hello Jim,

i have changed the arrays to automatic but didnt get much. can you plz suggest me which should be private and shared in this hop subroutine? this is the main routine and once it is done, i may run fit and spectral in ordered way....that will not consume much time.

hope your help

jimdempseyatthecove · ‎12-07-2008

Here is a suggestion to help track down the problem.

One of the major reasons for thread conflicts is when multiple threads assume they own a memory resource when it is shared (not by design). Although some tools (e.g. Thread Checker) can aid in finding these problems, you may have to resort to finding these yourself.

In the case where you may have a matrix that is to be constructed (written) and you wish to use multiple threads to do so, you typically decompose the matrix into tiles and then have each thread compute and fill in a tile. Problems arrise when threads write to tiles (or other shared data) that are not within the designated areas for them to write. This conition not only includes writing to an output tile not owned, but also using shared variables under the assumption they are exclusively owned (the missing automatic covered the last situation).

To isolate the tile ownership problem you can add diagnostic code. I prefer to use the Fortran Preprocessor as it makes for clear reading. The route I would use is

For the arrays and variables passed into the questionably threaded subroutine is to create an ownership array of equal rank and extent. Then outside the parallel region, initialize all ownership array cells to -1 (not claimed). The subroutine under test would then obtain its thread number, and test the cell positions in the ownership array corrisponding with thecells it will write in the output arrays. If owner ship is -1, then write inthread number else if ownership != thread number report error.

Jim

roddur · ‎12-10-2008

hello jimdempseyatthecove,

can you plz be a bit more clear? i am really stuck here

TimP · ‎12-10-2008

Quoting - roddur

hello jimdempseyatthecove,

can you plz be a bit more clear? i am really stuck here

You'll have to do some work yourself. If your code is as difficult for you to read as for me, at least put in some formatting (indentation etc.). You haven't been clear what problems you have, where do you have complaints from thread checker, etc.