Re: omp thread

roddur · ‎01-11-2009

with the risk of getting scolded by steve,tim and jim, i am putting 1 question here.
This is a part of my code and as you people can remember its bothering me for preety long time!

!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP PRIVATE(il,ie,iie,ienrp,iend) &
!$OMP PRIVATE(temp,spec,xa,ya) &
!$OMP SHARED(ap1,ap2,ap3,ap4,ap6,ap7) &
!$OMP SHARED(ap8,ap9,ap10,ap11,ap12,ap13) &
!$OMP SHARED(MAP,SRL,seed,de,xxa,yya,e,istart)
do iend=9,63,lorbit
e=e+de
iie=0

ienrp=ienrp+1
call hop(il,e,ienrp,map,srl,ap1, &
ap6,ap7,ap8,ap9,ap10,ap11, &
ap12,ap13,ap2,ap3,ap4,xa,ya)

istart=iend+1
write(*,'(".",$)')
end do
!$OMP END PARALLEL DO

the subroutine HOP has many other variable other then the arguments. should i declare them as private/shared as well? if yes, where should i declare it? in the main code or in the HOP itself?

plz reply.

TimP · ‎01-12-2009

Variables declared locally in HOP will be inherently private, and don't need any changes for OpenMP.
COMMON blocks, where the subroutine modifiies values, would need threadprivate, and possibly copyin directives.

jimdempseyatthecove · ‎01-12-2009

Roddur,

We are not trying to scold you, we are trying to help you. Forum message repliestend to be short and to the point.

In the places of your code where you modify a shared variable use !$OMP ATOMIC. Example:

!$OMP ATOMIC
e=e+de

Note, an atomic operation drills through to RAM, and generally affects the cache line holding e in the other cores. The atomic addition will take on the order of 100x to 300x the length of time it takes to add to a variable held in L1 cache. Therefore, it is best to perform as few of atomic operations in your loop as possible. When a significant portion of your loop (including code executed by hop in this case) is atomic, then consider rewriting the code such that each thread works on more seperated data.

Also consider that a second thread may issue the e=e+de immediately following the first thread issuing the e=e+de but before the first thread issues the call to hop. Depending on how e was declared, the first thread's value of e entering hop might be that of the second thread's value for e. Note, typically e is passed through hop by reference.

The PRIVATE variables (without COPYIN or other clause) are initialized to 0. Therefor a statement such as

ienrp=ienrp+1

cycles (1,2,3,...) for each thread. This may or may not be what you intend.

Jim Dempsey

roddur · ‎01-12-2009

hello,
thanks for you reply. is it possible for you to go through the code if i send you the routines? specially the hop.f90 and the main.f90 that calls the hop?
i guess in that case you can help me better. but this are ~500 line each.
if it is possible for you, i ll be greatful

jimdempseyatthecove · ‎01-12-2009

Yes, I would be willing to take a cursory look at the code and make recommendations to point you in the right direction. However, to produce a fully tested and debugged version of your code may require a considerable amount of work (and may require access to your complete application, or a test suite for your critiical functions). If you are unable to make good progress with therecomendations, then you should consider hiring a contract programmer to do the work for you. Once you see the programming technique for this first application, you may have learned enough and feel confident enough to make the parallization code changes to your other applicaitons. There are several forum members here that may be willing to perform the contract work for you (including myself).

From the sample code you submitted I am guessing you are performing something simalar to planetary modeling. I have experiece in this area by integrating the NAIF SPICE Toolkit (JPL planetary mission database) into a simulator. The simulator was written in FORTRAN and I madeextensive modifications for purposes of studying Space Elevator designs (I hold a U.S. Patent on an active tether design). The original simulation code was single threaded and was not capable of including multiple planetary bodies. It is now multi-threaded and can supplort planetary models to any desired fidelity (up to all planets, sun, all moons, allcalculated asteroids, comets, andartificial satelites).

Jim Dempsey

roddur · ‎01-12-2009

hmmm...unfortunetly i dnt hv enough to hire a programmer...hence i depend on your goodwill. I am attaching the codes. plz suggest .
i have created a folder called rudra and put main.f90 and hop.f90 there. the parallel region contains 2 other subroutine, but they are unimportent.(one thing i am not sure is where the folder is saved)
looking for your reply.

jimdempseyatthecove · ‎01-13-2009

I did not see your files. When you are on the Reply To Post web page, above the tool bar is Add Files button. I do not think you can add a folder. Can you .ZIP the folder and add the .zip archive.

Jim Dempsey

roddur · ‎01-13-2009

i cannot upload the files, so i am just posting them.
though it become too congested, i think that ll help.
This is the main program!

!***********************************************!
! DRIVER ROUTINE FOR THE ASR CODE !
! Rudra Banerjee !
! 9.09.08 !
!***********************************************!
program main

use kinds, only: RDP,i3
use parameters !Declaring the parameters
use mhop !Recursion
use mmat !Structure matrix generator
use newfit !B_P Feeting
use mdos !DOS
use mband !Energy band
use mfermi !Fermi lebel
use mldos !LDOS
use omp_lib

implicit none

integer(i3):: i,j,k,il,l,ikki,ok
integer(i3)::ios,nit=0,nsp,its
integer(i3):: ienrp,istart,iend,ie,iie
integer(i3)::ipq,nconv
integer:: osd1,osd2 !spllting of d-orbitals
integer(i3),parameter :: maxclock = 10
integer,dimension(nasite,ntsite)::map
real*8::t1,t2
real(RDP):: sum1,sum2,sum3,sum4,sum5,sum6,sum7, &
sum8,sum9,sum10,sum11,sum12,sum13
real(RDP)::dsum
real(RDP):: qtot_a,qtot_b,qtot_a1,qtot_b1
real(RDP)::q,de,e,ef,e_band
real(RDP):: xcha,xchb,qold_a,qold_b
real(RDP),dimension(maxrec,ienum+1)::xa,ya
real(RDP),dimension(maxrec+1,nfn+1):: xxa,yya
real(RDP),dimension(ienum+1)::seed
real(RDP),dimension(nfn+1):: temp
real(RDP),dimension(lorbit,lorbit)::ap5
real(RDP),dimension(lorbit)::ap1,ap6,ap7,ap8, &
ap9,ap10,ap11, &
ap12,ap13,p2,p3,p4
real(RDP),dimension(npn)::ap2,ap3,ap4
real(RDP),dimension(2)::es, cs, ds, gs, ep, cp, &
xx, dp, gp, ed, cd, dd, gd
real(RDP),dimension(2)::dels, delp, deld, rs, &
rp, rd
real(RDP),dimension(2)::dsqs, dsqp, dsqd, ccs, &
ccp, ccd, os, op, od
real(RDP),dimension(ntype,lorbit):: agc, agd, enu_ag, ag_op
real(RDP),dimension(lorbit)::cpot, enuc, dpot_sq, opc
real(RDP),dimension(lorbit,lorbit,0:nrsite)::srl
real(RDP),dimension(nkp,nfn+1)::spec
real(RDP),dimension(2,nfn+1,2)::sorb,porb,dorb
real(RDP),dimension(nfn+1)::dos,dos_a,dos_b
real(RDP),dimension(nfn+1)::dos_adn,dos_bdn,T_dos,dos_aa
real(RDP),dimension(3)::a
real(RDP),dimension(2,nfn+1,2)::dos1
real(RDP),dimension(0:8,2)::A_MOM,B_MOM
real(RDP),dimension(3,2)::ECG_A,ECG_B
integer,dimension(8) :: values1,values2
character(10)::potin,potout
character(2)::sps
character(10)::actn
character,parameter :: esc = char(27)
logical::lstop
! OMP SPL
integer::tid,nthrd

write(*,*) "Concentration of Atom A =",x
write(*,*) "Concentration of Atom B =",y
call date_and_time(VALUES=values1)
call cpu_time(t1)

lat: selectcase(lattice)
case(225) !fcc
osd1=3;osd2=2
case(221) !bcc
osd1=3;osd2=2
case(216) !tetrahedron
osd1=3;osd2=2
case default !error
write(*,*) "ERROR!!TERMINATING RUN!!!"
write(*,*) "You have not selected allowed space group"
write(*,*) "Plz. select among 216/221/225"
stop
end select lat

lscf: do nit=1,nscf

write (*,fmt='(3a)',advance='no') ' ', ESC, '[1m' ! set bold
write(*,'("Running ASR loop",1x,i3)',advance='no') nit
write (*, '(3a)') ' ', ESC, '[0m' ! restores display defaults.

! Reading Augmented space map `ASMAP'
open(11,file=ASMAP,status='old',IOSTAT=ios,form='unformatted')
do i=1,nasite
read(11)(map(i,j),j=1,17)
end do
write(*,*) "Reading ASMAP Complete"
close(11)

lspin: do nsp=1,spn

if (nsp==1)then
sps='UP'
else
sps='DW'
endif

write(*,'("WORKING FOR SPIN",1x,a2)') sps

! For the present purpose, we are dealling with binary alloy. Hence we
! have two types of atom. We take potential files containning es,~cs,~ds,~gs
! ep,~cp,~dp,~gp, ed,~cd,~dd,~gd as
! output of TB-LMTO calculation for both of these atom.

do i=1,ntype
potin='POTPAR_'//ACHAR(i+16+48)
potout='POT_VAR_'//ACHAR(i+16+48)

open(i,file=potin,status='old',iostat=ios)
if(ios/=0) then
write(*,*) "error in openning file",potin
stop
endif

write(*,*) "Reading POTENTIAL PARAMETERS from ",potin
read (i,*) es(i),cs(i),ds(i),xx(i),gs(i),xx(i) !
read (i,*) ep(i),cp(i),dp(i),xx(i),gp(i),xx(i) ! Reading up spins
read (i,*) ed(i),cd(i),dd(i),xx(i),gd(i),xx(i) !/

if(nsp==2)then
read (i,*) es(i),cs(i),ds(i),xx(i),gs(i),xx(i) !
read (i,*) ep(i),cp(i),dp(i),xx(i),gp(i),xx(i) ! Reading dn spins
read (i,*) ed(i),cd(i),dd(i),xx(i),gd(i),xx(i) !/
endif
close(i)

open(i+10,file=potout,status='unknown',position='append')
write(i+10,*) es(i),cs(i),ds(i),xx(i),gs(i),xx(i)
write(i+10,*) ep(i),cp(i),dp(i),xx(i),gp(i),xx(i)
write(i+10,*) ed(i),cd(i),dd(i),xx(i),gd(i),xx(i)
close(i+10)
end do

! Reading POTPAR files from LMTO done

! gamma -->alpha transformation
! Given, TB-LMTO potential paramaters are written in $Gamma$
! representation, we transform it to $alpha$ form.
do i=1,ntype

ds(i)=dabs(ds(i))
dp(i)=dabs(dp(i))
dd(i)=dabs(dd(i))

dels(i)=ds(i)*ds(i)
delp(i)=dp(i)*dp(i)
deld(i)=dd(i)*dd(i)

rs(i)=1.0d0-(gs(i)-alps)*((cs(i)-es(i))/dels(i))
rp(i)=1.0d0-(gp(i)-alpp)*((cp(i)-ep(i))/delp(i))
rd(i)=1.0d0-(gd(i)-alpd)*((cd(i)-ed(i))/deld(i))

dsqs(i)=rs(i)*dsqrt(dels(i))
dsqp(i)=rp(i)*dsqrt(delp(i))
dsqd(i)=rd(i)*dsqrt(deld(i))

ccs(i)=rs(i)*(cs(i)-es(i))+es(i)
ccp(i)=rp(i)*(cp(i)-ep(i))+ep(i)
ccd(i)=rd(i)*(cd(i)-ed(i))+ed(i)

os(i)=(alps-gs(i))/(dels(i)*rs(i))
op(i)=(alpp-gp(i))/(delp(i)*rp(i))
od(i)=(alpd-gd(i))/(deld(i)*rd(i))

end do
! POTPAR for different orbitals
! Now, we assign texttt{potpar} to different orbitals (s,p,d)

do i=1,ntype

agc(i,1)=ccs(i) !
agd(i,1)=dsqs(i) !
enu_ag(i,1)=es(i) ! POTPAR for S orbital
ag_op(i,1)=os(i) ! /
!/

agc(i,2:4)= ccp(i) !
agd(i,2:4)= dsqp(i) !
enu_ag(i,2:4)=ep(i) ! POTPAR for P orbital
ag_op(i,2:4)= op(i) ! /
!/

agc(i,5:9)= ccd(i) !
agd(i,5:9)= dsqd(i) !
enu_ag(i,5:9)=ed(i) ! POTPAR for D orbital
ag_op(i,5:9)= od(i) ! /
!/

end do

ltype: do ipq=1,ntype
write(*,'("WORKING FOR ATOM",1x,i1)') ipq
open(12,file=STRUCTURE,status='old',IOSTAT=ios)

do i=1,nrsite+1
do j=1,lorbit
read(12,*)(srl(j,k,i-1),k=1,lorbit)
end do
end do
close(12)
write(*,*) "Reading STRUCTURE MATRIX complete"

do l=1,lorbit
cpot(l)=agc(ipq,l)
enuc(l)=enu_ag(ipq,l)
dpot_sq(l)=agd(ipq,l)**2
opc(l)=ag_op(ipq,l)
enddo

! GENERATING POTENTIAL PARAMETERS TO BE USED IN RECURSION CODE

! AP1
lap1: do l=1,lorbit
ap1(l)=cpot(l)+dpot_sq(l)*srl(l,l,0)
end do lap1
! write(*,*) "AP01 Generated"
!= = = = = = = = = = = = = = = = = = = = = =
! AP2
de=(emax-emin)/dfloat(ienum)
e=emin
k=0
lap2: do i=1,ienum+1
dsum=0.0d0
do l=1,lorbit
sum1=0.0d0;sum2=0.0d0
k=k+1
sum1= (agc(1,l)/(agd(1,l)*agd(1,l)))*x + &
(agc(2,l)/(agd(2,l)*agd(2,l)))*y
sum1=(sum1+srl(l,l,0))*dpot_sq(l)
sum2=x/(agd(1,l)*agd(1,l))+y/(agd(2,l)*agd(2,l))
sum2=sum2*dpot_sq(l)
sum2=(sum2-1.0d0)*e
dsum=sum1-sum2
ap2(k)=dsum
enddo
seed(i)=e
e=e+de
end do lap2
! write(*,*) "AP02 Generated"
!= = = = = = = = = = = = = = = = = = = = = =
!AP3
de=(emax-emin)/dfloat(ienum)
e=emin
k=0
lap3: do i=1,ienum+1
dsum=0.0d0

do l=1,lorbit
sum3=0.0d0
k=k+1

sum3=(e-agc(1,l))/(agd(1,l)*agd(1,l))
sum3=sum3-(e-agc(2,l))/(agd(2,l)*agd(2,l))
sum3=sum3*(y-x)*dpot_sq(l)

ap3(k)=-sum3
end do
seed(i)=e
e=e+de
end do lap3
! write(*,*) "AP03 Generated"
!= = = = = = = = = = = = = = = = = = = = = =
!AP4
e=emin
k=0
lap4: do i=1,ienum+1
dsum=0.0d0

do l=1,lorbit
sum4=0.0d0
k=k+1

sum4=(e-agc(1,l))/(agd(1,l)*agd(1,l))
sum4=sum4-(e-agc(2,l))/(agd(2,l)*agd(2,l))
sum4=sum4*dsqrt(x*y)*dpot_sq(l)
ap4(k)=-sum4
end do
e=e+de
end do lap4
! write(*,*) "AP04 Generated"
!= = = = = = = = = = = = = = = = = = = = = =
!AP5
ap5=0.0d0

lap5: do i=1,lorbit
ap5(i,i)=dsqrt(dpot_sq(i))
end do lap5
! write(*,*) "AP05 Generated"
!= = = = = = = = = = = = = = = = = = = = = =
do i=1,nrsite
call matmult(ap5,i,srl)
end do
! We generate new structure matrix from the subroutine texttt{matmult} to be
! use for the texttt{POTPAR} 7-13.par
!=============================================
!AP6
lap6: do l=1,lorbit
ap6(l)=cpot(l)+dpot_sq(l)*srl(l,l,0)-enuc(l)
end do lap6
! write(*,*) "AP06 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP7
lap7: do l=1,lorbit
sum7=0.0d0
sum7= ((agc(1,l)-enu_ag(1,l))/(agd(1,l)*agd(1,l)))*x + &
((agc(2,l)-enu_ag(2,l))/(agd(2,l)*agd(2,l)))*y
sum7=(sum7+srl(l,l,0))*dpot_sq(l)

ap7(l)=sum7
enddo lap7
! write(*,*) "AP07 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP9
lap9: do l=1,lorbit
sum9=0.0d0
sum9=(agc(1,l)-enu_ag(1,l))/(agd(1,l)*agd(1,l))
sum9=sum9-(agc(2,l)-enu_ag(2,l))/(agd(2,l)*agd(2,l))
sum9=sum9*(y-x)*dpot_sq(l)
ap9(l)=sum9
end do lap9
! write(*,*) "AP09 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP8
lap8: do l=1,lorbit
sum8=0.0d0

sum8=(agc(1,l)-enu_ag(1,l))/(agd(1,l)*agd(1,l))
sum8=sum8-(agc(2,l)-enu_ag(2,l))/(agd(2,l)*agd(2,l))
sum8=sum8*dsqrt(x*y)*dpot_sq(l)

ap8(l)=sum8
enddo lap8
! write(*,*) "AP08 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP10
lap10: do l=1,lorbit
ap10(l)=opc(l)
enddo lap10
! write(*,*) "AP10 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP11
lap11: do l=1,lorbit
ap11(l)=x*(agd(1,l)*ag_op(1,l)*agd(1,l))+ &
y*(agd(2,l)*ag_op(2,l)*agd(2,l))
ap11(l)=ap11(l)*(1.0d0/dpot_sq(l))
enddo lap11
! write(*,*) "AP11 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP13
lap13: do l=1,lorbit

sum13=0.0d0
sum13=agd(1,l)*ag_op(1,l)*agd(1,l)- &
agd(2,l)*ag_op(2,l)*agd(2,l)
sum13=sum13*(y-x)*(1.0d0/dpot_sq(l))

ap13(l)=sum13
end do lap13
! write(*,*) "AP13 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =
!AP12
lap12: do l=1,lorbit
ap12(l)=agd(1,l)*ag_op(1,l)*agd(1,l)- &
agd(2,l)*ag_op(2,l)*agd(2,l)
ap12(l)=ap12(l)*dsqrt(x*y)*(1.0d0/dpot_sq(l))
enddo lap12
! write(*,*) "AP12 Generated"
!= = = = = = = = = = = = = = = = = = = = = = =

! Once the texttt{POTPAR}'s are created, now we are ready to call the
! recursion subroutine texttt{hop}. I have calculated hop for s,p and 2 d
! orbitals(t2 & e2).

!======================================================!
! this chunk should be parallel
! Energy dependent loop starts
! XA,YA,XXA,YYA,TEMP & SPEC
!======================================================!

!$ call OMP_SET_NUM_THREADS(2)

!$OMP PARALLEL DO DEFAULT(NONE) &
!$OMP PRIVATE(il,ie,iie,ienrp,iend) &
!$OMP PRIVATE(temp,spec,xa,ya) &
!$OMP SHARED(ap1,ap2,ap3,ap4,ap6,ap7) &
!$OMP SHARED(ap8,ap9,ap10,ap11,ap12,ap13) &
!$OMP SHARED(MAP,SRL,seed,de,xxa,yya,e,istart)
orbital: do il=1,lorbit-2,2
xa=0.0;ya=0.0
write(*,'(1x,"Starting orbital loop",1x,i1,$)') il
ienrp=0
e=emin-de
istart=1

do iend=9,63,lorbit
!$OMP ATOMIC
e=e+de
iie=0

!$OMP ATOMIC
ienrp=ienrp+1
call hop(il,e,ienrp,map,srl,ap1, &
ap6,ap7,ap8,ap9,ap10,ap11, &
ap12,ap13,ap2,ap3,ap4,xa,ya)

write(*,*) loc(xa(1,1))
istart=iend+1
write(*,'(".",$)')
end do
call fit(xa,ya,seed,xxa,yya,temp)
call spectral(il,xxa,yya,temp,spec)
write(*,'("done")')

end do orbital
!$OMP END PARALLEL DO
!======================================================!
! Energy dependent loop done
! parallel thread shld end here
! inquire(file='stop',EXIST=lstop)
! if (lstop) then
! open(190,file='stop',status='old')
! read(190,*)actn
! if(trim('immideate').eq.trim(actn))then
! write(*,*) "HARD STOP ENCOUNTERED &
! & STOPPING SYSTEM IMMEDEATLY"
! call system('rm -f stop')
! t0cpu(2)=scnds()
! t0wall(2) = cclock()
! write(*,'(1x,"ELAPSED WALL CLOCK TIME =",1x,f10.4,1x,"s")') &
! t0wall(2)-t0wall(1)
! write(*,'(1x,"ELAPSED CPU CLOCK TIME =",1x,f10.4,1x,"s")') &
! t0cpu(2)-t0cpu(1)
! stop
!
! endif
! if((trim('loop').eq.trim(actn)).and.(il==7)) &
! write(*,*) "LOOP STOP ENCOUNTERED. &
! & RUN WILL STOP AFTER THIS LSCF LOOP"
! close(190)
! endif

!======================================================!

call tdos(spec,ipq,nsp,dos, &
sorb,porb,dorb,osd1,osd2)

!***********************************************!
! Writing DOS to the files
!
!***********************************************!
! DOS of atom A
if(ipq==1.and.nsp==1)then !UP spin for A
open(1,file='DOS_A_UP',status='replace')
do i=1,nfn+1
write(1,*)temp(i),dos(i)
dos_aa(i)=dos(i)
enddo
close(1)
endif
if(ipq==1.and.nsp==2)then !DOWN spin for A
open(1,file='DOS_A_DN',status='replace')
do i=1,nfn+1
write(1,*)temp(i),dos(i)
dos_adn(i)=dos(i)
enddo
close(1)
endif

! DOS of atom B
if(ipq==2.and.nsp==1)then !UP spin for B
open(1,file='DOS_B_UP',status='replace')
do i=1,nfn+1
write(1,*)temp(i),dos(i)
dos_b(i)=dos(i)
enddo
close(1)
endif

if(ipq==2.and.nsp==2)then !DOWN spin for B
open(1,file='DOS_B_DN',status='replace')
do i=1,nfn+1
write(1,*)temp(i),dos(i)
dos_bdn(i)=dos(i)
enddo
close(1)
endif

end do ltype

end do lspin

call ldos(dos_aa,dos_b,dos_adn,dos_bdn,temp,T_dos)

call fermi(temp,T_dos,ef)
call band(temp,T_dos,ef,e_band)

call pardos(temp,sorb,porb,dorb,ef,nit,e_band, &
qtot_a,qtot_b,qtot_a1,qtot_b1)
open(1,file='/home/rudra/Recursion/LMTO_A/ATOM/CHECK',status='new')
! write(1,*)ipdos
close(1,status='delete')

open(2,file='/home/rudra/Recursion/LMTO_B/ATOM/CHECK',status='new')
! write(2,*)ipdos
close(2,status='delete')

if(nit/=0)then
xcha=dabs(qtot_a-qold_a)
xchb=dabs(qtot_b-qold_b)
if((xcha<0.0001d0).and.(xchb<0.0001d0))then
write(*,*) 'CHARGE CONVERGENCE DONE'
nconv=nconv+1
if(nconv==5)stop "SCF DONE"
else
nconv=0
endif
endif

qold_a=qtot_a
qold_b=qtot_b

write(*,*) qtot_a,qtot_b
write(*,*) nit
if (nit==nscf)go to 3000

11 open(1,file='/home/rudra/Recursion/LMTO_B/ATOM/CHECK_A',status='old',iostat=ok)
do ikki=1,150
end do
! if(ok/=0) goto 11
close(1,status='delete')

12 open(1,file='/home/rudra/Recursion/LMTO_B/ATOM/CHECK_B',status='old',iostat=ok)
do ikki=1,150
end do
! if(ok/=0) go to 12
close(1,status='delete')

inquire(file='stop',EXIST=lstop)
if (lstop) then
open(190,file='stop',status='old')
read(190,*)actn
if(trim('loop').eq.trim(actn))then
call system('rm -f stop')
! go to 555
endif
close(190)
endif

end do lscf
3000 continue

call cpu_time(t2)
call date_and_time(VALUES=values2)
write(*,100) values2(3)-values1(3), values2(5)-values1(5),values2(6)-values1(6),values2(7)-values1(7)
100 format(1x,"ELAPSED WALL CLOCK TIME =",1x,i2,1x,"days",2x,i2,1x,"hr",2x,i2,1x,"min",2x,i2,1x,"sec")

!555 t0cpu(2)=scnds()
! t0wall(2) = cclock()
! write(*,'(1x,"ELAPSED WALL CLOCK TIME =",1x,f10.4,1x,"s")') &
! t0wall(2)-t0wall(1)
write(*,'(1x,"ELAPSED CPU CLOCK TIME =",1x,f10.4,1x,"s")') &
t2-t1

end program main

The code hop

!*******************************************************!
! This is the recursion subroutine !
!
!*******************************************************!
module mhop
CONTAINS
subroutine hop (il,e,ienrp,map,srl,ap1,ap6,ap7, &
ap8,ap9,ap10,ap11,ap12,ap13, &
p2,p3,p4,xa,ya)
!*******************************************************!
! The subroutine is the recursion subroutine !
! It uses the potential parameters generated !
! by *POTPAR* subroutine. !
! this module interface is not actually required. !
! it is done only to make sure that POTPAR by !
! this routine and that this routine is called !
! by main texttt{properly}.par
! Here We have done the 2nd order calculation : !
! $$E=H+hoh$$
!*******************************************************!
use kinds, only: RDP,i3
use parameters
implicit none
integer(i3):: i,il,ikl,ifind,ii,j
integer(i3):: nfill,jfill,isfill,nrec
integer(i3):: kc,k,ienrp
integer(i3):: mfill,ikk
integer(i3),dimension(nasite,ntsite),intent(in)::map
real(RDP):: s1,sum1,sum2,sum3,e
real(RDP),dimension(lorbit,lorbit,0:nrsite),intent(in):: srl
real(RDP),dimension(lorbit),intent(in)::ap1,ap6,ap7,ap8,ap9, &
ap10,ap11,ap12,ap13
real(RDP), dimension(lorbit),intent(in):: p2,p3,p4
real(RDP), dimension(maxrec,ienum+1),intent(out):: ya,xa

real(RDP),dimension(nsite,lorbit)::psii,psij,psik,psim
real(RDP),dimension(maxrec)::alpha
real(RDP),dimension(0:maxrec)::beta
real(RDP),dimension(lorbit)::cons,ar

sum1=0.0d0
beta(0)=0.0d0
ifind=1

psii(:,:)=0.0d0
psij(:,:)=0.0d0
psik(:,:)=0.0d0
psim(:,:)=0.0d0

psij(1,il)=1.0d0
nfill=1

open(15,file='coeff.dat',status='unknown')

! Starting Recursion loop(12 iteration)
lrec: do nrec=1,maxrec
jfill=nfill

lfill: do i=1,nfill

kc=map(i,ntsite)

!+++++++++++++++++++++++++++++++++++++++++++++++!
! HOP !
! For 2st order. hoh is included !
!+++++++++++++++++++++++++++++++++++++++++++++++!

if1: if (kc==1) then
do ikl=1,lorbit
s1=(ap6(ikl)*psij(i,ikl))
psim(i,ikl)=psim(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if(k>nsite) cycle
if(k>ifind) ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo
end do
cycle lfill
endif if1

!-----------------------------------------------!
! If real space part is = 1 then !
!-----------------------------------------------!

do ikl=1,lorbit
s1=ap7(ikl)*psij(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if(k>nsite) cycle
if(k>ifind) ifind=k
call matp(srl,cons,j,ar)
do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo
end do

k=map(i,ntsite-2)
if (k>nsite) cycle lfill
if (k>ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap8(ikl)*psij(i,ikl)
psim(k,ikl)=psim(k,ikl)+s1
end do

k=map(i,ntsite-1)
if(k==0)cycle lfill

do ikl=1,lorbit
s1=0.0d0
s1=ap9(ikl)*psij(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
end do
end do lfill
!---------H operation finished--------!

!---------O-operation Starts----------!

nfill=ifind

if(nfill>nasite) nfill=nasite

lnfill: do i=1,nfill

kc=map(i,ntsite)

if(kc==1)then

do ikl=1,lorbit
s1=0.0d0
s1=ap10(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
enddo
cycle lnfill
endif
!-------------------------------------------------
do ikl=1,lorbit
s1=ap11(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
enddo

k=map(i,ntsite-2)
if(k.gt.nsite) cycle lnfill
if(k.gt.ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap12(ikl)*psim(i,ikl)
psik(k,ikl)=psik(k,ikl)+s1
enddo

k=map(i,ntsite-1)

if(k==0)cycle lnfill
do ikl=1,lorbit
s1=0.0d0

s1=ap13(ikl)*psim(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1

enddo

enddo lnfill
!------O-operation complete------------!
psim=0.0d0

!******* again h-operation *************
isfill=ifind

if(isfill>nasite)isfill=nasite
lisfill:do i=1,isfill

kc=map(i,ntsite)

lkc: if(kc==1)then

do ikl=1,lorbit
s1=0.0d0
s1=ap6(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
enddo

do ii=1,lorbit
cons(ii)=psik(i,ii)
enddo

do j=1,nrsite
k=map(i,j)
if(k>nsite)cycle
if(k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo

enddo

cycle lisfill
endif lkc

!********************************************

do ikl=1,lorbit
s1=0.0d0
s1=ap7(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1
enddo

do ii=1,lorbit
cons(ii)=psik(i,ii)
enddo

do j=1,nrsite

k=map(i,j)

if(k>nsite)cycle

if(k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psim(k,ii)=psim(k,ii)+ar(ii)
enddo

enddo

k=map(i,ntsite-2)
if(k.gt.nsite)cycle lisfill
if(k.gt.ifind)ifind=k

do ikl=1,lorbit
s1=0.0d0
s1=ap8(ikl)*psik(i,ikl)
psim(k,ikl)=psim(k,ikl)+s1
enddo

k=map(i,ntsite-1)

if(k==0)cycle lisfill
do ikl=1,lorbit
s1=0.0d0
s1=ap9(ikl)*psik(i,ikl)
psim(i,ikl)=psim(i,ikl)+s1

enddo

enddo lisfill
!*********** hoh complete*************
psik=0.0d0
!******** 2nd h-operatipn complete************** hoh complete

ljfill: do i=1,jfill
kc=map(i,ntsite)
lifn: if (kc==1) then
do ikl=1,lorbit
s1=ap1(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do

do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

nrs: do j=1,nrsite
k=map(i,j)
if (k>nsite) cycle
if(k>ifind) ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psik(k,ii)=psik(k,ii)+ar(ii)
end do

end do nrs
cycle ljfill
endif lifn

do ikl=1,lorbit
s1=p2(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do
do ii=1,lorbit
cons(ii)=psij(i,ii)
end do

do j=1,nrsite
k=map(i,j)
if (k>nsite) cycle
if (k>ifind)ifind=k

call matp(srl,cons,j,ar)

do ii=1,lorbit
psik(k,ii)=psik(k,ii)+ar(ii)
end do

end do

k=map(i,ntsite-2)
if (k>nsite) cycle
if (k>ifind) ifind=k

do ikl=1,lorbit
s1=p4(ikl)*psij(i,ikl)
psik(k,ikl)=psik(k,ikl)+s1
end do

k=map(i,ntsite-1)
if (k==0) cycle
do ikl=1,lorbit
s1=p3(ikl)*psij(i,ikl)
psik(i,ikl)=psik(i,ikl)+s1
end do
end do ljfill

do ikk=1,ifind
do ikl=1,lorbit
psik(ikk,ikl)=psik(ikk,ikl)-psim(ikk,ikl)
end do
end do

!===============================================!
! calculating $alpha$ & $beta$ !
! may be done as a subroutine !
!===============================================!

mfill=ifind
sum1=0.0d0
do i=1,jfill
do j=1,lorbit
sum1=sum1+psij(i,j)*psij(i,j)
end do
end do

sum2=0.0d0
do i=1,mfill
do j=1,lorbit
sum2=sum2+psik(i,j)*psij(i,j)
end do
end do

alpha(nrec)=sum2/sum1
write(*,*)il,"=>", loc(alpha(nrec))
do i=1,mfill
do j=1,lorbit
psik(i,j)=psik(i,j)-alpha(nrec)*psij(i,j)-beta(nrec-1)*psii(i,j)
end do
end do

sum3=0.0d0
do i=1,mfill
do j=1,lorbit
sum3=sum3+psik(i,j)*psik(i,j)
end do
end do

beta(nrec)=sum3/sum1

do i=1,nsite
do j=1,lorbit
psii(i,j)=psij(i,j)
psij(i,j)=psik(i,j)
psik(i,j)=0.0d0
psim(i,j)=0.0d0
end do
end do

nfill=mfill
if (mfill>nasite)nfill=nasite
ifind=1
xa(nrec,ienrp)=alpha(nrec)
ya(nrec,ienrp)=beta(nrec)
write(15,*)nrec,alpha(nrec),beta(nrec)
end do lrec
close(15)

end subroutine hop

!-----------------------------------------------!
! Subroutine MATP !
! to multiply SRL*CONS !
subroutine matp(srl,cons,j,ar)
!-----------------------------------------------!
use kinds
use parameters
implicit double precision(a-h,o-z)

real(RDP),dimension(lorbit,lorbit,0:nrsite):: srl
real(RDP),dimension(lorbit):: cons,ar

do i=1,lorbit
su=0.0d0
do kl=1,lorbit
su=su+srl(i,kl,j)*cons(kl)
enddo
ar(i)=su
enddo

end subroutine

end module mhop

the parameter file
!***********************************************!
! This module declares the parameters used !
! in this code. !
!***********************************************!
module parameters

use kinds
PUBLIC
integer(i3),parameter:: &
lattice=221, &
idim=3, &!Lattice dimension
norb=3, &!Number of orbital calc.
nrsite=14, &!Nearest neighbour in real space
nasite=73109, &!Number of site in AS Map
ntype=2, &!Number of atom type
ntsite=nrsite+3,&!Dimension of AS Map
nsite=369902, &!Total number of n.n. in AS Map
maxrec=12, &!Number of recursion step
lorbit=9, &!Number of orbital(s+p+d)
nscf=1, &!SCF loop
imax=172, &
info=172, &
llmax=16, &
jl=7, &
ienum=7, &!Number of seeds for fitting
ienump=ienum+1, &
nkp=17, &!Number of points extrapolated
!for a and b
nfn=900, &
spn=2, &!1=>up spin;2=>down spin
npn=lorbit*ienump

real(RDP),parameter:: &
emin=-0.9d0, &!emax->emin=energy range
emax=1.00d0, &!
z_a=24.0d0, &!Atomic number of atom A
z_b=25.0d0, &!Atomic number of atom B
e_a=6.0d0, &!Valence electron atom A
e_b=7.0d0, &!Valence electron atom B
c_a=z_a-e_a, &!Core electron atom A
c_b=z_b-e_b, &!Core electron atom B
x=0.60_RDP, &!Conc. of atom type A
y=1.00d0-x, &!Conc. of atom type B
ALPF=0.005882d0,&
ALPS=0.34850d0, &!POTPAR for orb. S
ALPP=0.05303d0, &!POTPAR for orb. P
ALPD=0.01071d0, &!POTPAR for orb. D
ruban=0.10d0, &
amix=0.06d0, &!Mixing scheme parameter
filling=x*e_a+y*e_b
character(7)::ASMAP='BCC_MAP'
character(7)::CHK_A='CHECK_A',CHK_B='CHECK_B'
character(8)::STRUCTURE ='INFO_BCC'
end module parameters

jimdempseyatthecove · ‎01-14-2009

Roddur,

I haven't looked too deep at the code, but in looking at the code it appears that there is something you need to explain or correct prior to looking further.

The subroutine HOP is called from within a parallel region where each thread will enter with a different value foril (the parallel do loop variable)

On entry into HOP each (every) thread issues

open(15,file='coeff.dat',status='unknown')

and begins a loop iterating on lrec
At the end of HOP is the end of the loop iterating on lrec and before the end do is the WRITE to file 15. And then following the loop is the CLOSE

As written, you will have multiple threads creating, writing and closing this file under the assumption that it is the only thread performing this action.

I am not suggesting this is the correct way to do this but it would seem like you would want to use a file number based on the incomming il value (e.g. 14+il) and use different named output files (e.g. coeff01.dat, coeff02.dat, ...)

When you decide how to handle this then we can proceed looking further.

Jim Dempsey

roddur · ‎01-14-2009

yeah...u are right.

jimdempseyatthecove · ‎01-15-2009

roddur,

Although you could use files as suggested, a fastermethod would be to create an allocatable array (of size lorbit) containing a user defined type containing an allocatable array and potentially other information such as last element stored. The inclosed allocatable array is then allocated to the max number of possibledata elements (containing the record curently written). Each thread then populates the records contained within its own array of records. Then after you exit theparallel region in main, you open the file and write the ordered data set to the file. Note, your il variablewas not part of the original data set written to the file so you would have to determine ifil(orbit)is importantto your output data set (as il may also be derrived by the position of the record in the output data set).

Jim Dempsey