- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone, I have a subroutine that serves as the force calculation in an n-body simulation shown below. There a couple of arrays that I will describe which will hopefully make the subroutine a little more clear.
1. array position. it contains positions of particles and atom types. it is of length number of particles np
type aos
real*4 :: x,y,z
integer :: type
end type aos
type(aos) :: position(np)
2. array neigh. it contains a list of pointers to particles that are within a certain distance of a given particle. it is of type integer.
3. array nnstart and nnfinish. nnstart(i) and nnfinish(i) gives the starting and ending locations for information for particle i in array neigh. If I am particle i, my neighbors are contained in the indices between nnstart(i) and nnfinish(i) in neigh. these are also integers
4. array q. array of real*4s. charges of given atom
5. lj1,lj2,lj3,lj4. these are all arrays of real*4s. they provide parameters for the atomic interactions.
6. array ff. this is the array that stores the forces for each particle. it is of the following type
type ff_data
real*4 :: x,y,z
end type ff_data
all these arrays I have as globals and are defined for the mic as follows
!dir$ attributes offload:mic::position
My code was compiled using the -align array64byte flag. However, the inclusion of the !dir$ vector aligned statement on line 43 to assist vectorization of the inner loop produces a seg fault. I am very curious as to why, despite being compiled using the align array64byte, my arrays are not aligned. Any suggestions?
My second question is that I have ran this subroutine (shown below using single precision) using both full single precision and full double precision. In the single precision case, all array components are real*4 vs double precision. However, I only see about a 20% speed up. Now given that my bandwidth should double for SP vs. DP, and that my vector registers should be doing twice as many calculations, I expect a factor of two speed up. Is there something wrong with this reasoning, or is there something going on with this code?
Finally, does anyone have any suggestions for my current nearest integer function on line 54? I was given it as a suggestion on this forum, but don't know if there are faster alternatives.
subroutine lj_cut_coul_dsf_nonewton_SP(step,printflag)
implicit none
real*4 :: force,forcelj,forcecoul
real*4 :: x1,y1,z1,x2,y2,z2
real*4 :: dx,dy,dz,dr,dr2,dr2i,dr6i,dr12i
double precision :: ffx,ffy,ffz
real*4 :: fudge_factor
real*4 :: qtmp,r,prefactor,erfcc,erfcd,t
real*4 :: boxdx,boxdy,boxdz
real*4 :: scalelj, scalecoul
integer :: i,j,l,step
integer :: itype,jtype,neigh
integer :: tid,num
integer :: offset
integer :: printflag
integer :: T1,T2,clock_rate,clock_max
call system_clock(T1,clock_rate,clock_max)
!dir$ offload begin target(mic:0) in(position: alloc_if(.false.),free_if(.false.)),&
!dir$ inout(ff: alloc_if(.false.), free_if(.false.)),&
!dir$ in(nlist: alloc_if(.false.) free_if(.false.)),&
!dir$ in(nnfirst: alloc_if(.false.) free_if(.false.)),&
!dir$ in(nnlast: alloc_if(.false.) free_if(.false.)),&
!dir$ in(q: alloc_if(.false.) free_if(.false.)),&
!dir$ nocopy(lj1: alloc_if(.false.) free_if(.false.)),&
!dir$ nocopy(lj2: alloc_if(.false.) free_if(.false.)),&
!dir$ nocopy(lj3: alloc_if(.false.) free_if(.false.)),&
!dir$ nocopy(lj4: alloc_if(.false.) free_if(.false.))
!$omp parallel do schedule(dynamic) reduction(+:potential,e_coul,ffx,ffy,ffz) default(private),&
!$omp& shared(box,ibox,rcut2,cut_coulsq,alpha,EWALD_P,MY_PIS,np,numAtomType),&
!$omp& shared(A1,A2,A3,A4,A5),&
!$omp& shared(e_shift,f_shift),&
!$omp& shared(position,ff,nlist,nnfirst,nnlast,q,lj1,lj2,lj3,lj4)
do i = 1 ,np
!---load data for particle i: positions, type, and charge
x1 = position(i)%x; y1 = position(i)%y; z1 = position(i)%z; itype = position(i)%type
qtmp = q(i)
ffx = 0.0d0; ffy = 0.0d0; ffz = 0.0d0
!dir$ vector aligned
!dir$ simd reduction(+:potential,e_coul,ffx,ffy,ffz)
!... loop over neighbors for particle i
do j= nnfirst(i),nnlast(i)
!---load neighboring particle index
neigh = nlist(j)
!---calculate distance between particles using a vectorizable nearest integer function
dx = x1-position(neigh)%x; dy = y1-position(neigh)%y; dz = z1-position(neigh)%z
boxdx = dx*ibox; boxdy = dy*ibox; boxdz = dz*ibox
boxdx = (boxdx+sign(1/(epsilon(boxdx)),boxdx)) -sign(1/epsilon(boxdx),dx)
boxdy = (boxdy+sign(1/(epsilon(boxdy)),boxdy)) -sign(1/epsilon(boxdy),dy)
boxdz = (boxdz+sign(1/(epsilon(boxdz)),boxdz)) -sign(1/epsilon(boxdz),dz)
dx = dx-box*boxdx; dy = dy-box*boxdy; dz = dz-box*boxdz
dr2 = dx*dx + dy*dy + dz*dz
scalelj = 0.0d0
if(dr2.lt.rcut2)scalelj = 1.0d0
dr2i = 1.0d0/dr2
dr6i = dr2i*dr2i*dr2i
offset = numAtomType*(itype-1)+jtype
forcelj = scalelj*dr6i*(lj1(offset)*dr6i-lj2(offset))
potential = potential + scalelj*dr6i*(dr6i*lj3(offset)-lj4(offset))
scalecoul = 0.0d0
if(dr2.lt.cut_coulsq)scalecoul = 1.0d0
r = sqrt(dr2)
prefactor = scalecoul*qtmp*q(neigh)/r
erfcd = exp(-alpha*alpha*r*r)
t = 1.0 / (1.0 + EWALD_P*alpha*r)
erfcc = t * (A1+t*(A2+t*(A3+t*(A4+t*A5)))) * erfcd
forcecoul = prefactor * (erfcc/r + 2.0*alpha/MY_PIS * erfcd +&
r*f_shift) * r
e_coul = e_coul + prefactor*(erfcc-r*e_shift-dr2*f_shift)
force = (forcecoul+forcelj)*dr2i
ffx = ffx + dx*force
ffy = ffy + dy*force
ffz = ffz + dz*force
enddo
ff(i)%x = ffx; ff(i)%y = ffy; ff(i)%z = ffz
enddo
!$omp end parallel do
!dir$ end offload
potential = 0.50d0*potential
e_coul = 0.50d0*e_coul
call system_clock(T2,clock_rate,clock_max)
if(printflag.eq.1)then
print*,'elapsed time w/o newton SP',real(T2-T1)/real(clock_rate)
print*,'calculated potentials:',potential*reps, e_coul*reps
endif
end subroutine lj_cut_coul_dsf_nonewton_SP
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
This is a duplicate of https://software.intel.com/en-us/forums/topic/541463. Please post all follow-ups to that thread.
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page