- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am having issues with speed for a particular loop of my code. The openmp version with 8 threads is beating it every time on stampede. The way my example code below is organized is into three sections: one on the MIC with data transfer and memory allocation and no deallocation, one on the MIC with data transfer and no memory allocation, and one just using standard openMP. Although I am attaching four or so modules, PLEASE ONLY LOOK AT MC.F90. The other routines are just to set up the arrays for the loops in MC.f90. The code is compiled as follows
ifort -openmp -align array64byte global.f90 position.f90 modlist.f90 MC.f90 -O3 -o new.out
The times generated on the MIC were about 1.77 e-2 for the first case, 1.77 e-2 for the second case, and 1.4e-2 for the third case
The latency to use VTUNE on stampede is so bad that it renders it essentially useless. From what I could ascertain from tune is that I have some issue somewhere with threads having to wait around. Would someone who is good with VTUNE be willing to look at my loops in MC.f90 to see why my vectorized code runs slower on the MIC.
program lennard use ifport use position use global use modlist implicit none double precision :: potential, pressure integer :: criterium double precision :: energy,filter double precision :: minx,miny,minz double precision :: dx,dy,dz double precision :: dr,dr2,dri,dr2i,dr6i,dr12i double precision :: x1,y1,z1,x2,y2,z2 integer :: l,i,j,k integer :: c1s,c1e,c2s,c2e integer :: T1,T2,T3,T4,clock_rate,clock_max integer :: count,cell_neigh integer :: num,tid,count1,count2,count3 integer :: rep param%rcut = 2.50d0 param%rcut2 = param%rcut**2 param%dens = 1.0d0 param%np = 60000 param%vol = param%np/param%dens param%box = param%vol**(1.0/3.0) param%hbox = param%box/(2.0d0) allocate(x(param%np),y(param%np),z(param%np)) call seed(10) !call srand(10) !--- initialize particles on a cubic grid call init_position('cubic grid', 'configuration.dat') !--- initialize list variables call init_list() call compute_cell_neighbors() call total_cell_sort_init(0) call build_r() do rep = 1,3 print*,'beginning rep:',rep energy = 0.0d0 count1 = 0 !-----------------------------------------------------------------------------! ! case 1: MIC computation with data transfer ! !-----------------------------------------------------------------------------! call system_clock(T3,clock_rate,clock_max) !dir$ offload_transfer target(mic:0),& !dir$& in(x: alloc_if(.true.) free_if(.false.)),& !dir$& in(y: alloc_if(.true.) free_if(.false.)),& !dir$& in(z: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),& !dir$& in(start: alloc_if(.true.) free_if(.false.)),& !dir$& in(end: alloc_if(.true.) free_if(.false.)),& !dir$& in(cnum: alloc_if(.true.) free_if(.false.)) !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.50d0)then filter = 1.0d0 endif dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do !dir$ end offload call system_clock(T4,clock_rate,clock_max) print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate) print*,'energy case 2:',energy print* print* end do !-----------------------------------------------------------! ! case three: arrays with data transfer and no memory alloc ! !-----------------------------------------------------------! do rep = 1,3 energy = 0.0d0 print*,'beginning rep:',rep call system_clock(T3,clock_rate,clock_max) !dir$ offload_transfer target(mic:0),& !dir$& in(x: alloc_if(.false.) free_if(.false.)),& !dir$& in(y: alloc_if(.false.) free_if(.false.)),& !dir$& in(z: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),& !dir$& in(start: alloc_if(.false.) free_if(.false.)),& !dir$& in(end: alloc_if(.false.) free_if(.false.)),& !dir$& in(cnum: alloc_if(.false.) free_if(.false.)) !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) !do k = c1s,c1e ! x1 = x(k); y1 = y(k); z1 = z(k) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.50d0)then filter = 1.0d0 endif dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do !dir$ end offload call system_clock(T4,clock_rate,clock_max) print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate) print*,'energy case 3:',energy print* print* enddo !-----------------------------------------------------------! ! case four: openmp ! !-----------------------------------------------------------! energy = 0.0d0 call system_clock(T1,clock_rate,clock_max) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) !do k = c1s,c1e ! x1 = x(k); y1 = y(k); z1 = z(k) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.0d0) filter =1.0d0 dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do call system_clock(T2,clock_rate,clock_max) print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate) print*,'energy case 4:',energy print* print* stop end program lennard
module modlist
use global
implicit none
contains
!........................................!
! constructor for class list !
!........................................!
subroutine init_list()
implicit none
integer :: i
integer :: aver
print*,param%box,param%rcut
listvar%rn = (param%box/int(param%box/param%rcut))
listvar%ncellT = nint(param%vol/(listvar%rn**3))
listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0))
listvar%alloc_size = int(ceiling(listvar%ncellT*1.2))
print*,'--------------- INTIALIZING LISTS-----------------------------------'
print*,' rn :',listvar%rn
print*,' ncellT :',listvar%ncellT
print*,' ncellD :',listvar%ncellD
print*,' MAKE SURE NCELLD CUBED IS NCELLT'
print*,'--------------- END INTIALIZING LISTS-------------------------------'
!--- try these arrays
! attributes align:64::min_x,min_y,min_z,cnum
allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size))
allocate(cnum(0:13*listvar%alloc_size))
allocate(cl(0:listvar%alloc_size))
allocate(cv_total(13,0:listvar%alloc_size))
allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size))
!--- allocate temp variables used for cell sort
allocate(temp_num(0:listvar%alloc_size))
aver = cell_max()
aver = int(ceiling(1.4*aver))
print*,'AVERAGRE',aver
do i =0, listvar%alloc_size
allocate(cl(i)%loc_r(aver))
cl(i)%size = aver
enddo
!--- allocate atom cell construct
allocate(atom(param%np))
end subroutine init_list
!......................................................................!
! determines the neigh neighbors for total energy calculation !
!......................................................................!
subroutine neighcell_total()
implicit none
integer :: offsets(3,13)
integer :: in,i,count
integer :: ic,itel,icz,icy,icx
integer :: ix,iy,iz
integer :: iccx,iccy,iccz
integer :: x,y,z
!--- build offsets array
offsets(1,1)=1; offsets(2,1)=0; offsets(3,1) = 0
offsets(1,2)=1; offsets(2,2)=1; offsets(3,2) = 0
offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0
offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0
offsets(1,5)=1; offsets(2,5)=0; offsets(3,5) = 1
offsets(1,6)=1; offsets(2,6)=1; offsets(3,6) = 1
offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1
offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1
offsets(1,9)=1; offsets(2,9)=0; offsets(3,9) = -1
offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1
offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1
offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1
offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1
count = 0
do ic = 0,listvar%ncellT-1
!--- obtain x,y,z coordinates of cell of interest: ic
!--- this is doing using integer arithmetic
icz = ic/(listvar%ncellD*listvar%ncellD)
icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD)
icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD
do i = 1,13
ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i)
iccz = icz+iz
if(iccz.lt.0) then
z = iccz + listvar%ncellD
else if (iccz.ge.listvar%ncellD) then
z = iccz - listvar%ncellD
else
z = iccz
endif
iccy = icy+iy
if(iccy.lt.0) then
y = iccy + listvar%ncellD
else if (iccy.ge.listvar%ncellD) then
y = iccy - listvar%ncellD
else
y = iccy
endif
iccx = icx+ix
if(iccx.lt.0) then
x = iccx + listvar%ncellD
else if (iccx.ge.listvar%ncellD) then
x = iccx - listvar%ncellD
else
x = iccx
endif
in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD)
cnum(count) = in
cv_total(i,ic)%cnum = in
!--- store min im in cv array
if(iccx.lt.0) then
min_x(count) = param%box
else if (iccx.ge.listvar%ncellD) then
min_x(count) = -param%box
else
min_x(count) = 0.0d0
endif
if(iccy.lt.0) then
min_y(count) = param%box
else if (iccy.ge.listvar%ncellD) then
min_y(count) = -param%box
else
min_y(count) = 0.0d0
endif
if(iccz.lt.0) then
min_z(count) = param%box
else if (iccz.ge.listvar%ncellD) then
min_z(count) = -param%box
else
min_z(count) = 0.0d0
endif
count = count + 1
enddo
enddo
end subroutine neighcell_total
!........................................!
! compute cell interaction indicies !
! compute miniomage for each cell pair !
! compute cell coordinates for each cell
!........................................!
subroutine compute_cell_neighbors()
implicit none
call neighcell_total()
end subroutine compute_cell_neighbors
!........................................!
! determines cell number !
!........................................!
integer function cell(x,y,z)
implicit none
double precision :: x,y,z
integer :: ix,iy,iz
if(x.lt.param%box)then
ix = int(x/listvar%rn)
else
ix = listvar%ncellD-1
endif
if(y.lt.param%box)then
iy = int(y/listvar%rn)
else
iy = listvar%ncellD-1
endif
if(z.lt.param%box)then
iz = int(z/listvar%rn)
else
iz = listvar%ncellD-1
endif
cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD
end function cell
!.....................................................................!
! finds cell with largest number of atoms for initial array allocation
!.....................................................................!
integer function cell_max()
implicit none
integer :: temp_cell(param%np)
integer :: i,icell
temp_num(:) = 0
cell_max = 0
do i = 1 ,param%np
icell = cell( x(i),y(i),z(i))
temp_cell(i) = icell
temp_num(icell) = temp_num(icell) + 1
if(temp_num(icell).gt.cell_max)then
cell_max = temp_num(icell)
endif
enddo
end function cell_max
!.....................................................................................!
! want to assign each atom to appropriate cell
! generate list of atoms in each cell for GONET algorithm
! do not want to deallocate and reallocate her
!......................................................................................!
subroutine total_cell_sort_init(step)
implicit none
integer :: step
integer :: i,icell
integer :: new_size
temp_num(:) = 0
do i = 1 ,param%np
icell = cell(x(i),y(i),z(i))
temp_num(icell) = temp_num(icell) + 1
!--- keep particles current cell
atom(i)%cell = icell
atom(i)%loc = temp_num(icell)
enddo
do i=0, listvar%ncellT - 1
!--- number of molecules in cell
cl(i)%num = temp_num(i)
!--- while sorting here, check to see if reallocation is needed
if(cl(i)%num.gt.cl(i)%size) then
new_size = int(ceiling(1.4*cl(i)%num))
deallocate(cl(i)%loc_r)
allocate(cl(i)%loc_r(new_size))
cl(i)%size = new_size
endif
enddo
do i= 1,param%np
!--- get cell for particle
icell = atom(i)%cell
!--- place particles in cell
cl(icell)%loc_r(atom(i)%loc)%x = x(i)
cl(icell)%loc_r(atom(i)%loc)%y = y(i)
cl(icell)%loc_r(atom(i)%loc)%z = z(i)
enddo
end subroutine total_cell_sort_init
subroutine build_r()
integer :: i,j,counter
counter = 0
do i = 0,listvar%ncellT-1
start(i) = counter + 1
do j = 1 ,cl(i)%num
counter = counter + 1
x(counter) = cl(i)%loc_r(j)%x
y(counter) = cl(i)%loc_r(j)%y
z(counter) = cl(i)%loc_r(j)%z
enddo
end(i) = counter
enddo
end subroutine build_r
end module modlist
module global
implicit none
!-----------------------------------------------------------------------------------!
! INPUT VARIABLES !
!-----------------------------------------------------------------------------------!
type input
double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv
double precision :: maxd,vmax
double precision :: kboltz
double precision :: vol,box,hbox,ibox
double precision :: dens
double precision :: temp,P,beta
double precision :: alpha
integer :: np
integer :: istart
integer :: mceq,mcsteps
integer :: nsample, nadjust
end type input
type nuc
double precision :: kn
integer :: numSweep,numStep
end type nuc
type(nuc) :: nucVar
type ainfo
double precision :: x,y,z
integer :: type
end type ainfo
!--- declare param variable
!dir$ attributes offload:mic::param
type(input) :: param
!------------------------------------------------------------------------------------!
! POSITION ARRAYS !
!------------------------------------------------------------------------------------!
!dir$ attributes offload:mic::start
integer,allocatable :: start(:)
!dir$ attributes offload:mic::end
integer,allocatable :: end(:)
!dir$ attributes offload:mic::min_x
double precision, allocatable :: min_x(:)
!dir$ attributes offload:mic::min_y
double precision, allocatable :: min_y(:)
!dir$ attributes offload:mic::min_z
double precision, allocatable :: min_z(:)
!dir$ attributes offload:mic::x
double precision, allocatable :: x(:)
!dir$ attributes offload:mic::y
double precision, allocatable :: y(:)
!dir$ attributes offload:mic::Z
double precision, allocatable :: z(:)
!--------------------------------------------------------------------------------!
!--------------------- LIST ARRAYS --------------------------------------------!
type atom_cell
!--- cell of atom
integer :: cell
!--- position in cell list
integer :: loc
end type atom_cell
type cell_data
!--- x,y,z coordinates of particles
type(ainfo),allocatable :: loc_r(:)
!--- total number of atoms in cell
integer :: num
!---size of array
integer :: size
end type cell_data
type vlist
!--- minimum image displacement convention
double precision :: min_z,min_y,min_x
!--- cnum is going to be the cell index being held
integer :: cnum
end type vlist
type vlist_SOA
double precision, allocatable :: min_x(:),min_y(:),min_z(:)
integer, allocatable :: cnum(:)
end type vlist_SOA
type list
!.....declare doubles and ints
double precision :: rn
integer :: ncellT
integer :: ncellD
!--- typically 1.4 * ncellT
integer :: alloc_size
integer :: cv_size
end type list
!--- try just arrays
!dir$ attributes offload:mic::cnum
integer, allocatable :: cnum(:)
!--- declare variable for above types
!dir$ attributes offload:mic:: listvar
type(list) :: listvar
type(atom_cell),allocatable :: atom(:)
!--- cell lists
!--- for each cell, contains the atoms present
!--- cell_list(i)%cell_members(j) is ith cell's jth member
type(cell_data), allocatable :: cl(:)
!--- cell verlet lists
!--- This is going to be interaction matrix for cell pairs
!--- Each index in matrix will contain verlet list
!--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i
type(vlist), allocatable :: cv_total(:,:)
!---temp variables used for cell sorts
integer, allocatable :: temp_num(:)
end module global
module position
use ifport
use global
implicit none
integer:: numpd
double precision ::gsize
integer :: flag
contains
subroutine init_position(situation,file_name)
implicit none
integer:: counter,i,j,k
character(len=*) :: situation,file_name
double precision :: half,deltax,remainder,spot
print*,'from init position:',param%box,param%np
numpd = ceiling(param%np**(1.0d0/3.0d0))
gsize = param%box/(1.0d0*numpd)
print*,'----------------initializing positions ----------------------'
print*,' INITIALIZING PARTICLES ON CUBIC GRID'
print*,' GRID SPACING:',gsize
print*,' NUMBER PER DIMENSION',numpd
counter =1
do i=1,numpd
do j=1,numpd
do k=1,numpd
if(counter.le.param%np)then
x(counter) = (k-1)*gsize
y(counter) = (j-1)*gsize
z(counter) = (i-1)*gsize
endif
counter = counter+1
enddo
enddo
enddo
end subroutine init_position
end module position
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ugh. Sorry I don't know why the code came out like that, and I don't see an edit post anywhere so I will repost the code
program lennard use ifport use position use global use modlist implicit none double precision :: potential, pressure integer :: criterium double precision :: energy,filter double precision :: minx,miny,minz double precision :: dx,dy,dz double precision :: dr,dr2,dri,dr2i,dr6i,dr12i double precision :: x1,y1,z1,x2,y2,z2 integer :: l,i,j,k integer :: c1s,c1e,c2s,c2e integer :: T1,T2,T3,T4,clock_rate,clock_max integer :: count,cell_neigh integer :: num,tid,count1,count2,count3 integer :: rep param%rcut = 2.50d0 param%rcut2 = param%rcut**2 param%dens = 1.0d0 param%np = 60000 param%vol = param%np/param%dens param%box = param%vol**(1.0/3.0) param%hbox = param%box/(2.0d0) allocate(x(param%np),y(param%np),z(param%np)) call seed(10) !call srand(10) !--- initialize particles on a cubic grid call init_position('cubic grid', 'configuration.dat') !--- initialize list variables call init_list() call compute_cell_neighbors() call total_cell_sort_init(0) call build_r() do rep = 1,3 print*,'beginning rep:',rep energy = 0.0d0 count1 = 0 !-----------------------------------------------------------------------------! ! case 1: MIC computation with data transfer ! !-----------------------------------------------------------------------------! call system_clock(T3,clock_rate,clock_max) !dir$ offload_transfer target(mic:0),& !dir$& in(x: alloc_if(.true.) free_if(.false.)),& !dir$& in(y: alloc_if(.true.) free_if(.false.)),& !dir$& in(z: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),& !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),& !dir$& in(start: alloc_if(.true.) free_if(.false.)),& !dir$& in(end: alloc_if(.true.) free_if(.false.)),& !dir$& in(cnum: alloc_if(.true.) free_if(.false.)) !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.50d0)then filter = 1.0d0 endif dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do !dir$ end offload call system_clock(T4,clock_rate,clock_max) print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate) print*,'energy case 2:',energy print* print* end do !-----------------------------------------------------------! ! case three: arrays with data transfer and no memory alloc ! !-----------------------------------------------------------! do rep = 1,3 energy = 0.0d0 print*,'beginning rep:',rep call system_clock(T3,clock_rate,clock_max) !dir$ offload_transfer target(mic:0),& !dir$& in(x: alloc_if(.false.) free_if(.false.)),& !dir$& in(y: alloc_if(.false.) free_if(.false.)),& !dir$& in(z: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),& !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),& !dir$& in(start: alloc_if(.false.) free_if(.false.)),& !dir$& in(end: alloc_if(.false.) free_if(.false.)),& !dir$& in(cnum: alloc_if(.false.) free_if(.false.)) !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) !do k = c1s,c1e ! x1 = x(k); y1 = y(k); z1 = z(k) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.50d0)then filter = 1.0d0 endif dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do !dir$ end offload call system_clock(T4,clock_rate,clock_max) print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate) print*,'energy case 3:',energy print* print* enddo !-----------------------------------------------------------! ! case four: openmp ! !-----------------------------------------------------------! energy = 0.0d0 call system_clock(T1,clock_rate,clock_max) !$omp parallel do schedule(dynamic) reduction(+:energy),& !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar) do i = 0,listvar%ncellT-1 c1s = start(i); c1e = end(i) !do k = c1s,c1e ! x1 = x(k); y1 = y(k); z1 = z(k) do j = i*13,13*i+12 cell_neigh = cnum(j) c2s = start(cell_neigh); c2e = end(cell_neigh) minx = min_x(j); miny = min_y(j) ; minz = min_z(j) do k= c1s,c1e x1 = x(k); y1 = y(k); z1 = z(k) do l = c2s,c2e x2 = x(l); y2 = y(l); z2 = z(l) dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz dr2 = dx*dx + dy*dy + dz*dz filter = 0.0d0 if(dr2.lt.2.0d0) filter =1.0d0 dr = sqrt(dr2) dri = 1.0d0/dr dr2i = 1.0d0*dri*dri dr6i = dr2i*dr2i*dr2i dr12i = dr6i*dr6i energy = energy + 4.0d0*(dr12i-dr6i)*filter enddo enddo enddo enddo !$omp end parallel do call system_clock(T2,clock_rate,clock_max) print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate) print*,'energy case 4:',energy print* print* stop end program lennard module global implicit none !-----------------------------------------------------------------------------------! ! INPUT VARIABLES ! !-----------------------------------------------------------------------------------! type input double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv double precision :: maxd,vmax double precision :: kboltz double precision :: vol,box,hbox,ibox double precision :: dens double precision :: temp,P,beta double precision :: alpha integer :: np integer :: istart integer :: mceq,mcsteps integer :: nsample, nadjust end type input type nuc double precision :: kn integer :: numSweep,numStep end type nuc type(nuc) :: nucVar type ainfo double precision :: x,y,z integer :: type end type ainfo !--- declare param variable !dir$ attributes offload:mic::param type(input) :: param !------------------------------------------------------------------------------------! ! POSITION ARRAYS ! !------------------------------------------------------------------------------------! !dir$ attributes offload:mic::start integer,allocatable :: start(:) !dir$ attributes offload:mic::end integer,allocatable :: end(:) !dir$ attributes offload:mic::min_x double precision, allocatable :: min_x(:) !dir$ attributes offload:mic::min_y double precision, allocatable :: min_y(:) !dir$ attributes offload:mic::min_z double precision, allocatable :: min_z(:) !dir$ attributes offload:mic::x double precision, allocatable :: x(:) !dir$ attributes offload:mic::y double precision, allocatable :: y(:) !dir$ attributes offload:mic::Z double precision, allocatable :: z(:) !--------------------------------------------------------------------------------! !--------------------- LIST ARRAYS --------------------------------------------! type atom_cell !--- cell of atom integer :: cell !--- position in cell list integer :: loc end type atom_cell type cell_data !--- x,y,z coordinates of particles type(ainfo),allocatable :: loc_r(:) !--- total number of atoms in cell integer :: num !---size of array integer :: size end type cell_data type vlist !--- minimum image displacement convention double precision :: min_z,min_y,min_x !--- cnum is going to be the cell index being held integer :: cnum end type vlist type vlist_SOA double precision, allocatable :: min_x(:),min_y(:),min_z(:) integer, allocatable :: cnum(:) end type vlist_SOA type list !.....declare doubles and ints double precision :: rn integer :: ncellT integer :: ncellD !--- typically 1.4 * ncellT integer :: alloc_size integer :: cv_size end type list !--- try just arrays !dir$ attributes offload:mic::cnum integer, allocatable :: cnum(:) !--- declare variable for above types !dir$ attributes offload:mic:: listvar type(list) :: listvar type(atom_cell),allocatable :: atom(:) !--- cell lists !--- for each cell, contains the atoms present !--- cell_list(i)%cell_members(j) is ith cell's jth member type(cell_data), allocatable :: cl(:) !--- cell verlet lists !--- This is going to be interaction matrix for cell pairs !--- Each index in matrix will contain verlet list !--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i type(vlist), allocatable :: cv_total(:,:) !---temp variables used for cell sorts integer, allocatable :: temp_num(:) end module global module modlist use global implicit none contains !........................................! ! constructor for class list ! !........................................! subroutine init_list() implicit none integer :: i integer :: aver print*,param%box,param%rcut listvar%rn = (param%box/int(param%box/param%rcut)) listvar%ncellT = nint(param%vol/(listvar%rn**3)) listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0)) listvar%alloc_size = int(ceiling(listvar%ncellT*1.2)) print*,'--------------- INTIALIZING LISTS-----------------------------------' print*,' rn :',listvar%rn print*,' ncellT :',listvar%ncellT print*,' ncellD :',listvar%ncellD print*,' MAKE SURE NCELLD CUBED IS NCELLT' print*,'--------------- END INTIALIZING LISTS-------------------------------' !--- try these arrays ! attributes align:64::min_x,min_y,min_z,cnum allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size)) allocate(cnum(0:13*listvar%alloc_size)) allocate(cl(0:listvar%alloc_size)) allocate(cv_total(13,0:listvar%alloc_size)) allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size)) !--- allocate temp variables used for cell sort allocate(temp_num(0:listvar%alloc_size)) aver = cell_max() aver = int(ceiling(1.4*aver)) print*,'AVERAGRE',aver do i =0, listvar%alloc_size allocate(cl(i)%loc_r(aver)) cl(i)%size = aver enddo !--- allocate atom cell construct allocate(atom(param%np)) end subroutine init_list !......................................................................! ! determines the neigh neighbors for total energy calculation ! !......................................................................! subroutine neighcell_total() implicit none integer :: offsets(3,13) integer :: in,i,count integer :: ic,itel,icz,icy,icx integer :: ix,iy,iz integer :: iccx,iccy,iccz integer :: x,y,z !--- build offsets array offsets(1,1)=1; offsets(2,1)=0; offsets(3,1) = 0 offsets(1,2)=1; offsets(2,2)=1; offsets(3,2) = 0 offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0 offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0 offsets(1,5)=1; offsets(2,5)=0; offsets(3,5) = 1 offsets(1,6)=1; offsets(2,6)=1; offsets(3,6) = 1 offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1 offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1 offsets(1,9)=1; offsets(2,9)=0; offsets(3,9) = -1 offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1 offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1 offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1 offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1 count = 0 do ic = 0,listvar%ncellT-1 !--- obtain x,y,z coordinates of cell of interest: ic !--- this is doing using integer arithmetic icz = ic/(listvar%ncellD*listvar%ncellD) icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD) icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD do i = 1,13 ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i) iccz = icz+iz if(iccz.lt.0) then z = iccz + listvar%ncellD else if (iccz.ge.listvar%ncellD) then z = iccz - listvar%ncellD else z = iccz endif iccy = icy+iy if(iccy.lt.0) then y = iccy + listvar%ncellD else if (iccy.ge.listvar%ncellD) then y = iccy - listvar%ncellD else y = iccy endif iccx = icx+ix if(iccx.lt.0) then x = iccx + listvar%ncellD else if (iccx.ge.listvar%ncellD) then x = iccx - listvar%ncellD else x = iccx endif in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD) cnum(count) = in cv_total(i,ic)%cnum = in !--- store min im in cv array if(iccx.lt.0) then min_x(count) = param%box else if (iccx.ge.listvar%ncellD) then min_x(count) = -param%box else min_x(count) = 0.0d0 endif if(iccy.lt.0) then min_y(count) = param%box else if (iccy.ge.listvar%ncellD) then min_y(count) = -param%box else min_y(count) = 0.0d0 endif if(iccz.lt.0) then min_z(count) = param%box else if (iccz.ge.listvar%ncellD) then min_z(count) = -param%box else min_z(count) = 0.0d0 endif count = count + 1 enddo enddo end subroutine neighcell_total !........................................! ! compute cell interaction indicies ! ! compute miniomage for each cell pair ! ! compute cell coordinates for each cell !........................................! subroutine compute_cell_neighbors() implicit none call neighcell_total() end subroutine compute_cell_neighbors !........................................! ! determines cell number ! !........................................! integer function cell(x,y,z) implicit none double precision :: x,y,z integer :: ix,iy,iz if(x.lt.param%box)then ix = int(x/listvar%rn) else ix = listvar%ncellD-1 endif if(y.lt.param%box)then iy = int(y/listvar%rn) else iy = listvar%ncellD-1 endif if(z.lt.param%box)then iz = int(z/listvar%rn) else iz = listvar%ncellD-1 endif cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD end function cell !.....................................................................! ! finds cell with largest number of atoms for initial array allocation !.....................................................................! integer function cell_max() implicit none integer :: temp_cell(param%np) integer :: i,icell temp_num(:) = 0 cell_max = 0 do i = 1 ,param%np icell = cell( x(i),y(i),z(i)) temp_cell(i) = icell temp_num(icell) = temp_num(icell) + 1 if(temp_num(icell).gt.cell_max)then cell_max = temp_num(icell) endif enddo end function cell_max !.....................................................................................! ! want to assign each atom to appropriate cell ! generate list of atoms in each cell for GONET algorithm ! do not want to deallocate and reallocate her !......................................................................................! subroutine total_cell_sort_init(step) implicit none integer :: step integer :: i,icell integer :: new_size temp_num(:) = 0 do i = 1 ,param%np icell = cell(x(i),y(i),z(i)) temp_num(icell) = temp_num(icell) + 1 !--- keep particles current cell atom(i)%cell = icell atom(i)%loc = temp_num(icell) enddo do i=0, listvar%ncellT - 1 !--- number of molecules in cell cl(i)%num = temp_num(i) !--- while sorting here, check to see if reallocation is needed if(cl(i)%num.gt.cl(i)%size) then new_size = int(ceiling(1.4*cl(i)%num)) deallocate(cl(i)%loc_r) allocate(cl(i)%loc_r(new_size)) cl(i)%size = new_size endif enddo do i= 1,param%np !--- get cell for particle icell = atom(i)%cell !--- place particles in cell cl(icell)%loc_r(atom(i)%loc)%x = x(i) cl(icell)%loc_r(atom(i)%loc)%y = y(i) cl(icell)%loc_r(atom(i)%loc)%z = z(i) enddo end subroutine total_cell_sort_init subroutine build_r() integer :: i,j,counter counter = 0 do i = 0,listvar%ncellT-1 start(i) = counter + 1 do j = 1 ,cl(i)%num counter = counter + 1 x(counter) = cl(i)%loc_r(j)%x y(counter) = cl(i)%loc_r(j)%y z(counter) = cl(i)%loc_r(j)%z enddo end(i) = counter enddo end subroutine build_r end module modlist module position use ifport use global implicit none integer:: numpd double precision ::gsize integer :: flag contains subroutine init_position(situation,file_name) implicit none integer:: counter,i,j,k character(len=*) :: situation,file_name double precision :: half,deltax,remainder,spot print*,'from init position:',param%box,param%np numpd = ceiling(param%np**(1.0d0/3.0d0)) gsize = param%box/(1.0d0*numpd) print*,'----------------initializing positions ----------------------' print*,' INITIALIZING PARTICLES ON CUBIC GRID' print*,' GRID SPACING:',gsize print*,' NUMBER PER DIMENSION',numpd counter =1 do i=1,numpd do j=1,numpd do k=1,numpd if(counter.le.param%np)then x(counter) = (k-1)*gsize y(counter) = (j-1)*gsize z(counter) = (i-1)*gsize endif counter = counter+1 enddo enddo enddo end subroutine init_position end module position
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Conor,
Before investigating further I would like to verify if I have the source code compiled and running correctly. I have the following output :
from init position: 39.1486806904843 60000 ----------------initializing positions---------------------- INITIALIZING PARTICLES ON CUBIC GRID GRID SPACING: 0.978717017262106 NUMBER PER DIMENSION 40 39.1486806904843 2.50000000000000 --------------- INTIALIZING LISTS----------------------------------- rn : 2.60991204603228 ncellT : 3375 ncellD : 15 MAKE SURE NCELLD CUBED IS NCELLT --------------- END INTIALIZING LISTS------------------------------- AVERAGRE 38 beginning rep: 1 timing for MIC computation case 2: 0.8054000 energy case 2: 0.000000000000000E+000 beginning rep: 2 timing for MIC computation case 2: 2.0999999E-03 energy case 2: 0.000000000000000E+000 beginning rep: 3 timing for MIC computation case 2: 1.8000000E-03 energy case 2: 0.000000000000000E+000 beginning rep: 1 time for MIC computation case 3 : 2.0999999E-03 energy case 3: 0.000000000000000E+000 beginning rep: 2 time for MIC computation case 3 : 1.8000000E-03 energy case 3: 0.000000000000000E+000 beginning rep: 3 time for MIC computation case 3 : 2.0000001E-03 energy case 3: 0.000000000000000E+000 time case 4 with filter: 6.9100000E-02 energy case 4: 0.000000000000000E+000
I am not sure if the values for energy should be zero for all cases. Am I doing something wrong.
![](/skins/images/872293744008A34B36F8ABF94A46CC66/responsive_peak/images/icon_anonymous_message.png)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page