openmp vs. MIC speed issue

conor_p_ · ‎09-10-2014

Hello,

I am having issues with speed for a particular loop of my code. The openmp version with 8 threads is beating it every time on stampede. The way my example code below is organized is into three sections: one on the MIC with data transfer and memory allocation and no deallocation, one on the MIC with data transfer and no memory allocation, and one just using standard openMP. Although I am attaching four or so modules, PLEASE ONLY LOOK AT MC.F90. The other routines are just to set up the arrays for the loops in MC.f90. The code is compiled as follows

ifort -openmp -align array64byte global.f90 position.f90 modlist.f90 MC.f90 -O3 -o new.out

The times generated on the MIC were about 1.77 e-2 for the first case, 1.77 e-2 for the second case, and 1.4e-2 for the third case

The latency to use VTUNE on stampede is so bad that it renders it essentially useless. From what I could ascertain from tune is that I have some issue somewhere with threads having to wait around. Would someone who is good with VTUNE be willing to look at my loops in MC.f90 to see why my vectorized code runs slower on the MIC.

program lennard
  
  use ifport
  use position
  use global
  use modlist

  implicit none
  double precision :: potential, pressure
  integer :: criterium
  double precision :: energy,filter
  double precision :: minx,miny,minz
  double precision :: dx,dy,dz
  double precision :: dr,dr2,dri,dr2i,dr6i,dr12i
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: l,i,j,k
  integer :: c1s,c1e,c2s,c2e
  integer :: T1,T2,T3,T4,clock_rate,clock_max
  integer :: count,cell_neigh
  integer :: num,tid,count1,count2,count3
  integer :: rep

  param%rcut = 2.50d0
  param%rcut2 = param%rcut**2
  param%dens = 1.0d0
  param%np   = 60000
  param%vol   = param%np/param%dens
  param%box   = param%vol**(1.0/3.0)
  param%hbox  = param%box/(2.0d0)

  allocate(x(param%np),y(param%np),z(param%np))



  call seed(10)
  !call srand(10)

  !--- initialize particles on a cubic grid
  call init_position('cubic grid', 'configuration.dat')
  
  !--- initialize list variables
  call init_list()
  call compute_cell_neighbors()
  call total_cell_sort_init(0)
  call build_r()
 

  
    
  
   do rep = 1,3
      print*,'beginning rep:',rep
      
      energy = 0.0d0
      
      
      count1 = 0

      !-----------------------------------------------------------------------------!
      ! case 1: MIC computation with data transfer                                  !
      !-----------------------------------------------------------------------------!
      call system_clock(T3,clock_rate,clock_max)

      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.true.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)
      
      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)
         
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
         
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      call system_clock(T4,clock_rate,clock_max)
      print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate)      
      print*,'energy case 2:',energy
      print*
      print*
      
   end do
   !-----------------------------------------------------------!
   ! case three: arrays with data transfer and no memory alloc   !
   !-----------------------------------------------------------!
   
   
   do rep = 1,3
      
      energy = 0.0d0
   
      print*,'beginning rep:',rep
      call system_clock(T3,clock_rate,clock_max)



      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.false.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)

      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)

         !do k = c1s,c1e
         !   x1 = x(k); y1 = y(k); z1 = z(k)
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
        
                  
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      
      call system_clock(T4,clock_rate,clock_max)
      print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate)      
      print*,'energy case 3:',energy
      print*
      print*
   enddo
   
   !-----------------------------------------------------------!
   ! case four: openmp                                         !
   !-----------------------------------------------------------!
   
   
   
   energy = 0.0d0
   
   call system_clock(T1,clock_rate,clock_max)
   
   !$omp parallel do schedule(dynamic) reduction(+:energy),&
   !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar)
   do i = 0,listvar%ncellT-1

     c1s = start(i); c1e = end(i)

     !do k = c1s,c1e
     !   x1 = x(k); y1 = y(k); z1 = z(k)
     do j = i*13,13*i+12
   
        cell_neigh = cnum(j)
     
        c2s = start(cell_neigh); c2e = end(cell_neigh)
        minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
        
        do k= c1s,c1e
           x1 = x(k); y1 = y(k); z1 = z(k)
           
           do l = c2s,c2e
              x2 = x(l); y2 = y(l); z2 = z(l)
              
              dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
              
              dr2 = dx*dx + dy*dy + dz*dz
                                                           
              filter = 0.0d0
              if(dr2.lt.2.0d0) filter =1.0d0
              dr = sqrt(dr2)
              dri = 1.0d0/dr
              dr2i = 1.0d0*dri*dri
              dr6i = dr2i*dr2i*dr2i
              dr12i = dr6i*dr6i
              
              energy = energy + 4.0d0*(dr12i-dr6i)*filter
              
        
           enddo
        enddo
     enddo
  enddo
  !$omp end parallel do
  call system_clock(T2,clock_rate,clock_max)

  print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate)
  print*,'energy case 4:',energy
  print*
  print*





  stop 
end program lennard

module modlist

use global

implicit none

contains

!........................................!
! constructor for class list !
!........................................!
subroutine init_list()

implicit none
integer :: i
integer :: aver

print*,param%box,param%rcut

listvar%rn = (param%box/int(param%box/param%rcut))
listvar%ncellT = nint(param%vol/(listvar%rn**3))
listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0))
listvar%alloc_size = int(ceiling(listvar%ncellT*1.2))

print*,'--------------- INTIALIZING LISTS-----------------------------------'
print*,' rn :',listvar%rn
print*,' ncellT :',listvar%ncellT
print*,' ncellD :',listvar%ncellD
print*,' MAKE SURE NCELLD CUBED IS NCELLT'
print*,'--------------- END INTIALIZING LISTS-------------------------------'

!--- try these arrays
! attributes align:64::min_x,min_y,min_z,cnum
allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size))
allocate(cnum(0:13*listvar%alloc_size))

allocate(cl(0:listvar%alloc_size))
allocate(cv_total(13,0:listvar%alloc_size))
allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size))

!--- allocate temp variables used for cell sort
allocate(temp_num(0:listvar%alloc_size))

aver = cell_max()
aver = int(ceiling(1.4*aver))

print*,'AVERAGRE',aver
do i =0, listvar%alloc_size

allocate(cl(i)%loc_r(aver))
cl(i)%size = aver
enddo

!--- allocate atom cell construct
allocate(atom(param%np))

end subroutine init_list

!......................................................................!
! determines the neigh neighbors for total energy calculation !
!......................................................................!
subroutine neighcell_total()

implicit none
integer :: offsets(3,13)
integer :: in,i,count
integer :: ic,itel,icz,icy,icx
integer :: ix,iy,iz
integer :: iccx,iccy,iccz
integer :: x,y,z

!--- build offsets array
offsets(1,1)=1; offsets(2,1)=0; offsets(3,1) = 0
offsets(1,2)=1; offsets(2,2)=1; offsets(3,2) = 0
offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0
offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0

offsets(1,5)=1; offsets(2,5)=0; offsets(3,5) = 1
offsets(1,6)=1; offsets(2,6)=1; offsets(3,6) = 1
offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1
offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1

offsets(1,9)=1; offsets(2,9)=0; offsets(3,9) = -1
offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1
offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1
offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1

offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1

count = 0
do ic = 0,listvar%ncellT-1

!--- obtain x,y,z coordinates of cell of interest: ic
!--- this is doing using integer arithmetic
icz = ic/(listvar%ncellD*listvar%ncellD)
icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD)
icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD

do i = 1,13


ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i)

iccz = icz+iz

if(iccz.lt.0) then
z = iccz + listvar%ncellD
else if (iccz.ge.listvar%ncellD) then
z = iccz - listvar%ncellD
else
z = iccz
endif


iccy = icy+iy

if(iccy.lt.0) then
y = iccy + listvar%ncellD
else if (iccy.ge.listvar%ncellD) then
y = iccy - listvar%ncellD
else
y = iccy
endif


iccx = icx+ix
if(iccx.lt.0) then
x = iccx + listvar%ncellD
else if (iccx.ge.listvar%ncellD) then
x = iccx - listvar%ncellD
else
x = iccx
endif

in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD)

cnum(count) = in

cv_total(i,ic)%cnum = in

!--- store min im in cv array
if(iccx.lt.0) then
min_x(count) = param%box
else if (iccx.ge.listvar%ncellD) then
min_x(count) = -param%box
else
min_x(count) = 0.0d0
endif

if(iccy.lt.0) then
min_y(count) = param%box
else if (iccy.ge.listvar%ncellD) then
min_y(count) = -param%box
else
min_y(count) = 0.0d0
endif

if(iccz.lt.0) then
min_z(count) = param%box
else if (iccz.ge.listvar%ncellD) then
min_z(count) = -param%box
else
min_z(count) = 0.0d0
endif

count = count + 1
enddo

enddo

end subroutine neighcell_total

!........................................!
! compute cell interaction indicies !
! compute miniomage for each cell pair !
! compute cell coordinates for each cell
!........................................!
subroutine compute_cell_neighbors()
implicit none

call neighcell_total()

end subroutine compute_cell_neighbors

!........................................!
! determines cell number !
!........................................!
integer function cell(x,y,z)

implicit none
double precision :: x,y,z
integer :: ix,iy,iz

if(x.lt.param%box)then
ix = int(x/listvar%rn)
else
ix = listvar%ncellD-1
endif

if(y.lt.param%box)then
iy = int(y/listvar%rn)
else
iy = listvar%ncellD-1
endif

if(z.lt.param%box)then
iz = int(z/listvar%rn)
else
iz = listvar%ncellD-1
endif

cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD

end function cell

!.....................................................................!
! finds cell with largest number of atoms for initial array allocation
!.....................................................................!
integer function cell_max()
implicit none
integer :: temp_cell(param%np)
integer :: i,icell

temp_num(:) = 0

cell_max = 0
do i = 1 ,param%np

icell = cell( x(i),y(i),z(i))

temp_cell(i) = icell
temp_num(icell) = temp_num(icell) + 1

if(temp_num(icell).gt.cell_max)then
cell_max = temp_num(icell)
endif

enddo

end function cell_max

!.....................................................................................!
! want to assign each atom to appropriate cell
! generate list of atoms in each cell for GONET algorithm
! do not want to deallocate and reallocate her
!......................................................................................!
subroutine total_cell_sort_init(step)
implicit none
integer :: step
integer :: i,icell
integer :: new_size

temp_num(:) = 0

do i = 1 ,param%np

icell = cell(x(i),y(i),z(i))

temp_num(icell) = temp_num(icell) + 1

!--- keep particles current cell
atom(i)%cell = icell
atom(i)%loc = temp_num(icell)
enddo

do i=0, listvar%ncellT - 1

!--- number of molecules in cell
cl(i)%num = temp_num(i)

!--- while sorting here, check to see if reallocation is needed
if(cl(i)%num.gt.cl(i)%size) then

new_size = int(ceiling(1.4*cl(i)%num))

deallocate(cl(i)%loc_r)
allocate(cl(i)%loc_r(new_size))
cl(i)%size = new_size

endif
enddo

do i= 1,param%np
!--- get cell for particle
icell = atom(i)%cell

!--- place particles in cell
cl(icell)%loc_r(atom(i)%loc)%x = x(i)
cl(icell)%loc_r(atom(i)%loc)%y = y(i)
cl(icell)%loc_r(atom(i)%loc)%z = z(i)

enddo

end subroutine total_cell_sort_init

subroutine build_r()
integer :: i,j,counter

counter = 0

do i = 0,listvar%ncellT-1

start(i) = counter + 1

do j = 1 ,cl(i)%num

counter = counter + 1

x(counter) = cl(i)%loc_r(j)%x
y(counter) = cl(i)%loc_r(j)%y
z(counter) = cl(i)%loc_r(j)%z

enddo
end(i) = counter
enddo
end subroutine build_r

end module modlist

module global
implicit none

!-----------------------------------------------------------------------------------!
! INPUT VARIABLES !
!-----------------------------------------------------------------------------------!
type input
double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv
double precision :: maxd,vmax
double precision :: kboltz
double precision :: vol,box,hbox,ibox
double precision :: dens
double precision :: temp,P,beta
double precision :: alpha
integer :: np
integer :: istart
integer :: mceq,mcsteps
integer :: nsample, nadjust
end type input

type nuc
double precision :: kn
integer :: numSweep,numStep
end type nuc
type(nuc) :: nucVar

type ainfo
double precision :: x,y,z
integer :: type
end type ainfo

!--- declare param variable
!dir$ attributes offload:mic::param
type(input) :: param

!------------------------------------------------------------------------------------!
! POSITION ARRAYS !
!------------------------------------------------------------------------------------!
!dir$ attributes offload:mic::start
integer,allocatable :: start(:)
!dir$ attributes offload:mic::end
integer,allocatable :: end(:)

!dir$ attributes offload:mic::min_x
double precision, allocatable :: min_x(:)
!dir$ attributes offload:mic::min_y
double precision, allocatable :: min_y(:)
!dir$ attributes offload:mic::min_z
double precision, allocatable :: min_z(:)

!dir$ attributes offload:mic::x
double precision, allocatable :: x(:)
!dir$ attributes offload:mic::y
double precision, allocatable :: y(:)
!dir$ attributes offload:mic::Z
double precision, allocatable :: z(:)

!--------------------------------------------------------------------------------!
!--------------------- LIST ARRAYS --------------------------------------------!
type atom_cell
!--- cell of atom
integer :: cell

!--- position in cell list
integer :: loc

end type atom_cell

type cell_data

!--- x,y,z coordinates of particles
type(ainfo),allocatable :: loc_r(:)

!--- total number of atoms in cell
integer :: num

!---size of array
integer :: size
end type cell_data

type vlist

!--- minimum image displacement convention
double precision :: min_z,min_y,min_x

!--- cnum is going to be the cell index being held
integer :: cnum

end type vlist

type vlist_SOA
double precision, allocatable :: min_x(:),min_y(:),min_z(:)
integer, allocatable :: cnum(:)
end type vlist_SOA

type list
!.....declare doubles and ints
double precision :: rn
integer :: ncellT
integer :: ncellD

!--- typically 1.4 * ncellT
integer :: alloc_size

integer :: cv_size
end type list

!--- try just arrays
!dir$ attributes offload:mic::cnum
integer, allocatable :: cnum(:)

!--- declare variable for above types
!dir$ attributes offload:mic:: listvar
type(list) :: listvar
type(atom_cell),allocatable :: atom(:)

!--- cell lists
!--- for each cell, contains the atoms present
!--- cell_list(i)%cell_members(j) is ith cell's jth member
type(cell_data), allocatable :: cl(:)

!--- cell verlet lists
!--- This is going to be interaction matrix for cell pairs
!--- Each index in matrix will contain verlet list
!--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i
type(vlist), allocatable :: cv_total(:,:)

!---temp variables used for cell sorts
integer, allocatable :: temp_num(:)

end module global

module position

use ifport
use global

implicit none
integer:: numpd
double precision ::gsize
integer :: flag

contains

subroutine init_position(situation,file_name)

implicit none
integer:: counter,i,j,k
character(len=*) :: situation,file_name
double precision :: half,deltax,remainder,spot

print*,'from init position:',param%box,param%np
numpd = ceiling(param%np**(1.0d0/3.0d0))
gsize = param%box/(1.0d0*numpd)

print*,'----------------initializing positions ----------------------'

print*,' INITIALIZING PARTICLES ON CUBIC GRID'
print*,' GRID SPACING:',gsize
print*,' NUMBER PER DIMENSION',numpd
counter =1

do i=1,numpd
do j=1,numpd
do k=1,numpd

if(counter.le.param%np)then

x(counter) = (k-1)*gsize
y(counter) = (j-1)*gsize
z(counter) = (i-1)*gsize

endif
counter = counter+1

enddo
enddo
enddo

end subroutine init_position

end module position

conor_p_ · ‎09-10-2014

Ugh. Sorry I don't know why the code came out like that, and I don't see an edit post anywhere so I will repost the code

program lennard
  
  use ifport
  use position
  use global
  use modlist

  implicit none
  double precision :: potential, pressure
  integer :: criterium
  double precision :: energy,filter
  double precision :: minx,miny,minz
  double precision :: dx,dy,dz
  double precision :: dr,dr2,dri,dr2i,dr6i,dr12i
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: l,i,j,k
  integer :: c1s,c1e,c2s,c2e
  integer :: T1,T2,T3,T4,clock_rate,clock_max
  integer :: count,cell_neigh
  integer :: num,tid,count1,count2,count3
  integer :: rep

  param%rcut = 2.50d0
  param%rcut2 = param%rcut**2
  param%dens = 1.0d0
  param%np   = 60000
  param%vol   = param%np/param%dens
  param%box   = param%vol**(1.0/3.0)
  param%hbox  = param%box/(2.0d0)

  allocate(x(param%np),y(param%np),z(param%np))



  call seed(10)
  !call srand(10)

  !--- initialize particles on a cubic grid
  call init_position('cubic grid', 'configuration.dat')
  
  !--- initialize list variables
  call init_list()
  call compute_cell_neighbors()
  call total_cell_sort_init(0)
  call build_r()
 

  
    
  
   do rep = 1,3
      print*,'beginning rep:',rep
      
      energy = 0.0d0
      
      
      count1 = 0

      !-----------------------------------------------------------------------------!
      ! case 1: MIC computation with data transfer                                  !
      !-----------------------------------------------------------------------------!
      call system_clock(T3,clock_rate,clock_max)

      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.true.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)
      
      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)
         
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
         
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      call system_clock(T4,clock_rate,clock_max)
      print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate)      
      print*,'energy case 2:',energy
      print*
      print*
      
   end do
   !-----------------------------------------------------------!
   ! case three: arrays with data transfer and no memory alloc   !
   !-----------------------------------------------------------!
   
   
   do rep = 1,3
      
      energy = 0.0d0
   
      print*,'beginning rep:',rep
      call system_clock(T3,clock_rate,clock_max)



      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.false.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)

      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)

         !do k = c1s,c1e
         !   x1 = x(k); y1 = y(k); z1 = z(k)
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
        
                  
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      
      call system_clock(T4,clock_rate,clock_max)
      print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate)      
      print*,'energy case 3:',energy
      print*
      print*
   enddo
   
   !-----------------------------------------------------------!
   ! case four: openmp                                         !
   !-----------------------------------------------------------!
   
   
   
   energy = 0.0d0
   
   call system_clock(T1,clock_rate,clock_max)
   
   !$omp parallel do schedule(dynamic) reduction(+:energy),&
   !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar)
   do i = 0,listvar%ncellT-1

     c1s = start(i); c1e = end(i)

     !do k = c1s,c1e
     !   x1 = x(k); y1 = y(k); z1 = z(k)
     do j = i*13,13*i+12
   
        cell_neigh = cnum(j)
     
        c2s = start(cell_neigh); c2e = end(cell_neigh)
        minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
        
        do k= c1s,c1e
           x1 = x(k); y1 = y(k); z1 = z(k)
           
           do l = c2s,c2e
              x2 = x(l); y2 = y(l); z2 = z(l)
              
              dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
              
              dr2 = dx*dx + dy*dy + dz*dz
                                                           
              filter = 0.0d0
              if(dr2.lt.2.0d0) filter =1.0d0
              dr = sqrt(dr2)
              dri = 1.0d0/dr
              dr2i = 1.0d0*dri*dri
              dr6i = dr2i*dr2i*dr2i
              dr12i = dr6i*dr6i
              
              energy = energy + 4.0d0*(dr12i-dr6i)*filter
              
        
           enddo
        enddo
     enddo
  enddo
  !$omp end parallel do
  call system_clock(T2,clock_rate,clock_max)

  print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate)
  print*,'energy case 4:',energy
  print*
  print*

  stop 
end program lennard

module global
  implicit none



  !-----------------------------------------------------------------------------------!
  !                      INPUT VARIABLES                                              !
  !-----------------------------------------------------------------------------------!
  type input
     double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv
     double precision :: maxd,vmax
     double precision :: kboltz
     double precision :: vol,box,hbox,ibox
     double precision :: dens
     double precision :: temp,P,beta
     double precision :: alpha
     integer :: np
     integer :: istart
     integer :: mceq,mcsteps
     integer :: nsample, nadjust
  end type input
  
  type nuc
     double precision :: kn
     integer :: numSweep,numStep
  end type nuc
  type(nuc) :: nucVar


  type ainfo
     double precision :: x,y,z
     integer :: type
  end type ainfo


  !--- declare param variable
  !dir$ attributes offload:mic::param
  type(input) :: param








  !------------------------------------------------------------------------------------!
  !                        POSITION ARRAYS                                             !
  !------------------------------------------------------------------------------------!
  !dir$ attributes offload:mic::start
  integer,allocatable :: start(:)
  !dir$ attributes offload:mic::end
  integer,allocatable :: end(:)

  !dir$ attributes offload:mic::min_x
  double precision, allocatable :: min_x(:)
  !dir$ attributes offload:mic::min_y
  double precision, allocatable :: min_y(:)
  !dir$ attributes offload:mic::min_z
  double precision, allocatable :: min_z(:)
  
  !dir$ attributes offload:mic::x
  double precision, allocatable :: x(:)
  !dir$ attributes offload:mic::y
  double precision, allocatable :: y(:)
  !dir$ attributes offload:mic::Z
  double precision, allocatable :: z(:)


  !--------------------------------------------------------------------------------!
  !---------------------  LIST ARRAYS  --------------------------------------------!
  type atom_cell
     !--- cell of atom
     integer :: cell
     
     !--- position in cell list
     integer :: loc

  end type atom_cell


  type cell_data
     
     !--- x,y,z coordinates of particles
     type(ainfo),allocatable :: loc_r(:)

     !--- total number of atoms in cell
     integer :: num

     !---size of array
     integer :: size
  end type cell_data



  type vlist
    
    !--- minimum image displacement convention
     double precision :: min_z,min_y,min_x

     !--- cnum is going to be the cell index being held
     integer :: cnum
 
  end type vlist



  type vlist_SOA
     double precision, allocatable :: min_x(:),min_y(:),min_z(:)
     integer, allocatable :: cnum(:)
  end type vlist_SOA


  type list  
     !.....declare doubles and ints
     double precision :: rn
     integer :: ncellT
     integer :: ncellD

     !--- typically 1.4 * ncellT
     integer :: alloc_size
  
     integer :: cv_size
  end type list


  !--- try just arrays
  !dir$ attributes offload:mic::cnum
  integer, allocatable :: cnum(:)



  !--- declare variable for above types
  !dir$ attributes offload:mic:: listvar
  type(list) :: listvar
  type(atom_cell),allocatable  :: atom(:)
  
  !--- cell lists
  !--- for each cell, contains the atoms present
  !--- cell_list(i)%cell_members(j) is ith cell's jth member
  type(cell_data), allocatable :: cl(:)

  !--- cell verlet lists
  !--- This is going to be interaction matrix for cell pairs
  !--- Each index in matrix will contain verlet list
  !--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i
  type(vlist), allocatable :: cv_total(:,:)



  !---temp variables used for cell sorts
  integer, allocatable :: temp_num(:)
  
end module global


module modlist

  use global

  implicit none

contains

  !........................................!
  !     constructor for class list         !
  !........................................!
  subroutine init_list()
    
    implicit none
    integer :: i
    integer :: aver
    
    print*,param%box,param%rcut
        
    listvar%rn = (param%box/int(param%box/param%rcut))
    listvar%ncellT = nint(param%vol/(listvar%rn**3))
    listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0))
    listvar%alloc_size = int(ceiling(listvar%ncellT*1.2))
    
    print*,'--------------- INTIALIZING LISTS-----------------------------------'
    print*,'           rn                :',listvar%rn
    print*,'           ncellT            :',listvar%ncellT
    print*,'           ncellD            :',listvar%ncellD
    print*,'           MAKE SURE NCELLD CUBED IS NCELLT'
    print*,'--------------- END INTIALIZING LISTS-------------------------------'
    

    !--- try these arrays
    ! attributes align:64::min_x,min_y,min_z,cnum
    allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size))
    allocate(cnum(0:13*listvar%alloc_size))





    allocate(cl(0:listvar%alloc_size))
    allocate(cv_total(13,0:listvar%alloc_size))
    allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size))


    !--- allocate temp variables used for cell sort
    allocate(temp_num(0:listvar%alloc_size))
   

    aver = cell_max()
    aver = int(ceiling(1.4*aver))

    print*,'AVERAGRE',aver
    do i =0, listvar%alloc_size

       allocate(cl(i)%loc_r(aver))
       cl(i)%size = aver
    enddo
    


    !--- allocate atom cell construct
    allocate(atom(param%np))


    
  end subroutine init_list

  


  !......................................................................!
  !     determines the neigh neighbors for total energy calculation      !
  !......................................................................!  
  subroutine  neighcell_total()
    
    implicit none
    integer :: offsets(3,13)
    integer :: in,i,count
    integer :: ic,itel,icz,icy,icx
    integer :: ix,iy,iz
    integer :: iccx,iccy,iccz
    integer :: x,y,z

    !--- build offsets array
    offsets(1,1)=1; offsets(2,1)=0;  offsets(3,1) = 0
    offsets(1,2)=1; offsets(2,2)=1;  offsets(3,2) = 0
    offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0
    offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0
    
    offsets(1,5)=1; offsets(2,5)=0;  offsets(3,5) = 1
    offsets(1,6)=1; offsets(2,6)=1;  offsets(3,6) = 1
    offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1
    offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1
    
    offsets(1,9)=1;  offsets(2,9)=0;  offsets(3,9) = -1
    offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1
    offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1
    offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1
    
    offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1
    

    count = 0
    do ic = 0,listvar%ncellT-1
             
       
       !--- obtain x,y,z coordinates of cell of interest: ic
       !--- this is doing using integer arithmetic 
       icz = ic/(listvar%ncellD*listvar%ncellD)
       icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD)
       icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD
       
       
       
       do i = 1,13
          
          
          ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i)
          
          iccz = icz+iz
          
          if(iccz.lt.0) then
             z = iccz + listvar%ncellD                              
          else if (iccz.ge.listvar%ncellD) then
             z = iccz - listvar%ncellD                              
          else
             z = iccz
          endif
          
          
          iccy = icy+iy
          
          if(iccy.lt.0) then
             y = iccy + listvar%ncellD                                            
          else if (iccy.ge.listvar%ncellD) then
             y = iccy - listvar%ncellD                                                                                 
          else
             y = iccy
          endif
          
          
          iccx = icx+ix
          if(iccx.lt.0) then
             x = iccx + listvar%ncellD             
          else if (iccx.ge.listvar%ncellD) then
             x = iccx - listvar%ncellD                      
          else
             x = iccx    
          endif
          
          in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD)   
          
          cnum(count) = in

         
          cv_total(i,ic)%cnum = in
          
          !--- store min im in cv array
          if(iccx.lt.0) then  
             min_x(count) = param%box                 
          else if (iccx.ge.listvar%ncellD) then
             min_x(count) = -param%box             
          else
             min_x(count) = 0.0d0        
          endif
          
          if(iccy.lt.0) then  
             min_y(count) = param%box                 
          else if (iccy.ge.listvar%ncellD) then
             min_y(count) = -param%box             
          else
             min_y(count) = 0.0d0        
          endif
          
          if(iccz.lt.0) then  
             min_z(count) = param%box                 
          else if (iccz.ge.listvar%ncellD) then
             min_z(count) = -param%box             
          else
             min_z(count) = 0.0d0        
          endif

          count = count + 1
       enddo
     
    enddo
    
  end subroutine neighcell_total
  

  !........................................!
  ! compute cell interaction indicies      !
  ! compute miniomage for each cell pair   !
  ! compute cell coordinates for each cell
  !........................................!
  subroutine compute_cell_neighbors()
    implicit none
       
    call neighcell_total()


  end subroutine compute_cell_neighbors
          


  !........................................!
  !     determines cell number             !
  !........................................!
  integer function cell(x,y,z)
    
    implicit none
    double precision :: x,y,z
    integer :: ix,iy,iz
    
    if(x.lt.param%box)then    
       ix = int(x/listvar%rn)
    else
       ix = listvar%ncellD-1
    endif

    if(y.lt.param%box)then    
       iy = int(y/listvar%rn)
    else
       iy = listvar%ncellD-1
    endif
    
    if(z.lt.param%box)then    
       iz = int(z/listvar%rn)
    else
       iz = listvar%ncellD-1
    endif

   
    cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD
    
  end function cell
  
  
 


  !.....................................................................!
  ! finds cell with largest number of atoms for initial array allocation
  !.....................................................................!
  integer function cell_max()
    implicit none
    integer :: temp_cell(param%np)
    integer :: i,icell

    temp_num(:) = 0
  
    cell_max = 0
    do i = 1 ,param%np

       icell = cell( x(i),y(i),z(i))

       temp_cell(i) = icell
       temp_num(icell) = temp_num(icell) + 1     

       if(temp_num(icell).gt.cell_max)then
          cell_max = temp_num(icell)
       endif

    enddo

  end function cell_max



  !.....................................................................................!
  ! want to assign each atom to appropriate cell
  ! generate list of atoms in each cell for GONET algorithm
  ! do not want to deallocate and reallocate her
  !......................................................................................!
  subroutine total_cell_sort_init(step)
    implicit none
    integer :: step
    integer :: i,icell
    integer :: new_size

    temp_num(:) = 0
      
    do i = 1 ,param%np

       icell = cell(x(i),y(i),z(i))

       temp_num(icell) = temp_num(icell) + 1     

       !--- keep particles current cell
       atom(i)%cell = icell 
       atom(i)%loc  = temp_num(icell)       
    enddo



    do i=0, listvar%ncellT - 1

       !--- number of molecules in cell 
       cl(i)%num = temp_num(i)
       
       !--- while sorting here, check to see if reallocation is needed
       if(cl(i)%num.gt.cl(i)%size) then
          
          new_size = int(ceiling(1.4*cl(i)%num))
          

          deallocate(cl(i)%loc_r)
          allocate(cl(i)%loc_r(new_size))
          cl(i)%size = new_size
                   
       endif
    enddo

   

    do i= 1,param%np
       !--- get cell for particle
       icell = atom(i)%cell
        
       !--- place particles in cell
       cl(icell)%loc_r(atom(i)%loc)%x = x(i)   
       cl(icell)%loc_r(atom(i)%loc)%y = y(i)     
       cl(icell)%loc_r(atom(i)%loc)%z = z(i)     
  
    enddo


  end subroutine total_cell_sort_init

  subroutine build_r()
    integer :: i,j,counter

    counter = 0 

    do i = 0,listvar%ncellT-1  
 
       start(i) = counter + 1

       do j = 1 ,cl(i)%num
          
          counter = counter + 1
          
          x(counter)    = cl(i)%loc_r(j)%x
          y(counter)    = cl(i)%loc_r(j)%y
          z(counter)    = cl(i)%loc_r(j)%z

       enddo
       end(i) = counter
    enddo
  end subroutine build_r


     
end module modlist

module position

  use ifport
  use global

  implicit none
  integer:: numpd
  double precision ::gsize
  integer :: flag
 


contains

  subroutine init_position(situation,file_name)
    
    implicit none
    integer:: counter,i,j,k
    character(len=*) :: situation,file_name
    double precision :: half,deltax,remainder,spot

    print*,'from init position:',param%box,param%np
    numpd = ceiling(param%np**(1.0d0/3.0d0))
    gsize = param%box/(1.0d0*numpd)
    

    print*,'----------------initializing positions ----------------------'
       
       print*,'              INITIALIZING PARTICLES ON CUBIC GRID'
       print*,'              GRID SPACING:',gsize
       print*,'              NUMBER PER DIMENSION',numpd
       counter =1
       
       do i=1,numpd
          do j=1,numpd
             do k=1,numpd
                
                if(counter.le.param%np)then
                   
                   x(counter) = (k-1)*gsize
                   y(counter) = (j-1)*gsize
                   z(counter) = (i-1)*gsize
                   
                endif
                counter = counter+1
                
             enddo
          enddo
       enddo

         
  end subroutine init_position
  
end module position

Sunny_G_Intel · ‎09-11-2014

Hi Conor,

Before investigating further I would like to verify if I have the source code compiled and running correctly. I have the following output :

 from init position:   39.1486806904843            60000
 ----------------initializing positions----------------------
               INITIALIZING PARTICLES ON CUBIC GRID
               GRID SPACING:  0.978717017262106     
               NUMBER PER DIMENSION          40
   39.1486806904843        2.50000000000000     
 --------------- INTIALIZING LISTS-----------------------------------
            rn                :   2.60991204603228     
            ncellT            :        3375
            ncellD            :          15
            MAKE SURE NCELLD CUBED IS NCELLT
 --------------- END INTIALIZING LISTS-------------------------------
 AVERAGRE          38
 beginning rep:           1
 timing for MIC computation case 2:  0.8054000    
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           2
 timing for MIC computation case 2:  2.0999999E-03
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           3
 timing for MIC computation case 2:  1.8000000E-03
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           1
 time for MIC computation case 3 :  2.0999999E-03
 energy case 3:  0.000000000000000E+000
 
 
 beginning rep:           2
 time for MIC computation case 3 :  1.8000000E-03
 energy case 3:  0.000000000000000E+000
 
 
 beginning rep:           3
 time for MIC computation case 3 :  2.0000001E-03
 energy case 3:  0.000000000000000E+000
 
 
 time case 4 with filter:  6.9100000E-02
 energy case 4:  0.000000000000000E+000

I am not sure if the values for energy should be zero for all cases. Am I doing something wrong.