Software Archive
Read-only legacy content
17061 Discussions

openmp vs. MIC speed issue

conor_p_
Beginner
374 Views

Hello,

I am having issues with speed for a particular loop of my code. The openmp version with 8 threads is beating it every time on stampede. The way my example code below is organized is into three sections: one on the MIC with data transfer and memory allocation and no deallocation, one on the MIC with data transfer and no memory allocation, and one just using standard openMP. Although I am attaching four or so modules, PLEASE ONLY LOOK AT MC.F90. The other routines are just to set up the arrays for the loops in MC.f90. The code is compiled as follows

ifort -openmp -align array64byte global.f90 position.f90 modlist.f90 MC.f90 -O3 -o new.out

The times generated on the MIC were about 1.77 e-2 for the first case, 1.77 e-2 for the second case, and 1.4e-2 for the third case

The latency to use VTUNE on stampede is so bad that it renders it essentially useless. From what I could ascertain from tune is that I have some issue somewhere with threads having to wait around. Would someone who is good with VTUNE be willing to look at my loops in MC.f90 to see why my vectorized code runs slower on the MIC.

program lennard
  
  use ifport
  use position
  use global
  use modlist

  implicit none
  double precision :: potential, pressure
  integer :: criterium
  double precision :: energy,filter
  double precision :: minx,miny,minz
  double precision :: dx,dy,dz
  double precision :: dr,dr2,dri,dr2i,dr6i,dr12i
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: l,i,j,k
  integer :: c1s,c1e,c2s,c2e
  integer :: T1,T2,T3,T4,clock_rate,clock_max
  integer :: count,cell_neigh
  integer :: num,tid,count1,count2,count3
  integer :: rep

  param%rcut = 2.50d0
  param%rcut2 = param%rcut**2
  param%dens = 1.0d0
  param%np   = 60000
  param%vol   = param%np/param%dens
  param%box   = param%vol**(1.0/3.0)
  param%hbox  = param%box/(2.0d0)

  allocate(x(param%np),y(param%np),z(param%np))



  call seed(10)
  !call srand(10)

  !--- initialize particles on a cubic grid
  call init_position('cubic grid', 'configuration.dat')
  
  !--- initialize list variables
  call init_list()
  call compute_cell_neighbors()
  call total_cell_sort_init(0)
  call build_r()
 

  
    
  
   do rep = 1,3
      print*,'beginning rep:',rep
      
      energy = 0.0d0
      
      
      count1 = 0

      !-----------------------------------------------------------------------------!
      ! case 1: MIC computation with data transfer                                  !
      !-----------------------------------------------------------------------------!
      call system_clock(T3,clock_rate,clock_max)

      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.true.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)
      
      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)
         
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
         
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      call system_clock(T4,clock_rate,clock_max)
      print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate)      
      print*,'energy case 2:',energy
      print*
      print*
      
   end do
   !-----------------------------------------------------------!
   ! case three: arrays with data transfer and no memory alloc   !
   !-----------------------------------------------------------!
   
   
   do rep = 1,3
      
      energy = 0.0d0
   
      print*,'beginning rep:',rep
      call system_clock(T3,clock_rate,clock_max)



      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.false.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)

      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)

         !do k = c1s,c1e
         !   x1 = x(k); y1 = y(k); z1 = z(k)
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
        
                  
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      
      call system_clock(T4,clock_rate,clock_max)
      print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate)      
      print*,'energy case 3:',energy
      print*
      print*
   enddo
   
   !-----------------------------------------------------------!
   ! case four: openmp                                         !
   !-----------------------------------------------------------!
   
   
   
   energy = 0.0d0
   
   call system_clock(T1,clock_rate,clock_max)
   
   !$omp parallel do schedule(dynamic) reduction(+:energy),&
   !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar)
   do i = 0,listvar%ncellT-1

     c1s = start(i); c1e = end(i)

     !do k = c1s,c1e
     !   x1 = x(k); y1 = y(k); z1 = z(k)
     do j = i*13,13*i+12
   
        cell_neigh = cnum(j)
     
        c2s = start(cell_neigh); c2e = end(cell_neigh)
        minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
        
        do k= c1s,c1e
           x1 = x(k); y1 = y(k); z1 = z(k)
           
           do l = c2s,c2e
              x2 = x(l); y2 = y(l); z2 = z(l)
              
              dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
              
              dr2 = dx*dx + dy*dy + dz*dz
                                                           
              filter = 0.0d0
              if(dr2.lt.2.0d0) filter =1.0d0
              dr = sqrt(dr2)
              dri = 1.0d0/dr
              dr2i = 1.0d0*dri*dri
              dr6i = dr2i*dr2i*dr2i
              dr12i = dr6i*dr6i
              
              energy = energy + 4.0d0*(dr12i-dr6i)*filter
              
        
           enddo
        enddo
     enddo
  enddo
  !$omp end parallel do
  call system_clock(T2,clock_rate,clock_max)

  print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate)
  print*,'energy case 4:',energy
  print*
  print*





  stop 
end program lennard

 

module modlist

  use global

  implicit none


  

contains

 

  !........................................!
  !     constructor for class list         !
  !........................................!
  subroutine init_list()
    
    implicit none
    integer :: i
    integer :: aver
    
    print*,param%box,param%rcut
        
    listvar%rn = (param%box/int(param%box/param%rcut))
    listvar%ncellT = nint(param%vol/(listvar%rn**3))
    listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0))
    listvar%alloc_size = int(ceiling(listvar%ncellT*1.2))
    
    print*,'--------------- INTIALIZING LISTS-----------------------------------'
    print*,'           rn                :',listvar%rn
    print*,'           ncellT            :',listvar%ncellT
    print*,'           ncellD            :',listvar%ncellD
    print*,'           MAKE SURE NCELLD CUBED IS NCELLT'
    print*,'--------------- END INTIALIZING LISTS-------------------------------'
    

    !--- try these arrays
    ! attributes align:64::min_x,min_y,min_z,cnum
    allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size))
    allocate(cnum(0:13*listvar%alloc_size))

 

 

    allocate(cl(0:listvar%alloc_size))
    allocate(cv_total(13,0:listvar%alloc_size))
    allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size))


    !--- allocate temp variables used for cell sort
    allocate(temp_num(0:listvar%alloc_size))
   

    aver = cell_max()
    aver = int(ceiling(1.4*aver))

    print*,'AVERAGRE',aver
    do i =0, listvar%alloc_size

       allocate(cl(i)%loc_r(aver))
       cl(i)%size = aver
    enddo
    


    !--- allocate atom cell construct
    allocate(atom(param%np))


    
  end subroutine init_list

  


  !......................................................................!
  !     determines the neigh neighbors for total energy calculation      !
  !......................................................................!  
  subroutine  neighcell_total()
    
    implicit none
    integer :: offsets(3,13)
    integer :: in,i,count
    integer :: ic,itel,icz,icy,icx
    integer :: ix,iy,iz
    integer :: iccx,iccy,iccz
    integer :: x,y,z

    !--- build offsets array
    offsets(1,1)=1; offsets(2,1)=0;  offsets(3,1) = 0
    offsets(1,2)=1; offsets(2,2)=1;  offsets(3,2) = 0
    offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0
    offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0
    
    offsets(1,5)=1; offsets(2,5)=0;  offsets(3,5) = 1
    offsets(1,6)=1; offsets(2,6)=1;  offsets(3,6) = 1
    offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1
    offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1
    
    offsets(1,9)=1;  offsets(2,9)=0;  offsets(3,9) = -1
    offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1
    offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1
    offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1
    
    offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1
    

    count = 0
    do ic = 0,listvar%ncellT-1
             
       
       !--- obtain x,y,z coordinates of cell of interest: ic
       !--- this is doing using integer arithmetic 
       icz = ic/(listvar%ncellD*listvar%ncellD)
       icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD)
       icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD
       
       
       
       do i = 1,13
          
          
          ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i)
          
          iccz = icz+iz
          
          if(iccz.lt.0) then
             z = iccz + listvar%ncellD                              
          else if (iccz.ge.listvar%ncellD) then
             z = iccz - listvar%ncellD                              
          else
             z = iccz
          endif
          
          
          iccy = icy+iy
          
          if(iccy.lt.0) then
             y = iccy + listvar%ncellD                                            
          else if (iccy.ge.listvar%ncellD) then
             y = iccy - listvar%ncellD                                                                                 
          else
             y = iccy
          endif
          
          
          iccx = icx+ix
          if(iccx.lt.0) then
             x = iccx + listvar%ncellD             
          else if (iccx.ge.listvar%ncellD) then
             x = iccx - listvar%ncellD                      
          else
             x = iccx    
          endif
          
          in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD)   
          
          cnum(count) = in

         
          cv_total(i,ic)%cnum = in
          
          !--- store min im in cv array
          if(iccx.lt.0) then  
             min_x(count) = param%box                 
          else if (iccx.ge.listvar%ncellD) then
             min_x(count) = -param%box             
          else
             min_x(count) = 0.0d0        
          endif
          
          if(iccy.lt.0) then  
             min_y(count) = param%box                 
          else if (iccy.ge.listvar%ncellD) then
             min_y(count) = -param%box             
          else
             min_y(count) = 0.0d0        
          endif
          
          if(iccz.lt.0) then  
             min_z(count) = param%box                 
          else if (iccz.ge.listvar%ncellD) then
             min_z(count) = -param%box             
          else
             min_z(count) = 0.0d0        
          endif

          count = count + 1
       enddo
     
    enddo
    
  end subroutine neighcell_total
  

  !........................................!
  ! compute cell interaction indicies      !
  ! compute miniomage for each cell pair   !
  ! compute cell coordinates for each cell
  !........................................!
  subroutine compute_cell_neighbors()
    implicit none
       
    call neighcell_total()


  end subroutine compute_cell_neighbors
          


  !........................................!
  !     determines cell number             !
  !........................................!
  integer function cell(x,y,z)
    
    implicit none
    double precision :: x,y,z
    integer :: ix,iy,iz
    
    if(x.lt.param%box)then    
       ix = int(x/listvar%rn)
    else
       ix = listvar%ncellD-1
    endif

    if(y.lt.param%box)then    
       iy = int(y/listvar%rn)
    else
       iy = listvar%ncellD-1
    endif
    
    if(z.lt.param%box)then    
       iz = int(z/listvar%rn)
    else
       iz = listvar%ncellD-1
    endif

   
    cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD
    
  end function cell
  
  
 


  !.....................................................................!
  ! finds cell with largest number of atoms for initial array allocation
  !.....................................................................!
  integer function cell_max()
    implicit none
    integer :: temp_cell(param%np)
    integer :: i,icell

    temp_num(:) = 0
  
    cell_max = 0
    do i = 1 ,param%np

       icell = cell( x(i),y(i),z(i))

       temp_cell(i) = icell
       temp_num(icell) = temp_num(icell) + 1     

       if(temp_num(icell).gt.cell_max)then
          cell_max = temp_num(icell)
       endif

    enddo

  end function cell_max

 

  !.....................................................................................!
  ! want to assign each atom to appropriate cell
  ! generate list of atoms in each cell for GONET algorithm
  ! do not want to deallocate and reallocate her
  !......................................................................................!
  subroutine total_cell_sort_init(step)
    implicit none
    integer :: step
    integer :: i,icell
    integer :: new_size

    temp_num(:) = 0
      
    do i = 1 ,param%np

       icell = cell(x(i),y(i),z(i))

       temp_num(icell) = temp_num(icell) + 1     

       !--- keep particles current cell
       atom(i)%cell = icell 
       atom(i)%loc  = temp_num(icell)       
    enddo

 

    do i=0, listvar%ncellT - 1

       !--- number of molecules in cell 
       cl(i)%num = temp_num(i)
       
       !--- while sorting here, check to see if reallocation is needed
       if(cl(i)%num.gt.cl(i)%size) then
          
          new_size = int(ceiling(1.4*cl(i)%num))
          

          deallocate(cl(i)%loc_r)
          allocate(cl(i)%loc_r(new_size))
          cl(i)%size = new_size
                   
       endif
    enddo

   

    do i= 1,param%np
       !--- get cell for particle
       icell = atom(i)%cell
        
       !--- place particles in cell
       cl(icell)%loc_r(atom(i)%loc)%x = x(i)   
       cl(icell)%loc_r(atom(i)%loc)%y = y(i)     
       cl(icell)%loc_r(atom(i)%loc)%z = z(i)     
  
    enddo


  end subroutine total_cell_sort_init

  subroutine build_r()
    integer :: i,j,counter

    counter = 0 

    do i = 0,listvar%ncellT-1  
 
       start(i) = counter + 1

       do j = 1 ,cl(i)%num
          
          counter = counter + 1
          
          x(counter)    = cl(i)%loc_r(j)%x
          y(counter)    = cl(i)%loc_r(j)%y
          z(counter)    = cl(i)%loc_r(j)%z

       enddo
       end(i) = counter
    enddo
  end subroutine build_r


     
end module modlist

module global
  implicit none

 

  !-----------------------------------------------------------------------------------!
  !                      INPUT VARIABLES                                              !
  !-----------------------------------------------------------------------------------!
  type input
     double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv
     double precision :: maxd,vmax
     double precision :: kboltz
     double precision :: vol,box,hbox,ibox
     double precision :: dens
     double precision :: temp,P,beta
     double precision :: alpha
     integer :: np
     integer :: istart
     integer :: mceq,mcsteps
     integer :: nsample, nadjust
  end type input
  
  type nuc
     double precision :: kn
     integer :: numSweep,numStep
  end type nuc
  type(nuc) :: nucVar


  type ainfo
     double precision :: x,y,z
     integer :: type
  end type ainfo


  !--- declare param variable
  !dir$ attributes offload:mic::param
  type(input) :: param

 

 

 


  !------------------------------------------------------------------------------------!
  !                        POSITION ARRAYS                                             !
  !------------------------------------------------------------------------------------!
  !dir$ attributes offload:mic::start
  integer,allocatable :: start(:)
  !dir$ attributes offload:mic::end
  integer,allocatable :: end(:)

  !dir$ attributes offload:mic::min_x
  double precision, allocatable :: min_x(:)
  !dir$ attributes offload:mic::min_y
  double precision, allocatable :: min_y(:)
  !dir$ attributes offload:mic::min_z
  double precision, allocatable :: min_z(:)
  
  !dir$ attributes offload:mic::x
  double precision, allocatable :: x(:)
  !dir$ attributes offload:mic::y
  double precision, allocatable :: y(:)
  !dir$ attributes offload:mic::Z
  double precision, allocatable :: z(:)

 

 

 

 

 

 

 

  !--------------------------------------------------------------------------------!
  !---------------------  LIST ARRAYS  --------------------------------------------!
  type atom_cell
     !--- cell of atom
     integer :: cell
     
     !--- position in cell list
     integer :: loc

  end type atom_cell


  type cell_data
     
     !--- x,y,z coordinates of particles
     type(ainfo),allocatable :: loc_r(:)

     !--- total number of atoms in cell
     integer :: num

     !---size of array
     integer :: size
  end type cell_data

 

  type vlist
    
    !--- minimum image displacement convention
     double precision :: min_z,min_y,min_x

     !--- cnum is going to be the cell index being held
     integer :: cnum
 
  end type vlist

 

  type vlist_SOA
     double precision, allocatable :: min_x(:),min_y(:),min_z(:)
     integer, allocatable :: cnum(:)
  end type vlist_SOA


  type list  
     !.....declare doubles and ints
     double precision :: rn
     integer :: ncellT
     integer :: ncellD

     !--- typically 1.4 * ncellT
     integer :: alloc_size
  
     integer :: cv_size
  end type list


  !--- try just arrays
  !dir$ attributes offload:mic::cnum
  integer, allocatable :: cnum(:)

 

  !--- declare variable for above types
  !dir$ attributes offload:mic:: listvar
  type(list) :: listvar
  type(atom_cell),allocatable  :: atom(:)
  
  !--- cell lists
  !--- for each cell, contains the atoms present
  !--- cell_list(i)%cell_members(j) is ith cell's jth member
  type(cell_data), allocatable :: cl(:)

  !--- cell verlet lists
  !--- This is going to be interaction matrix for cell pairs
  !--- Each index in matrix will contain verlet list
  !--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i
  type(vlist), allocatable :: cv_total(:,:)

 

  !---temp variables used for cell sorts
  integer, allocatable :: temp_num(:)
  

 

end module global

module position

  use ifport
  use global

  implicit none
  integer:: numpd
  double precision ::gsize
  integer :: flag
 


contains

  subroutine init_position(situation,file_name)
    
    implicit none
    integer:: counter,i,j,k
    character(len=*) :: situation,file_name
    double precision :: half,deltax,remainder,spot

    print*,'from init position:',param%box,param%np
    numpd = ceiling(param%np**(1.0d0/3.0d0))
    gsize = param%box/(1.0d0*numpd)
    

    print*,'----------------initializing positions ----------------------'
       
       print*,'              INITIALIZING PARTICLES ON CUBIC GRID'
       print*,'              GRID SPACING:',gsize
       print*,'              NUMBER PER DIMENSION',numpd
       counter =1
       
       do i=1,numpd
          do j=1,numpd
             do k=1,numpd
                
                if(counter.le.param%np)then
                   
                   x(counter) = (k-1)*gsize
                   y(counter) = (j-1)*gsize
                   z(counter) = (i-1)*gsize
                   
                endif
                counter = counter+1
                
             enddo
          enddo
       enddo

         
  end subroutine init_position
  
end module position

 

 

0 Kudos
2 Replies
conor_p_
Beginner
374 Views

Ugh. Sorry I don't know why the code came out like that, and I don't see an edit post anywhere so I will repost the code

program lennard
  
  use ifport
  use position
  use global
  use modlist

  implicit none
  double precision :: potential, pressure
  integer :: criterium
  double precision :: energy,filter
  double precision :: minx,miny,minz
  double precision :: dx,dy,dz
  double precision :: dr,dr2,dri,dr2i,dr6i,dr12i
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: l,i,j,k
  integer :: c1s,c1e,c2s,c2e
  integer :: T1,T2,T3,T4,clock_rate,clock_max
  integer :: count,cell_neigh
  integer :: num,tid,count1,count2,count3
  integer :: rep

  param%rcut = 2.50d0
  param%rcut2 = param%rcut**2
  param%dens = 1.0d0
  param%np   = 60000
  param%vol   = param%np/param%dens
  param%box   = param%vol**(1.0/3.0)
  param%hbox  = param%box/(2.0d0)

  allocate(x(param%np),y(param%np),z(param%np))



  call seed(10)
  !call srand(10)

  !--- initialize particles on a cubic grid
  call init_position('cubic grid', 'configuration.dat')
  
  !--- initialize list variables
  call init_list()
  call compute_cell_neighbors()
  call total_cell_sort_init(0)
  call build_r()
 

  
    
  
   do rep = 1,3
      print*,'beginning rep:',rep
      
      energy = 0.0d0
      
      
      count1 = 0

      !-----------------------------------------------------------------------------!
      ! case 1: MIC computation with data transfer                                  !
      !-----------------------------------------------------------------------------!
      call system_clock(T3,clock_rate,clock_max)

      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.true.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.true.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)
      
      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)
         
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
         
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      call system_clock(T4,clock_rate,clock_max)
      print*,'timing for MIC computation case 2:',real(T4-T3)/real(clock_rate)      
      print*,'energy case 2:',energy
      print*
      print*
      
   end do
   !-----------------------------------------------------------!
   ! case three: arrays with data transfer and no memory alloc   !
   !-----------------------------------------------------------!
   
   
   do rep = 1,3
      
      energy = 0.0d0
   
      print*,'beginning rep:',rep
      call system_clock(T3,clock_rate,clock_max)



      !dir$ offload_transfer target(mic:0),&
      !dir$& in(x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_x: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_y: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(min_z: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(start: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(end: alloc_if(.false.) free_if(.false.)),&
      !dir$& in(cnum: alloc_if(.false.) free_if(.false.))

      !dir$ offload begin target(mic:0) nocopy(x,y,z,min_x,min_y,min_z,start,end,cnum) in(listvar) inout(energy)

      !$omp parallel do schedule(dynamic) reduction(+:energy),&
      !$omp& default(private) shared(min_x,min_y,min_z,start,end,cnum,x,y,z,listvar)
      do i = 0,listvar%ncellT-1
         c1s = start(i); c1e = end(i)

         !do k = c1s,c1e
         !   x1 = x(k); y1 = y(k); z1 = z(k)
         
         do j = i*13,13*i+12
            
            cell_neigh = cnum(j)
            
            c2s = start(cell_neigh); c2e = end(cell_neigh)
            
            minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
            
            
            do k= c1s,c1e
               x1 = x(k); y1 = y(k); z1 = z(k)
               
               do l = c2s,c2e
                  x2 = x(l); y2 = y(l); z2 = z(l)
                  
                  dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
                  
                  dr2 = dx*dx + dy*dy + dz*dz
                  
                  filter = 0.0d0
                  if(dr2.lt.2.50d0)then
                     filter = 1.0d0
                  endif
                  dr = sqrt(dr2)
                  dri = 1.0d0/dr
                  dr2i = 1.0d0*dri*dri
                  dr6i = dr2i*dr2i*dr2i
                  dr12i = dr6i*dr6i
                  
                  energy = energy + 4.0d0*(dr12i-dr6i)*filter
                  
        
                  
               enddo
            enddo
         enddo
      enddo
      !$omp end parallel do
      !dir$ end offload
      
      call system_clock(T4,clock_rate,clock_max)
      print*,'time for MIC computation case 3 :',real(T4-T3)/real(clock_rate)      
      print*,'energy case 3:',energy
      print*
      print*
   enddo
   
   !-----------------------------------------------------------!
   ! case four: openmp                                         !
   !-----------------------------------------------------------!
   
   
   
   energy = 0.0d0
   
   call system_clock(T1,clock_rate,clock_max)
   
   !$omp parallel do schedule(dynamic) reduction(+:energy),&
   !$omp& default(private) shared(min_x,min_y,min_z,cnum,start,end,x,y,z,listvar)
   do i = 0,listvar%ncellT-1

     c1s = start(i); c1e = end(i)

     !do k = c1s,c1e
     !   x1 = x(k); y1 = y(k); z1 = z(k)
     do j = i*13,13*i+12
   
        cell_neigh = cnum(j)
     
        c2s = start(cell_neigh); c2e = end(cell_neigh)
        minx = min_x(j); miny = min_y(j) ; minz = min_z(j)
        
        do k= c1s,c1e
           x1 = x(k); y1 = y(k); z1 = z(k)
           
           do l = c2s,c2e
              x2 = x(l); y2 = y(l); z2 = z(l)
              
              dx = x2-x1-minx; dy = y2-y1-miny; dz = z2-z1-minz
              
              dr2 = dx*dx + dy*dy + dz*dz
                                                           
              filter = 0.0d0
              if(dr2.lt.2.0d0) filter =1.0d0
              dr = sqrt(dr2)
              dri = 1.0d0/dr
              dr2i = 1.0d0*dri*dri
              dr6i = dr2i*dr2i*dr2i
              dr12i = dr6i*dr6i
              
              energy = energy + 4.0d0*(dr12i-dr6i)*filter
              
        
           enddo
        enddo
     enddo
  enddo
  !$omp end parallel do
  call system_clock(T2,clock_rate,clock_max)

  print*,'time case 4 with filter:',real(T2-T1)/real(clock_rate)
  print*,'energy case 4:',energy
  print*
  print*

  stop 
end program lennard

module global
  implicit none



  !-----------------------------------------------------------------------------------!
  !                      INPUT VARIABLES                                              !
  !-----------------------------------------------------------------------------------!
  type input
     double precision :: rcut,rcut2,rv,rv2,rcrv2,rcrv
     double precision :: maxd,vmax
     double precision :: kboltz
     double precision :: vol,box,hbox,ibox
     double precision :: dens
     double precision :: temp,P,beta
     double precision :: alpha
     integer :: np
     integer :: istart
     integer :: mceq,mcsteps
     integer :: nsample, nadjust
  end type input
  
  type nuc
     double precision :: kn
     integer :: numSweep,numStep
  end type nuc
  type(nuc) :: nucVar


  type ainfo
     double precision :: x,y,z
     integer :: type
  end type ainfo


  !--- declare param variable
  !dir$ attributes offload:mic::param
  type(input) :: param








  !------------------------------------------------------------------------------------!
  !                        POSITION ARRAYS                                             !
  !------------------------------------------------------------------------------------!
  !dir$ attributes offload:mic::start
  integer,allocatable :: start(:)
  !dir$ attributes offload:mic::end
  integer,allocatable :: end(:)

  !dir$ attributes offload:mic::min_x
  double precision, allocatable :: min_x(:)
  !dir$ attributes offload:mic::min_y
  double precision, allocatable :: min_y(:)
  !dir$ attributes offload:mic::min_z
  double precision, allocatable :: min_z(:)
  
  !dir$ attributes offload:mic::x
  double precision, allocatable :: x(:)
  !dir$ attributes offload:mic::y
  double precision, allocatable :: y(:)
  !dir$ attributes offload:mic::Z
  double precision, allocatable :: z(:)


  !--------------------------------------------------------------------------------!
  !---------------------  LIST ARRAYS  --------------------------------------------!
  type atom_cell
     !--- cell of atom
     integer :: cell
     
     !--- position in cell list
     integer :: loc

  end type atom_cell


  type cell_data
     
     !--- x,y,z coordinates of particles
     type(ainfo),allocatable :: loc_r(:)

     !--- total number of atoms in cell
     integer :: num

     !---size of array
     integer :: size
  end type cell_data



  type vlist
    
    !--- minimum image displacement convention
     double precision :: min_z,min_y,min_x

     !--- cnum is going to be the cell index being held
     integer :: cnum
 
  end type vlist



  type vlist_SOA
     double precision, allocatable :: min_x(:),min_y(:),min_z(:)
     integer, allocatable :: cnum(:)
  end type vlist_SOA


  type list  
     !.....declare doubles and ints
     double precision :: rn
     integer :: ncellT
     integer :: ncellD

     !--- typically 1.4 * ncellT
     integer :: alloc_size
  
     integer :: cv_size
  end type list


  !--- try just arrays
  !dir$ attributes offload:mic::cnum
  integer, allocatable :: cnum(:)



  !--- declare variable for above types
  !dir$ attributes offload:mic:: listvar
  type(list) :: listvar
  type(atom_cell),allocatable  :: atom(:)
  
  !--- cell lists
  !--- for each cell, contains the atoms present
  !--- cell_list(i)%cell_members(j) is ith cell's jth member
  type(cell_data), allocatable :: cl(:)

  !--- cell verlet lists
  !--- This is going to be interaction matrix for cell pairs
  !--- Each index in matrix will contain verlet list
  !--- index (i,j)%(a,:) will contain atoms in cell j within cuttoff from atom a in cell i
  type(vlist), allocatable :: cv_total(:,:)



  !---temp variables used for cell sorts
  integer, allocatable :: temp_num(:)
  
end module global


module modlist

  use global

  implicit none

contains

  !........................................!
  !     constructor for class list         !
  !........................................!
  subroutine init_list()
    
    implicit none
    integer :: i
    integer :: aver
    
    print*,param%box,param%rcut
        
    listvar%rn = (param%box/int(param%box/param%rcut))
    listvar%ncellT = nint(param%vol/(listvar%rn**3))
    listvar%ncellD = nint(listvar%ncellT**(1.0d0/3.0d0))
    listvar%alloc_size = int(ceiling(listvar%ncellT*1.2))
    
    print*,'--------------- INTIALIZING LISTS-----------------------------------'
    print*,'           rn                :',listvar%rn
    print*,'           ncellT            :',listvar%ncellT
    print*,'           ncellD            :',listvar%ncellD
    print*,'           MAKE SURE NCELLD CUBED IS NCELLT'
    print*,'--------------- END INTIALIZING LISTS-------------------------------'
    

    !--- try these arrays
    ! attributes align:64::min_x,min_y,min_z,cnum
    allocate(min_x(0:13*listvar%alloc_size),min_y(0:13*listvar%alloc_size),min_z(0:13*listvar%alloc_size))
    allocate(cnum(0:13*listvar%alloc_size))





    allocate(cl(0:listvar%alloc_size))
    allocate(cv_total(13,0:listvar%alloc_size))
    allocate(start(0:listvar%alloc_size),end(0:listvar%alloc_size))


    !--- allocate temp variables used for cell sort
    allocate(temp_num(0:listvar%alloc_size))
   

    aver = cell_max()
    aver = int(ceiling(1.4*aver))

    print*,'AVERAGRE',aver
    do i =0, listvar%alloc_size

       allocate(cl(i)%loc_r(aver))
       cl(i)%size = aver
    enddo
    


    !--- allocate atom cell construct
    allocate(atom(param%np))


    
  end subroutine init_list

  


  !......................................................................!
  !     determines the neigh neighbors for total energy calculation      !
  !......................................................................!  
  subroutine  neighcell_total()
    
    implicit none
    integer :: offsets(3,13)
    integer :: in,i,count
    integer :: ic,itel,icz,icy,icx
    integer :: ix,iy,iz
    integer :: iccx,iccy,iccz
    integer :: x,y,z

    !--- build offsets array
    offsets(1,1)=1; offsets(2,1)=0;  offsets(3,1) = 0
    offsets(1,2)=1; offsets(2,2)=1;  offsets(3,2) = 0
    offsets(1,3)=1; offsets(2,3)=-1; offsets(3,3) = 0
    offsets(1,4)=0; offsets(2,4)=-1; offsets(3,4) = 0
    
    offsets(1,5)=1; offsets(2,5)=0;  offsets(3,5) = 1
    offsets(1,6)=1; offsets(2,6)=1;  offsets(3,6) = 1
    offsets(1,7)=1; offsets(2,7)=-1; offsets(3,7) = 1
    offsets(1,8)=0; offsets(2,8)=-1; offsets(3,8) = 1
    
    offsets(1,9)=1;  offsets(2,9)=0;  offsets(3,9) = -1
    offsets(1,10)=1; offsets(2,10)=1; offsets(3,10) = -1
    offsets(1,11)=1; offsets(2,11)=-1;offsets(3,11) = -1
    offsets(1,12)=0; offsets(2,12)=-1;offsets(3,12) = -1
    
    offsets(1,13) = 0; offsets(2,13)=0; offsets(3,13) = 1
    

    count = 0
    do ic = 0,listvar%ncellT-1
             
       
       !--- obtain x,y,z coordinates of cell of interest: ic
       !--- this is doing using integer arithmetic 
       icz = ic/(listvar%ncellD*listvar%ncellD)
       icy = (ic - icz*listvar%ncellD*listvar%ncellD)/(listvar%ncellD)
       icx = ic -icy*listvar%ncellD - icz*listvar%ncellD*listvar%ncellD
       
       
       
       do i = 1,13
          
          
          ix = offsets(1,i); iy = offsets(2,i); iz = offsets(3,i)
          
          iccz = icz+iz
          
          if(iccz.lt.0) then
             z = iccz + listvar%ncellD                              
          else if (iccz.ge.listvar%ncellD) then
             z = iccz - listvar%ncellD                              
          else
             z = iccz
          endif
          
          
          iccy = icy+iy
          
          if(iccy.lt.0) then
             y = iccy + listvar%ncellD                                            
          else if (iccy.ge.listvar%ncellD) then
             y = iccy - listvar%ncellD                                                                                 
          else
             y = iccy
          endif
          
          
          iccx = icx+ix
          if(iccx.lt.0) then
             x = iccx + listvar%ncellD             
          else if (iccx.ge.listvar%ncellD) then
             x = iccx - listvar%ncellD                      
          else
             x = iccx    
          endif
          
          in = (x+y*listvar%ncellD + z*listvar%ncellD*listvar%ncellD)   
          
          cnum(count) = in

         
          cv_total(i,ic)%cnum = in
          
          !--- store min im in cv array
          if(iccx.lt.0) then  
             min_x(count) = param%box                 
          else if (iccx.ge.listvar%ncellD) then
             min_x(count) = -param%box             
          else
             min_x(count) = 0.0d0        
          endif
          
          if(iccy.lt.0) then  
             min_y(count) = param%box                 
          else if (iccy.ge.listvar%ncellD) then
             min_y(count) = -param%box             
          else
             min_y(count) = 0.0d0        
          endif
          
          if(iccz.lt.0) then  
             min_z(count) = param%box                 
          else if (iccz.ge.listvar%ncellD) then
             min_z(count) = -param%box             
          else
             min_z(count) = 0.0d0        
          endif

          count = count + 1
       enddo
     
    enddo
    
  end subroutine neighcell_total
  

  !........................................!
  ! compute cell interaction indicies      !
  ! compute miniomage for each cell pair   !
  ! compute cell coordinates for each cell
  !........................................!
  subroutine compute_cell_neighbors()
    implicit none
       
    call neighcell_total()


  end subroutine compute_cell_neighbors
          


  !........................................!
  !     determines cell number             !
  !........................................!
  integer function cell(x,y,z)
    
    implicit none
    double precision :: x,y,z
    integer :: ix,iy,iz
    
    if(x.lt.param%box)then    
       ix = int(x/listvar%rn)
    else
       ix = listvar%ncellD-1
    endif

    if(y.lt.param%box)then    
       iy = int(y/listvar%rn)
    else
       iy = listvar%ncellD-1
    endif
    
    if(z.lt.param%box)then    
       iz = int(z/listvar%rn)
    else
       iz = listvar%ncellD-1
    endif

   
    cell = ix+iy*listvar%ncellD+iz*listvar%ncellD*listvar%ncellD
    
  end function cell
  
  
 


  !.....................................................................!
  ! finds cell with largest number of atoms for initial array allocation
  !.....................................................................!
  integer function cell_max()
    implicit none
    integer :: temp_cell(param%np)
    integer :: i,icell

    temp_num(:) = 0
  
    cell_max = 0
    do i = 1 ,param%np

       icell = cell( x(i),y(i),z(i))

       temp_cell(i) = icell
       temp_num(icell) = temp_num(icell) + 1     

       if(temp_num(icell).gt.cell_max)then
          cell_max = temp_num(icell)
       endif

    enddo

  end function cell_max



  !.....................................................................................!
  ! want to assign each atom to appropriate cell
  ! generate list of atoms in each cell for GONET algorithm
  ! do not want to deallocate and reallocate her
  !......................................................................................!
  subroutine total_cell_sort_init(step)
    implicit none
    integer :: step
    integer :: i,icell
    integer :: new_size

    temp_num(:) = 0
      
    do i = 1 ,param%np

       icell = cell(x(i),y(i),z(i))

       temp_num(icell) = temp_num(icell) + 1     

       !--- keep particles current cell
       atom(i)%cell = icell 
       atom(i)%loc  = temp_num(icell)       
    enddo



    do i=0, listvar%ncellT - 1

       !--- number of molecules in cell 
       cl(i)%num = temp_num(i)
       
       !--- while sorting here, check to see if reallocation is needed
       if(cl(i)%num.gt.cl(i)%size) then
          
          new_size = int(ceiling(1.4*cl(i)%num))
          

          deallocate(cl(i)%loc_r)
          allocate(cl(i)%loc_r(new_size))
          cl(i)%size = new_size
                   
       endif
    enddo

   

    do i= 1,param%np
       !--- get cell for particle
       icell = atom(i)%cell
        
       !--- place particles in cell
       cl(icell)%loc_r(atom(i)%loc)%x = x(i)   
       cl(icell)%loc_r(atom(i)%loc)%y = y(i)     
       cl(icell)%loc_r(atom(i)%loc)%z = z(i)     
  
    enddo


  end subroutine total_cell_sort_init

  subroutine build_r()
    integer :: i,j,counter

    counter = 0 

    do i = 0,listvar%ncellT-1  
 
       start(i) = counter + 1

       do j = 1 ,cl(i)%num
          
          counter = counter + 1
          
          x(counter)    = cl(i)%loc_r(j)%x
          y(counter)    = cl(i)%loc_r(j)%y
          z(counter)    = cl(i)%loc_r(j)%z

       enddo
       end(i) = counter
    enddo
  end subroutine build_r


     
end module modlist
​
module position

  use ifport
  use global

  implicit none
  integer:: numpd
  double precision ::gsize
  integer :: flag
 


contains

  subroutine init_position(situation,file_name)
    
    implicit none
    integer:: counter,i,j,k
    character(len=*) :: situation,file_name
    double precision :: half,deltax,remainder,spot

    print*,'from init position:',param%box,param%np
    numpd = ceiling(param%np**(1.0d0/3.0d0))
    gsize = param%box/(1.0d0*numpd)
    

    print*,'----------------initializing positions ----------------------'
       
       print*,'              INITIALIZING PARTICLES ON CUBIC GRID'
       print*,'              GRID SPACING:',gsize
       print*,'              NUMBER PER DIMENSION',numpd
       counter =1
       
       do i=1,numpd
          do j=1,numpd
             do k=1,numpd
                
                if(counter.le.param%np)then
                   
                   x(counter) = (k-1)*gsize
                   y(counter) = (j-1)*gsize
                   z(counter) = (i-1)*gsize
                   
                endif
                counter = counter+1
                
             enddo
          enddo
       enddo

         
  end subroutine init_position
  
end module position


 

0 Kudos
Sunny_G_Intel
Employee
375 Views

Hi Conor,

Before investigating further I would like to verify if I have the source code compiled and running correctly. I have the following output :

 from init position:   39.1486806904843            60000
 ----------------initializing positions----------------------
               INITIALIZING PARTICLES ON CUBIC GRID
               GRID SPACING:  0.978717017262106     
               NUMBER PER DIMENSION          40
   39.1486806904843        2.50000000000000     
 --------------- INTIALIZING LISTS-----------------------------------
            rn                :   2.60991204603228     
            ncellT            :        3375
            ncellD            :          15
            MAKE SURE NCELLD CUBED IS NCELLT
 --------------- END INTIALIZING LISTS-------------------------------
 AVERAGRE          38
 beginning rep:           1
 timing for MIC computation case 2:  0.8054000    
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           2
 timing for MIC computation case 2:  2.0999999E-03
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           3
 timing for MIC computation case 2:  1.8000000E-03
 energy case 2:  0.000000000000000E+000
 
 
 beginning rep:           1
 time for MIC computation case 3 :  2.0999999E-03
 energy case 3:  0.000000000000000E+000
 
 
 beginning rep:           2
 time for MIC computation case 3 :  1.8000000E-03
 energy case 3:  0.000000000000000E+000
 
 
 beginning rep:           3
 time for MIC computation case 3 :  2.0000001E-03
 energy case 3:  0.000000000000000E+000
 
 
 time case 4 with filter:  6.9100000E-02
 energy case 4:  0.000000000000000E+000

 

I am not sure if the values for energy should be zero for all cases. Am I doing something wrong. 

0 Kudos
Reply