
OpenMP GPU Offloading Map with nested user-defined type variables

Inspur
Beginner

Dear all,

 

We are trying to rewrite our code with the ifx compiler to use OpenMP GPU offloading. However, we run into errors when mapping a deep copy of nested user-defined structure variables.

The nested structures contain both scalar and pointer components. The most complicated part is that the type line contains a pointer, node(:), to an array of the user-defined type point.

   type point
      real :: x, y
      real, pointer :: p_data(:)
   end type point

   type line
      real :: length
      type(point), pointer :: node(:)
      real, pointer :: l_data(:)
   end type line

We understand that there are many restrictions on pointer components (2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives) and that a mapper can provide deep-copy functionality when mapping user-defined variables. So we use the declare mapper directive to map the scalar components and map the pointer components manually.

module struct_data

   type point
      real :: x, y
      real, pointer :: p_data(:)
   end type point

   !$omp declare mapper ( point_mapper: point :: var ) map ( tofrom: var%x, var%y )

   type line
      real :: length
      type(point), pointer :: node(:)
      real, pointer :: l_data(:)
   end type line

   !$omp declare mapper ( line_mapper: line :: var ) map ( tofrom: var%length )

contains

   subroutine test_mapper(NP, NValue)
      implicit none

      integer :: NP, NValue

      type(point), allocatable, target :: p(:)
      type(line)  :: lnn

       ! local variables
      integer :: i, j

      ! allocate data
      allocate (p(NP), lnn%l_data(NValue))
      lnn%node => p

      do i = 1, NP
         allocate ( lnn%node(i)%p_data(NValue) )
         !$omp target enter data map( lnn%node(i)%p_data )
         !$omp target enter data map( mapper(point_mapper), alloc: lnn%node(i))
      end do

      !$omp target enter data map( mapper(line_mapper), alloc: lnn, lnn%l_data )

      ! assign values on the device
      !$omp target
      do i = 1, NP
         ! zero-initialize this node's data before assigning
         lnn%node(i)%p_data = 0
         do j = 1, NValue
            lnn%node(i)%p_data(j) = j + i - 1
         end do
      end do
      !$omp end target

      !$omp target
      do j = 1, NValue
         lnn%l_data(j) = NValue + j
      end do
      !$omp end target

      write (*, *)
      write (*, *) "Check Result "
      do i = 1, NP
         write (*, *) " lnn%node(", i, ")%p_data = ", lnn%node(i)%p_data
      end do
      write (*, *) "Device Result"
      do i = 1, NP
         !$omp target update from(lnn%node(i)%p_data)
      end do
      do i = 1, NP
         write (*, *) " lnn%node(", i, ")%p_data = ", lnn%node(i)%p_data
      end do

      !$omp target exit data map( mapper(line_mapper), delete: lnn, lnn%l_data )
      deallocate (lnn%l_data)
      do i = 1, NP
         deallocate (p(i)%p_data)
         !$omp target exit data map( delete: lnn%node(i)%p_data )
         !$omp target exit data map( mapper(point_mapper), delete: lnn%node(i) )
      end do

   end subroutine test_mapper

end module struct_data

In our test, the p_data array of lnn%node(i) should be (i, i+1, i+2, ...). However, the output shows that only some of the arrays in lnn%node(:) have changed.
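For example, with the first call test_mapper(3, 5), the assignment p_data(j) = j + i - 1 should give

   lnn%node(1)%p_data = 1.0  2.0  3.0  4.0  5.0
   lnn%node(2)%p_data = 2.0  3.0  4.0  5.0  6.0
   lnn%node(3)%p_data = 3.0  4.0  5.0  6.0  7.0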

 

The main program in the test case is

program main
   use struct_data
   implicit none

   call test_mapper(3, 5)
   call test_mapper(8, 3)
   call test_mapper(5, 3)

end program main

and the output is 

mpif90 -fc=ifx -fiopenmp -fopenmp-targets=spir64 -lsycl -lOpenCL -free        -c test.F
mpif90 -fc=ifx -fiopenmp -fopenmp-targets=spir64 -lsycl -lOpenCL -free  -o test.exe test.o  
./test.exe
 
 Check Result 
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
   6.000000       7.000000    
 
 Check Result 
  lnn%node(           1 )%p_data =  -6.8166343E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           2 )%p_data =  -6.8166202E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           5 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           6 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           7 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           8 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           5 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           6 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           7 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           8 )%p_data =    8.000000       9.000000       10.00000    
 
 Check Result 
  lnn%node(           1 )%p_data =  -6.8166343E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           2 )%p_data =  -6.8166202E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           3 )%p_data =  -6.8166062E+19  4.5833670E-41   5.000000    
  lnn%node(           4 )%p_data =  -6.8165921E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           5 )%p_data =  -6.8165780E+19  4.5833670E-41  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =    4.000000       5.000000       6.000000    
  lnn%node(           5 )%p_data =    5.000000       6.000000       7.000000

The platform is an Intel(R) Core(TM) i7-10510U CPU with its integrated Intel HD Graphics. The compiler is ifx 2024.0.2.

$ clinfo -l
Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Platform #2: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Platform #3: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Graphics [0x9b41]

$ ifx --version
ifx (IFX) 2024.0.2 20231213
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

  

1 Reply
Barbara_P_Intel
Employee

My first suggestion is to compile with ifx directly; I don't see any MPI library calls that require mpif90.

Then simplify the compiler options. Try "-fiopenmp -fopenmp-targets=spir64 -free".

Try the latest version of ifx, 2024.1.0. It was released a few weeks ago.
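For example, assuming the source is still the single file test.F from your build log, the simplified build could look like this:

ifx -fiopenmp -fopenmp-targets=spir64 -free -c test.F
ifx -fiopenmp -fopenmp-targets=spir64 -free -o test.exe test.o
./test.exe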

 
