
OpenMP GPU Offloading Map with nested user-defined type variables

Inspur
Beginner

Dear all,

 

We are trying to rewrite our code with the ifx compiler to use OpenMP GPU offloading. However, we run into errors when mapping a deep copy of nested user-defined structure variables.

The nested structures contain both scalar and pointer components. The most complicated part is that the type line contains a pointer, node(:), to an array of the user-defined type point.

   type point
      real :: x, y
      real, pointer :: p_data(:)
   end type point

   type line
      real :: length
      type(point), pointer :: node(:)
      real, pointer :: l_data(:)
   end type line

We understand that there are many restrictions on pointer components (2.19.7 Data-Mapping Attribute Rules, Clauses, and Directives) and that a mapper can provide deep-copy functionality when mapping user-defined variables. So we use the declare mapper directive to map the scalar components and map the pointer components manually.

module struct_data

   type point
      real :: x, y
      real, pointer :: p_data(:)
   end type point

   !$omp declare mapper ( point_mapper: point :: var ) map ( tofrom: var%x, var%y )

   type line
      real :: length
      type(point), pointer :: node(:)
      real, pointer :: l_data(:)
   end type line

   !$omp declare mapper ( line_mapper: line :: var ) map ( tofrom: var%length )

contains

   subroutine test_mapper(NP, NValue)
      implicit none

      integer :: NP, NValue

      type(point), allocatable, target :: p(:)
      type(line)  :: lnn

       ! local variables
      integer :: i, j

      ! allocate data
      allocate (p(NP), lnn%l_data(NValue))
      lnn%node => p

      do i = 1, NP
         allocate ( lnn%node(i)%p_data(NValue) )
         !$omp target enter data map( lnn%node(i)%p_data )
         !$omp target enter data map( mapper(point_mapper), alloc: lnn%node(i))
      end do

      !$omp target enter data map( mapper(line_mapper), alloc: lnn, lnn%l_data )

      ! assign values on the device
      !$omp target
      do i = 1, NP
         ! zero-initialize this node's data before assigning
         lnn%node(i)%p_data = 0
         do j = 1, NValue
            lnn%node(i)%p_data(j) = j + i - 1
         end do
      end do
      !$omp end target

      !$omp target
      do j = 1, NValue
         lnn%l_data(j) = NValue + j
      end do
      !$omp end target

      write (*, *)
      write (*, *) "Check Result "
      do i = 1, NP
         write (*, *) " lnn%node(", i, ")%p_data = ", lnn%node(i)%p_data
      end do
      write (*, *) "Device Result"
      do i = 1, NP
         !$omp target update from(lnn%node(i)%p_data)
      end do
      do i = 1, NP
         write (*, *) " lnn%node(", i, ")%p_data = ", lnn%node(i)%p_data
      end do

      !$omp target exit data map( mapper(line_mapper), delete: lnn, lnn%l_data )
      deallocate (lnn%l_data)
      do i = 1, NP
         deallocate (p(i)%p_data)
         !$omp target exit data map( delete: lnn%node(i)%p_data )
         !$omp target exit data map( mapper(point_mapper), delete: lnn%node(i) )
      end do

   end subroutine test_mapper

end module struct_data

In our test, the p_data array of lnn%node(i) should be (i, i+1, i+2, ...). However, the output shows that only some of the arrays in lnn%node(:) have changed.
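For example, with the first call test_mapper(3, 5), the assignment p_data(j) = j + i - 1 should give

   lnn%node(1)%p_data = 1.0  2.0  3.0  4.0  5.0
   lnn%node(2)%p_data = 2.0  3.0  4.0  5.0  6.0
   lnn%node(3)%p_data = 3.0  4.0  5.0  6.0  7.0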

 

The main program in the test case is

program main
   use struct_data
   implicit none

   call test_mapper(3, 5)
   call test_mapper(8, 3)
   call test_mapper(5, 3)

end program main

and the output is 

mpif90 -fc=ifx -fiopenmp -fopenmp-targets=spir64 -lsycl -lOpenCL -free        -c test.F
mpif90 -fc=ifx -fiopenmp -fopenmp-targets=spir64 -lsycl -lOpenCL -free  -o test.exe test.o  
./test.exe
 
 Check Result 
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
   6.000000       7.000000    
 
 Check Result 
  lnn%node(           1 )%p_data =  -6.8166343E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           2 )%p_data =  -6.8166202E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           5 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           6 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           7 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           8 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           5 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           6 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           7 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           8 )%p_data =    8.000000       9.000000       10.00000    
 
 Check Result 
  lnn%node(           1 )%p_data =  -6.8166343E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           2 )%p_data =  -6.8166202E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           3 )%p_data =  -6.8166062E+19  4.5833670E-41   5.000000    
  lnn%node(           4 )%p_data =  -6.8165921E+19  4.5833670E-41  0.0000000E+00
  lnn%node(           5 )%p_data =  -6.8165780E+19  4.5833670E-41  0.0000000E+00
 Device Result
  lnn%node(           1 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           2 )%p_data =   0.0000000E+00  0.0000000E+00  0.0000000E+00
  lnn%node(           3 )%p_data =    3.000000       4.000000       5.000000    
  lnn%node(           4 )%p_data =    4.000000       5.000000       6.000000    
  lnn%node(           5 )%p_data =    5.000000       6.000000       7.000000

The platform is an Intel(R) Core(TM) i7-10510U CPU with its integrated Intel HD Graphics. The compiler is ifx 2024.0.2.

$ clinfo -l
Platform #0: Intel(R) FPGA Emulation Platform for OpenCL(TM)
 `-- Device #0: Intel(R) FPGA Emulation Device
Platform #1: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Platform #2: Intel(R) OpenCL
 `-- Device #0: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
Platform #3: Intel(R) OpenCL HD Graphics
 `-- Device #0: Intel(R) Graphics [0x9b41]

$ ifx --version
ifx (IFX) 2024.0.2 20231213
Copyright (C) 1985-2023 Intel Corporation. All rights reserved.

  

1 Reply
Barbara_P_Intel
Employee

My first suggestion is to compile with ifx directly; I don't see any MPI library calls that require mpif90.

Then simplify the compiler options. Try "-fiopenmp -fopenmp-targets=spir64 -free".

Try the latest version of ifx, 2024.1.0. It was released a few weeks ago.
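For example, assuming the source is still the single file test.F from your build log, the simplified build could look like this:

ifx -fiopenmp -fopenmp-targets=spir64 -free -c test.F
ifx -fiopenmp -fopenmp-targets=spir64 -free -o test.exe test.o
./test.exe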

 
