
offload mode: allocating mic arrays without host allocation

conor_p_
Beginner

Hi, 

I am porting subroutines to a Xeon Phi using offload mode. I know how to allocate and deallocate arrays on the MIC when they are also declared on the host: something like the snippet below for allocation, and similarly for deallocation. However, I have a fairly large array that is only ever used on the coprocessor, so I would prefer not to allocate it on the host at all. I can't figure out how to allocate it on the coprocessor without it also being on the host. The program test below shows what I am trying to do. It produces the error "memory allocation of zero or negative length is not supported." Can anyone help me out here?

!---example of simple coprocessor allocation when array is already declared on host
allocate(array(size))

! ...now on the MIC
!dir$ offload_transfer in(array: alloc_if(.true.) free_if(.false.))
  
!---try to allocate on coprocessor without allocating on host.
program test
  implicit none
  !dir$ attributes offload:mic :: asize
  integer :: asize
  !dir$ attributes offload:mic :: array
  integer,allocatable :: array(:)

  asize = 10
  !dir$ offload begin target(mic:0) in(asize)
  print*,'hello from the MIC',asize
  !dir$ end offload


  !dir$ offload_transfer target(mic:0),&
  !dir$ in(asize),&
  !dir$ in(array:length(asize) alloc_if(.true.) free_if(.false.))

  print*,'we made it'

end program test

I have tried both a nocopy and an in clause for the array. Finally, once I have allocated the array on the coprocessor without it being on the host, what is the appropriate clause in the offload directive? Specifically, would I just stick to

!dir$ offload begin target(mic:0) nocopy(array: alloc_if(.false.)  free_if(.false.))

 

Kevin_D_Intel
Employee

When the allocatable array is not needed on the host and is used exclusively on the coprocessor, you can manage the allocation/deallocation directly within the offload region using ALLOCATE / DEALLOCATE, as shown in this modified version of your example.

!---try to allocate on coprocessor without allocating on host.
program test
  implicit none
  !dir$ attributes offload:mic :: asize
  integer :: asize
  !dir$ attributes offload:mic :: array
  integer,allocatable :: array(:)

  asize = 10
  !dir$ offload begin target(mic:0) in(asize)
  print*,'hello from the MIC',asize
  !dir$ end offload


! !dir$ offload_transfer target(mic:0),&
! !dir$ in(asize),&
! !dir$ in(array:length(asize) alloc_if(.true.) free_if(.false.))

  !dir$ offload begin target(mic:0),&
  !dir$ in(asize),&
  !dir$ in(array:length(0) alloc_if(.true.) free_if(.false.))
     allocate(array(asize))
     print*,'hello from the MIC',allocated(array), size(array)
  !dir$ end offload

  print*,'we made it'

end program test
$  ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.2.164 Build 20150121
Copyright (C) 1985-2015 Intel Corporation.  All rights reserved.

$  ifort u542340.f90
$  ./a.out
 hello from the MIC          10
 we made it
 hello from the MIC T          10

Alternatively, you can leverage the compiler-managed allocation/deallocation associated with the OFFLOAD directive (and the alloc_if/free_if modifiers) using a pointer method, which permits transferring data between the host and coprocessor, as outlined by another user here.

jimdempseyatthecove
Honored Contributor III

I think something like this is in line with the gist of Conor's request:

! *** untested code ***  
!---try to allocate on coprocessor without allocating on host.
program test
  implicit none
  !dir$ attributes offload:mic :: asize
  integer :: asize
  !dir$ attributes offload:mic :: array
  integer,allocatable :: array(:)

  integer :: answer, arg

  arg = 1
  asize = 10
  !dir$ offload begin target(mic:0) in(asize, arg) out(answer)
  print*,'hello from the MIC',asize
  allocate(array(asize))
  array = arg ! use array locally
  answer = sum(array)
  !dir$ end offload
  print*,"result=",answer

  !... other work on host

  arg = 2
  !dir$ offload begin target(mic:0) in(arg) out(answer)
  array = array + arg
  answer = sum(array)
  !dir$ end offload
  print*,"result=",answer

  !dir$ offload begin target(mic:0) out(asize)
  deallocate(array)
  asize = 0
  !dir$ end offload

  print*,'we made it'

end program test

Jim Dempsey

conor_p_
Beginner

Thanks, everyone, for your replies. Kevin, I ran your code and received this error:

offload error: memory allocation of zero or negative length is not supported

I ran Jim's code as well and got the same error. Could this be a MIC version-specific error? I am running on a cluster called CONTE at Purdue University, for anyone interested.

Just in case something got mixed up, I will repost the version of Kevin's code that I ran below. Jim, I have a question about the data transfer in the code you posted. It was my understanding that if you don't specify the type of memory transfer, the default behavior is to transfer any variable used on the MIC from the host to the MIC, and then deallocate it. Is that what's happening in your version?

!---try to allocate on coprocessor without allocating on host.
program intel
  implicit none
  !dir$ attributes offload:mic :: asize
  integer :: asize
  !dir$ attributes offload:mic :: array
  integer,allocatable :: array(:)

  print*,'lets try kevins code'
  asize = 10
  !dir$ offload begin target(mic:0) in(asize)
  print*,'hello from the MIC',asize
  !dir$ end offload


! !dir$ offload_transfer target(mic:0),&
! !dir$ in(asize),&
! !dir$ in(array:length(asize) alloc_if(.true.) free_if(.false.))

  !dir$ offload begin target(mic:0),&
  !dir$ in(asize),&
  !dir$ in(array:length(0) alloc_if(.true.) free_if(.false.))
     PRINT*,'what is our size',asize
     allocate(array(asize))
     print*,'hello from the MIC',allocated(array), size(array)
  !dir$ end offload

  print*,'we made it'

end program intel
 

 

Kevin_D_Intel
Employee

That error occurs with the 13.x compiler, which is fairly old. You can check the version with: ifort -V

You might check with the system staff whether there is a newer compiler available, preferably 15.0.

conor_p_
Beginner

Kevin, you are right. I ran ifort -V and got

"version 13.1.1.163 build 20130313" on Conte

and "version 13.1.0.146 build 20130121" on Stampede. It's interesting that a supercomputer as big as Stampede has such an outdated version. I will check with them.

conor_p_
Beginner

Kevin, do you have an answer about the data transfer in Jim's code? Jim doesn't specify the in clause, shown below, that you do. It's my understanding that the default behavior for the MIC is to transfer arrays over, allocating and deallocating them automatically. Am I incorrect in my understanding? What is the difference between your code, where you specify the in clause, and Jim's, where none is specified?

in(array:length(0) alloc_if(.true.) free_if(.false.))

 

 

 

Frances_R_Intel
Employee

Stampede uses environment modules to set up compilers, libraries and tools environments. They may well have later compilers that you can use. They have a rather complicated hierarchical setup to make sure all the modules loaded will play well with each other, so I won't even try to suggest how you should change your environment - their support people or site documentation is better for that. However, if you want to check quickly to see if they do have the later compilers, 'module avail' should tell you what is available and 'module list' should tell you what you have loaded right now. 

jimdempseyatthecove
Honored Contributor III

From my understanding, the persistent-data attribute, declared via "!dir$ attributes offload:mic :: YourPersistentData", is to be placed on variables in global scope (in other words, on data declared in a module). Subroutine/function local and dummy data are not intended to be so attributed (someone correct me if I am wrong on that). If using a local or dummy array, use target and the additional clauses (optionally nocopy).

The clauses on the target directive (in other words, the directive that the code follows) essentially encapsulate the listed arguments like a C++ lambda capture or, if you wish, like an OpenMP PRIVATE clause, all of which create hidden (possibly stack) copies of the specified data.

What I used earlier, I assume will not work unless you place array into a module.
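
A minimal sketch of the module-scope variant described above. It is untested on a coprocessor, and the module name mic_data, the program name, and the use of nocopy(array) are assumptions made for illustration. A compiler without offload support treats the !dir$ lines as comments, so the same source also runs host-only.

```fortran
! Sketch only: array lives in a module so that it has global scope.
module mic_data
  implicit none
  !dir$ attributes offload:mic :: array
  integer, allocatable :: array(:)
end module mic_data

program test_module
  use mic_data
  implicit none
  integer :: asize, answer

  asize = 10

  ! First offload: allocate and fill the array on the target only.
  ! nocopy(array) is an assumption here; it asks for no host/target transfer.
  !dir$ offload begin target(mic:0) in(asize) out(answer) nocopy(array)
  allocate(array(asize))
  array = 1
  answer = sum(array)
  !dir$ end offload
  print *, 'sum =', answer

  ! Later offload: the module array is expected to still be allocated.
  !dir$ offload begin target(mic:0) nocopy(array)
  if (allocated(array)) deallocate(array)
  !dir$ end offload
end program test_module
```

Whether the array really persists between the two offloads on a given compiler version is exactly the open question here, so check the output before relying on it.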

My system with the Xeon Phis is down (replacing fans), so I cannot try out the code I post.

Thinking in terms of OpenMP, what you and I attempted to do would be similar to

!$OMP PARALLEL PRIVATE(array)
!diddle with array
!$OMP END PARALLEL
!...
!$OMP PARALLEL PRIVATE(array)
! unreliable to assume non-region-master copies of array not corrupted between parallel regions
!$OMP END PARALLEL

Jim Dempsey

Kevin_D_Intel
Employee

My apologies. It never fails; I always become confused when mixing pointers with offload….

Persistence of locals is available and supported but discouraged for large non-scalars.  There is some related discussion in the Effective Use of the Intel Compiler's Offload Features article under sections with the tag “Persistence:”.

What I was attempting to convey with IN(array : length(0)…) was only creating an instance of the pointer within the target scope but performing no actual data transfer; however, what I showed was incorrect for this particular example.

After consulting with Development, I modified both Conor’s and Jim’s examples as shown below to follow Development’s recommendation, which is to use NOCOPY on the offloads for a target-only allocation (as discussed under Local Pointers Versus Pointers Used Across Offloads in the article cited earlier). Conor, your code has only a single use of the allocatable on the target, whereas Jim’s has multiple uses and thus illustrates persistence with the target-only allocation more completely.

Development also clarified my use of IN, Jim’s absence of any data clause for “array”, and the purpose for using NOCOPY. Here’s a summary that I hope helps further clarify:

Unless you use NOCOPY(array), the use of array within the offload region will make it default INOUT, so you are not achieving the MIC-only allocation for array that is desired. The compiler may examine the use of array within the function and optimize that from INOUT to just IN or just OUT.

When you specify IN(array), it remains IN. (Note: My earlier use of this in Conor’s example is wrong. If you cross-reference the table in the article noted earlier, IN with alloc_if(1)/free_if(0) and length(0) is an error. No error occurs with Fortran due to the underlying mechanics specific to the target allocation. It would result in an error with C/C++.)

When you do not mention “array” in any clause, the compiler starts off with array as INOUT and then, after observing how array is used in the program, makes it OUT only.

The correct directive is shown below. Note that making a variable MIC-only prevents data transfer to/from that variable from the CPU.

  !dir$ offload begin target(mic:0) in(asize, arg) out(answer) nocopy(array)


I have shown both examples below with modifications per Development's clarification. I hope all this clears up the usage and eliminates any confusion I might have caused.

Conor's Example
===============

!---try to allocate on coprocessor without allocating on host.
program test
  implicit none
  integer :: asize
  integer,allocatable :: array(:)

  asize = 10
  !dir$ offload begin target(mic:0) in(asize)
     print*,'hello from the MIC',asize
  !dir$ end offload

  !dir$ offload begin target(mic:0),&
  !dir$ in(asize),&
  !dir$ nocopy(array)
     allocate(array(asize))
     print*,'hello from the MIC',allocated(array), size(array)
  !dir$ end offload

  print*,'we made it'

end program test

Jim's Example
==============

! *** untested code ***
!---try to allocate on coprocessor without allocating on host.
program test
  implicit none
  integer :: asize
  integer,allocatable :: array(:)

  integer :: answer, arg

  arg = 1
  asize = 10
  !dir$ offload begin target(mic:0) in(asize, arg) out(answer) nocopy(array)
     print*,'hello from the MIC',asize
     allocate(array(asize))
     array = arg ! use array locally
     answer = sum(array)
  !dir$ end offload
  print*,"result=",answer

  !... other work on host

  arg = 2
  !dir$ offload begin target(mic:0) in(arg) out(answer) nocopy(array)
     array = array + arg
     answer = sum(array)
  !dir$ end offload
  print*,"result=",answer

  !dir$ offload begin target(mic:0) out(asize) nocopy(array)
     print*,"allocated=",allocated(array)
     deallocate(array)
     print*,"allocated=",allocated(array)
     asize = 0
  !dir$ end offload

  print*,'we made it'

end program test

 

jimdempseyatthecove
Honored Contributor III

Kevin,

Although the above modification of my code may have worked, it may have worked by accident rather than by design. There is an assumption being made that is not necessarily assured, and is not obvious from the example above.

CAUTION, check with developers on the following:

The assumption is that the array descriptor from the perspective of the MIC is immutable between offloads. When this program is compiled without OpenMP, and/or without -recursive, or without other options that force arrays (and array descriptors) to be located on stack, IOW the arrays are default SAVE (as in example above), then the array (and array descriptor) have persistence within each runtime environment (e.g. TEST_MP_ARRAY).

Conversely, with -openmp, -recursive, or options that specify that arrays are stack based, there is a potential for the MIC array descriptor to get corrupted between offload regions (though the allocated memory on the MIC would not). From my (limited) understanding, the offload begin target... section of code is effectively a remote procedure call, and also effectively like a parallel region with DEFAULT(PRIVATE). If the array descriptor is on the stack (that is, on the respective stacks of the host and MIC), then there is the potential for the MIC copy to get corrupted between offloads. The fix for this would be to ensure the array descriptor is SAVE or in a module.

This fix is NOT suitable when the array descriptor needs to be private per thread on the host (and where multiple threads are making offloads to the MIC). In this situation, I believe the option is to use C interoperability (within Fortran): declare "integer, pointer :: array(:)", then have the first offload, which performs the allocation, return the C_PTR of the base of the allocated array (and the size, if determined within the offload region). The returned value(s) are to be preserved in the host code (in the thread's context) and then passed in on the next offload. The MIC side would then use C interoperability to restore the array descriptor of the pointer. Note that the pointer itself would have the nocopy clause on all offloads. The reason is that a "pointer" to an array in Fortran does not point to the data; rather, it points to an array descriptor. You do not want to copy the array descriptor (Fortran would copy what it points to); rather, you want to reconstruct the array descriptor using the preserved C_PTR and size.
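
A rough sketch of that save-and-restore idea, untested on a coprocessor. Storing the address in an integer(c_intptr_t) variable (named handle here) instead of passing a C_PTR through the offload clauses is an assumption made so that only plain scalars cross the host/target boundary. The program and variable names are hypothetical. A compiler without offload support treats the !dir$ lines as comments, so the logic can be tried host-only.

```fortran
program handle_sketch
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_f_pointer, &
                                         c_intptr_t, c_null_ptr
  implicit none
  integer, pointer    :: array(:) => null()
  integer(c_intptr_t) :: handle   ! host-side copy of the target address
  integer :: asize, answer

  asize = 10

  ! First offload: allocate on the target and hand back the base address.
  !dir$ offload begin target(mic:0) in(asize) out(handle) nocopy(array)
  allocate(array(asize))
  array = 1
  handle = transfer(c_loc(array(1)), handle)
  !dir$ end offload

  ! ... other work on host; handle and asize are kept in the thread's context

  ! Later offload: rebuild the descriptor from the saved address and size.
  !dir$ offload begin target(mic:0) in(asize, handle) out(answer) nocopy(array)
  call c_f_pointer(transfer(handle, c_null_ptr), array, [asize])
  answer = sum(array)
  !dir$ end offload

  print *, 'sum =', answer
  ! Freeing the target memory is left out of this sketch.
end program handle_sketch
```

The saved address only has meaning on the side that produced it, so the host must treat handle as an opaque value and never dereference it.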

I saw some examples of this on this forum some time ago.

=============

If you do this type of offloading a lot, then it might be beneficial to define a type (or types) that contain the necessary information to save and restore a pointer to an array. Then maintain and pass this back and forth between offloads.
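
One possible shape for such a type, as a sketch under assumptions rather than a tested offload implementation. The names array_handle_mod, int_array_handle, save_handle and restore_handle are all hypothetical, and the placement of the attributes directives inside the routines is an assumption. The type holds only plain scalars so that it can be copied bitwise in an in or out clause.

```fortran
module array_handle_mod
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_f_pointer, &
                                         c_intptr_t, c_null_ptr
  implicit none

  ! Hypothetical handle: enough information to rebuild an array descriptor.
  type :: int_array_handle
     integer(c_intptr_t) :: base = 0   ! address of the first element
     integer             :: n    = 0   ! number of elements
  end type int_array_handle

contains

  ! Record the base address and size of an associated pointer array.
  subroutine save_handle(array, h)
    !dir$ attributes offload:mic :: save_handle
    integer, pointer, intent(in)        :: array(:)
    type(int_array_handle), intent(out) :: h
    h%n    = size(array)
    h%base = transfer(c_loc(array(1)), h%base)
  end subroutine save_handle

  ! Re-associate a pointer array from a previously saved handle.
  subroutine restore_handle(h, array)
    !dir$ attributes offload:mic :: restore_handle
    type(int_array_handle), intent(in) :: h
    integer, pointer, intent(out)      :: array(:)
    call c_f_pointer(transfer(h%base, c_null_ptr), array, [h%n])
  end subroutine restore_handle

end module array_handle_mod

program handle_type_sketch
  use array_handle_mod
  implicit none
  integer, pointer       :: array(:) => null()
  type(int_array_handle) :: h
  integer :: answer

  ! First offload: allocate on the target and save the handle.
  !dir$ offload begin target(mic:0) out(h) nocopy(array)
  allocate(array(10))
  array = 2
  call save_handle(array, h)
  !dir$ end offload

  ! Later offload: pass the handle back in and rebuild the pointer.
  !dir$ offload begin target(mic:0) in(h) out(answer) nocopy(array)
  call restore_handle(h, array)
  answer = sum(array)
  !dir$ end offload

  print *, 'sum =', answer
end program handle_type_sketch
```

Each host thread would keep its own int_array_handle, which is what makes this approach usable when the descriptor cannot be SAVE or module data.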

Jim Dempsey

Rajiv_D_Intel
Employee

Even when the array descriptor is on the stack it is intended to work no differently than a statically allocated descriptor. The semantics of in/out/inout/nocopy will be honored. The implementation takes care of it, and if it doesn't behave as expected then that is a defect.
