asynchronous data transfer format

conor_p_ · ‎07-31-2014

Hey again everyone,

I am attempting to do an asynchronous data transfer. I want to start the data transfer, simultaneously run code on the host, and, once both the host computation and the data transfer are finished, run another portion of the code on the MIC. It looks something like this:

integer :: start(1000)

!dir$ offload_transfer target(mic:0),&
!dir$& in(start:alloc_if(.true.) free_if(.false.))
!dir$& signal(signal1)

!---do some miscellaneous code on host

!--- now that host computation is over, start MIC computation if data transfer is over

!dir$ offload_wait target(mic:0) nocopy(start) wait(signal1)

     !--- perform some code
!dir$ end offload

Unfortunately, when I do this in my code, I keep getting: "found nocopy when expecting one of: target if wait status mandatory optional, <end of statement>"

and: "An offload begin directive block starts in one F95 block and ends in another" (this refers to the end offload line).

Could someone please help me with my format here? I have searched around online, but haven't found the appropriate end offload_wait line.

Ravi_N_Intel · ‎07-31-2014

Remove the nocopy(start) from the offload_wait pragma.

conor_p_ · ‎07-31-2014

That's what I am confused about. I know that by default all necessary data gets moved to the MIC, where space is allocated and deallocated. Does this not happen in the case of offload_wait? Will start get moved again if nocopy is not there?

Kevin_D_Intel · ‎07-31-2014

The data movement clauses are not valid on OFFLOAD_WAIT, and there is no implicit data movement either. The directive pauses CPU execution at that point until the specified signal from a previous asynchronous activity has completed (documentation is here).

conor_p_ · ‎07-31-2014

Removing the nocopy clause doesn't get rid of the error. I'll attach sample code that reproduces it. It still says 'an offload begin directive block starts in one F95 block and ends in another'.

module global
  implicit none
  
  type r
     double precision, allocatable :: x(60000),y(60000),z(60000)
  end type r
  
  !dir$ attributes offload:mic:: rSOA
  type(r) :: rSOA

  double precision, allocatable :: x(:)
  
  double precision, allocatable :: y(:)

  double precision, allocatable :: z(:)

 
end module global

program MIC
  use global 
  use ifport

  implicit none
  double precision :: energy
  double precision :: dx,dy,dz
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: i,j,k,l
  integer :: np
  integer :: count,signal1
  call seed(10)

  np = 60000

  allocate(x(np),y(np),z(np))

  do i =1,np
     x(i) = rand()*1000; y(i) = rand()*1000; z(i) = rand()*1000
     rSOA%x(i) = x(i); rSOA%y(i) = y(i) ; rSOA%z(i) = z(i)
  end do

  count = 0
  do i = 1,1000
     start(i) = count*60+1
     end(i) = (count+1)*60

     count = count + 1
  enddo

  energy = 0.0d0
  
  !dir$ offload_transfer target(mic:0),&
  !dir$& in(start: alloc_if(.true.) free_if(.false.))
  !dir$& in(end: alloc_if(.true.) free_if(.false.))
  !dir$& in(rSOA),&
  !dir$& wait(signal1)

  !dir$ offload_wait target(mic:0) signal(signal1)

  do i = 1,1000
     c1s = start(i); end1 = end(i)
     do j = i+1, 1000
        c2s = start(j); c2e = end(j)

        do k= c1s,c1e
           x1 = rSOA%x(k); y1 = rSOA%y(k); z1 = rSOA%z(k)

           do l = c2s,c2e
              x2 = rSOA%x(l); y2 = rSOA%y(l); z2 = rSOA%z(l)

              dr2 = dx*dx + dy*dy + dz*dz
              
              energy = energy + dr2
           enddo
        enddo
     enddo
  enddo

  !dir$ end offload

end program MIC

Ravi_N_Intel · ‎07-31-2014

!dir$ offload_wait does no data transfer; it just waits for the signal specified in the clause.

By default, not all data are transferred.

!dir$ offload_transfer acts only on data that are specified in the clauses.
!dir$ offload target acts on data that are specified in the clauses and on data that are not specified in the clauses but are used in the lexical scope of the offload region.

I used the word "acts" because the action could be nocopy, in, out, in/out, alloc, free.
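
A minimal sketch of the transfer-then-wait pattern described above, assuming the clause forms already used in this thread. The program name, the tag value 1, and the host loop are illustrative assumptions, and the sketch has not been checked against a coprocessor toolchain. A compiler that does not recognize the directives treats them as comments, so the program then runs on the host only.

```fortran
program transfer_then_wait
  implicit none
  integer, parameter :: n = 1000
  integer :: start(n)
  integer :: sig, i, host_sum

  do i = 1, n
     start(i) = (i - 1)*60 + 1
  end do
  sig = 1   ! tag value for the signal (illustrative choice)

  ! Start the asynchronous copy of start to the coprocessor.
  ! Only the data named in the clauses is acted on.
  !dir$ offload_transfer target(mic:0) in(start: alloc_if(.true.) free_if(.false.)) signal(sig)

  ! Host work that can overlap with the transfer.
  host_sum = 0
  do i = 1, n
     host_sum = host_sum + i
  end do

  ! Block the host until the transfer tagged by sig has completed.
  ! offload_wait takes no data clauses and has no matching "end offload".
  !dir$ offload_wait target(mic:0) wait(sig)

  print *, 'host_sum =', host_sum
end program transfer_then_wait
```

Because offload_wait is a standalone directive, any code that should actually run on the coprocessor still needs its own offload region.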

Ravi_N_Intel · ‎07-31-2014

- You are using wait(signal1) at lines 37 and 39. Where is the signal set? I assume you meant to use signal(signal1) at line 37.
- If so, why are you doing a transfer and immediately waiting for it, unless you plan to add some code in between?
- I don't see any offload target, so why do you have end offload at line 60?

conor_p_ · ‎08-02-2014

OK, I think I was misunderstanding the offload_wait directive. It doesn't actually start an offload; it just tells the host to wait until the signal is received. I have changed it to an !dir$ offload begin target(mic:0) wait(signal1) directive. However, I am still getting a job failed error. I have added some nonsense code between the offload_transfer and offload begin directives, just for clarity (in my actual code, a subroutine that's not pertinent to this question is executed on the host there).

module global
  implicit none
  
  type r
     double precision, allocatable :: x(60000),y(60000),z(60000)
  end type r
  
  !dir$ attributes offload:mic:: rSOA
  type(r) :: rSOA

  double precision, allocatable :: x(:)
  
  double precision, allocatable :: y(:)

  double precision, allocatable :: z(:)

 
end module global
program MIC
  use global 
  use ifport

  implicit none
  double precision :: energy
  double precision :: dx,dy,dz
  double precision :: x1,y1,z1,x2,y2,z2
  integer :: i,j,k,l
  integer :: np
  integer :: count,signal1
  call seed(10)

  np = 60000

  allocate(x(np),y(np),z(np))

  do i =1,np
     x(i) = rand()*1000; y(i) = rand()*1000; z(i) = rand()*1000
     rSOA%x(i) = x(i); rSOA%y(i) = y(i) ; rSOA%z(i) = z(i)
  end do

  count = 0
  do i = 1,1000
     start(i) = count*60+1
     end(i) = (count+1)*60

     count = count + 1
  enddo

  energy = 0.0d0
  
  !dir$ offload_transfer target(mic:0),&
  !dir$& in(start: alloc_if(.true.) free_if(.false.))
  !dir$& in(end: alloc_if(.true.) free_if(.false.))
  !dir$& in(rSOA),&
  !dir$& signal(signal1)

  j=0
  do i = 1,1000
     j = j+1
  enddo

  !dir$ offload begin target(mic:0) wait(signal1)

  do i = 1,1000
     c1s = start(i); end1 = end(i)
     do j = i+1, 1000
        c2s = start(j); c2e = end(j)

        do k= c1s,c1e
           x1 = rSOA%x(k); y1 = rSOA%y(k); z1 = rSOA%z(k)

           do l = c2s,c2e
              x2 = rSOA%x(l); y2 = rSOA%y(l); z2 = rSOA%z(l)

              dr2 = dx*dx + dy*dy + dz*dz
              
              energy = energy + dr2
           enddo
        enddo
     enddo
  enddo

  !dir$ end offload

end program MIC

The error I get is "device 0 does not have a pending signal for wait((nil))".

Kevin_D_Intel · ‎08-04-2014

The signal tag must be initialized to a non-zero value that is unique (that is, different from the value of every other signal variable when more than one signal is used). Add a non-zero initialization of signal1 before line 51, ahead of its use in line 55.
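
A minimal sketch of that fix, assuming the same clause forms used earlier in the thread. The tag value 1, the array contents, and the nocopy/inout clauses on the offload region are illustrative assumptions, not taken from the original program, and the sketch has not been checked against a coprocessor toolchain.

```fortran
program wait_on_offload
  implicit none
  integer, parameter :: n = 1000
  integer :: start(n)
  integer :: signal1, i, total

  do i = 1, n
     start(i) = i
  end do

  ! Give the tag a non-zero value before it first appears in signal().
  ! With several signals in flight, each tag would need its own value.
  signal1 = 1

  !dir$ offload_transfer target(mic:0) in(start: alloc_if(.true.) free_if(.false.)) signal(signal1)

  ! ... host-side work would go here ...

  total = 0
  ! The offload region starts only after the transfer tagged signal1 completes.
  ! nocopy is meant to reuse the copy of start already on the coprocessor.
  !dir$ offload begin target(mic:0) wait(signal1) nocopy(start: alloc_if(.false.) free_if(.true.)) inout(total)
  do i = 1, n
     total = total + start(i)
  end do
  !dir$ end offload

  print *, 'total =', total
end program wait_on_offload
```

Leaving signal1 uninitialized is what appears to produce the "pending signal for wait((nil))" message: the runtime is handed a tag value of zero.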

jimdempseyatthecove · ‎06-28-2015

Kevin,

The earlier documentation seems to indicate that the address of the signal variable is used, as opposed to the value of the signal variable as stated in the newer documents. Can you ask someone on the offload compiler team to confirm what is required by looking at the code (as opposed to reading the documents)?

If, for example, it were the value passed in, then there would be no need for the value to be the sizeof(void*) because the function prototype in C/C++ can specify what size is used (and the argument will be promoted/truncated), and the INTERFACE in Fortran can specify pass by value and affect the promotion/truncation. The requirement of the (tag) being size of pointer is peculiar if the address is not used.

What would make sense is:

If the argument to signal(tag) is required to be the size of the host runtime pointer
.AND. if the offload wrote an opaque handle into the provided variable on signal(tag) (for use later) by wait(tag).

If this is the case, then why not document it as such. And note that in this case, the value need not be initialized because the variable would (should) receive an opaque handle for the offload as generated by and used by the offload system.
.AND. this would require that each pending signal tag would require different storage locations (to hold the pending handles).

Please ask the compiler team to look at the code, and specify what is used.

Jim Dempsey

Frances_R_Intel · ‎06-29-2015

I'm not part of the compiler team but trust me on this.

The value of the signal variable is what must be unique. The value is the key used to look up the signal in the table. There are occasions where you will want to start a transfer in one scoping context and check for completion in another scoping context. Think ping/pong buffers. If you are in a different scoping context, the location of the signal variable might not be the same. Could they have required the signal variable to be declared so that it has a fixed address? Sure. Then the address of the variable could have been used. But that is not what they did.

In C/C++, a pointer to one of the arrays you are transferring is the best choice for the value of the variable, since what you are typically waiting for is one or more arrays to finish transferring and the address of the array will be the same regardless of the scope of your signal variable. Hence the choice of (void *) as the type for the signal variable. Since in C/C++, the name of an array can be treated as a pointer to the array, what you usually end up with is signal(array_name).

In Fortran, pointers aren't really pointers, not in the C/C++ sense. Instead, they are a way to assign a particular name to a location in memory on the fly. And the name of an array refers to the complete array, not the location of the array as in C/C++. Could they have had signal(array_name) take the address of the array? Sure, but this would have copied the whole array into signal before it could take the address. Not a good idea. So what they did was say that in Fortran, the value passed to signal would be an integer and put no restrictions on what value people used. However, the best choice is to use the integer returned by the Fortran intrinsic LOC( ). As an intrinsic, there is no array copying going on and you end up with an integer value which is, effectively, an address for the array and unique. So, what you usually want is signal(LOC(array_name)).
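
A minimal sketch of the LOC-based tags described above. It stores the LOC result in integer variables (sig_a and sig_b, hypothetical names) rather than writing signal(LOC(array_name)) inline, and it assumes 64-bit addresses and that the directives accept an 8-byte integer tag. Freeing the coprocessor copies is omitted, and the sketch has not been checked against a coprocessor toolchain.

```fortran
program loc_as_tag
  implicit none
  double precision :: a(100), b(100)
  integer(kind=8) :: sig_a, sig_b   ! assumes 64-bit addresses

  a = 1.0d0
  b = 2.0d0

  ! LOC is a compiler extension (available in Intel Fortran and gfortran)
  ! that returns the address of its argument as an integer.
  ! Distinct arrays give distinct, non-zero tag values.
  sig_a = loc(a)
  sig_b = loc(b)

  ! Two transfers in flight at once, each tagged by its own array's address.
  !dir$ offload_transfer target(mic:0) in(a: alloc_if(.true.) free_if(.false.)) signal(sig_a)
  !dir$ offload_transfer target(mic:0) in(b: alloc_if(.true.) free_if(.false.)) signal(sig_b)

  ! Wait for each transfer by the same tag value.
  !dir$ offload_wait target(mic:0) wait(sig_a)
  !dir$ offload_wait target(mic:0) wait(sig_b)

  print *, 'tags are distinct: ', sig_a /= sig_b
end program loc_as_tag
```

Since the tag is matched by value, a different scoping unit can recompute loc(a) for the same array and wait on it without sharing the signal variable itself.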

Does that explain the method to the madness?