Offload with persistent MIC buffer: are global pointers required?

Andrey_Vladimirov

We have been through this once, but here we go again, because the latest results confuse me. My question is: in order to re-use a previously allocated memory buffer on the coprocessor, is the programmer required to supply a global pointer with __attribute__((target(mic))) in pragma offload?

The reason for this question is that I observe that global variables work in all cases, but local variables work in all cases except one (ouch!). So either it is a bug in the compiler or COI, or it is a sign that one programming practice is better than another.

The code below allocates multiple buffers and keeps them on Xeon Phi using "alloc_if(1) free_if(0)". It then attempts to re-use these buffers using "alloc_if(0) free_if(0)", supplying to the offload pragma a pointer to the same host memory address that was used during allocation. The result is this:

1) If the pointer is a global variable with __attribute__((target(mic))), everything works: host addresses map to the correct coprocessor addresses.

2) If the pointer is a local variable, and the direction of transfer is "out", the application works incorrectly: different host addresses map to the same coprocessor address, which is wrong.

3) If the pointer is a local variable, and the direction of transfer is "in" or "inout" rather than "out", everything works again.

4) Not shown in this code, but tested separately: if the pointer is a local variable, and the direction of transfer is "out", but I use pragma offload_transfer instead of pragma offload, everything works fine, too.
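
For reference, case 4 can be sketched roughly as follows. This is a minimal illustration, not the exact code that was tested: the helper name pullChunk is hypothetical, and it assumes the coprocessor buffer for buf was already allocated by an earlier offload with alloc_if(1) free_if(0). A compiler without offload support will typically ignore the pragma, in which case the function does nothing.

```cpp
#include <cassert>

// Hypothetical helper (name is an assumption, not from the reproducer).
// Copies N floats from the persistent coprocessor buffer associated with
// the host address `buf` back to the host, without running any offloaded code.
// Precondition: an earlier offload allocated that buffer with
// alloc_if(1) free_if(0) for this same host address.
void pullChunk(float* buf, const int N) {
  assert(buf != 0 && N > 0);
  // `buf` is a local pointer (a function parameter), as in case 4.
#pragma offload_transfer target(mic:0) out(buf : length(N) alloc_if(0) free_if(0))
}
```

In the reproducer's loop this would be called as pullChunk(&data[i*N], N) in place of the failing pragma offload with the "out" clause.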

I would very much appreciate an explanation: is it a bug (i.e., everything should work whether the pointer is local or global), or am I required to use global variables in the offload pragma with a pre-allocated memory buffer?

 

Here is the code:

#include <cstdio>
#include <immintrin.h>  // declares _mm_malloc and _mm_free
#include <omp.h>

__attribute__((target(mic)))  float* bufGlobal;

int main() {
  const int N = 1<<10;
  const int m = 4;
  float* data = (float*)_mm_malloc(N*m*sizeof(float), 4096);
  float* buf;
  void* micBuf;

  printf("\nAllocating data on the coprocessor:\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
#pragma offload target(mic:0) inout(buf : length(N) alloc_if(1) free_if(0))
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  printf("\nRe-using data with the 'out' clause with local pointer (ERROR HERE):\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
#pragma offload target(mic:0) out(buf : length(N) alloc_if(0) free_if(0)) 
     {
       micBuf = &buf[0];
     }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  printf("\nRe-using data with the 'out' clause with global pointer (works fine):\n");
  for (int i = 0; i < m; i++) {
    bufGlobal = &data[i*N];
#pragma offload target(mic:0) out(bufGlobal : length(N) alloc_if(0) free_if(0)) 
    {
      micBuf = &bufGlobal[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", bufGlobal, micBuf); fflush(0);
  }
  
  printf("\nRe-using data with the 'in' clause with local pointer (works fine):\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
#pragma offload target(mic:0) in(buf : length(N) alloc_if(0) free_if(0)) 
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  // Cleanup
  for (int i = 0; i < m; i++) {
    float* buf = &data[i*N];
#pragma offload_transfer target(mic:0) in(buf : length(N) alloc_if(0) free_if(1))
  }

  _mm_free(data);
}

 

And here is the output:

Allocating data on the coprocessor:
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'out' clause with local pointer (ERROR HERE):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'out' clause with global pointer (works fine):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'in' clause with local pointer (works fine):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000
1 Reply
Kevin_D_Intel

This is a bug in the 15.0 compiler. The code works as is with the previous 14.0 compiler. There is some underlying difference in the handling of the local pointer that affects the OUT case.

With 15.0, you can obtain correct results with a workaround: add an additional IN clause with length(0) for the failing case, as follows:

#pragma offload target(mic:0) in(buf : length(0)  alloc_if(0) free_if(0)) \
                              out(buf : length(N) alloc_if(0) free_if(0))
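
To show where this fits, here is a minimal sketch of the failing loop from the reproducer with the workaround applied. The helper names fetchChunk and fetchAll are hypothetical and only for illustration; the sketch assumes the buffers were allocated earlier with alloc_if(1) free_if(0), exactly as in the original code. The zero-length IN clause should not transfer any data.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper (name is an assumption). Copies one N-element chunk
// back from its persistent coprocessor buffer and returns the address of
// that buffer as seen inside the offload region.
// Precondition: the buffer for this host address was allocated earlier
// with alloc_if(1) free_if(0).
void* fetchChunk(float* buf, const int N) {
  assert(buf != 0 && N > 0);
  void* micBuf = 0;
  // Workaround: pair the "out" clause with a zero-length "in" clause
  // on the same local pointer.
#pragma offload target(mic:0) in(buf : length(0) alloc_if(0) free_if(0)) \
                              out(buf : length(N) alloc_if(0) free_if(0))
  {
    micBuf = &buf[0];
  }
  return micBuf;
}

// Hypothetical helper: applies fetchChunk to each of the m chunks of data
// and prints the host-to-coprocessor address mapping, like the reproducer.
void fetchAll(float* data, const int N, const int m) {
  for (int i = 0; i < m; i++) {
    float* buf = &data[i*N];
    void* micBuf = fetchChunk(buf, N);
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", (void*)buf, micBuf);
    fflush(0);
  }
}
```

Calling fetchAll(data, N, m) after the allocation loop should then report a distinct coprocessor address for each chunk, matching the addresses printed during allocation.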

I submitted this to Development (see internal tracking id below) and will post as I learn more.

(Internal tracking id: DPD200366742)
