We have been through this once, but here we go again, because the latest results confuse me. My question is: in order to re-use a previously allocated memory buffer on the coprocessor, is the programmer required to supply a global pointer with __attribute__((target(mic))) in pragma offload?
The reason for this question is that I observe that global variables work in all cases, but local variables work in all cases except one (ouch!). So either it is a bug in the compiler or COI, or it is a sign that one programming practice is better than the other.
Below is code that allocates multiple buffers and keeps them on Xeon Phi using "alloc_if(1) free_if(0)". It then attempts to re-use these buffers using "alloc_if(0) free_if(0)", supplying to the offload pragma a pointer to the same host memory address that was used during allocation. The result is this:
1) If the pointer is a global variable with __attribute__((target(mic))), everything works: host addresses map to the correct coprocessor addresses
2) If the pointer is a local variable, and the direction of transfer is "out", the application works incorrectly: different host addresses map to the same coprocessor address, which is wrong
3) If the pointer is a local variable, and the direction of transfer is "in" or "inout" rather than "out", everything works again
4) Not shown in this code, but tested separately: if the pointer is a local variable, and the direction of transfer is "out", but I use pragma offload_transfer instead of pragma offload, everything works fine, too.
I would very much appreciate an explanation: is it a bug (i.e., everything should work whether the pointer is local or global), or am I required to use global variables in the offload pragma when re-using a pre-allocated memory buffer?
Here is the code:
#include <cstdio>
#include <omp.h>

__attribute__((target(mic))) float* bufGlobal;

int main() {
  const int N = 1<<10;
  const int m = 4;
  float* data = (float*)_mm_malloc(N*m*sizeof(float), 4096);
  float* buf;
  void* micBuf;

  printf("\nAllocating data on the coprocessor:\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
    #pragma offload target(mic:0) inout(buf : length(N) alloc_if(1) free_if(0))
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  printf("\nRe-using data with the 'out' clause with local pointer (ERROR HERE):\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
    #pragma offload target(mic:0) out(buf : length(N) alloc_if(0) free_if(0))
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  printf("\nRe-using data with the 'out' clause with global pointer (works fine):\n");
  for (int i = 0; i < m; i++) {
    bufGlobal = &data[i*N];
    #pragma offload target(mic:0) out(bufGlobal : length(N) alloc_if(0) free_if(0))
    {
      micBuf = &bufGlobal[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", bufGlobal, micBuf); fflush(0);
  }

  printf("\nRe-using data with the 'in' clause with local pointer (works fine):\n");
  for (int i = 0; i < m; i++) {
    buf = &data[i*N];
    #pragma offload target(mic:0) in(buf : length(N) alloc_if(0) free_if(0))
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", buf, micBuf); fflush(0);
  }

  // Cleanup
  for (int i = 0; i < m; i++) {
    float* buf = &data[i*N];
    #pragma offload_transfer target(mic:0) in(buf : length(N) alloc_if(0) free_if(1))
  }
  _mm_free(data);
}
And here is the output:
Allocating data on the coprocessor:
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'out' clause with local pointer (ERROR HERE):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006105000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'out' clause with global pointer (works fine):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000

Re-using data with the 'in' clause with local pointer (works fine):
Pointer on CPU=0xcf7000 -> pointer on MIC=0x7f200610e000
Pointer on CPU=0xcf8000 -> pointer on MIC=0x7f200610b000
Pointer on CPU=0xcf9000 -> pointer on MIC=0x7f2006108000
Pointer on CPU=0xcfa000 -> pointer on MIC=0x7f2006105000
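The failure in the output above is that four different host addresses report the same coprocessor address. A minimal sketch of how that condition could be checked automatically is given here; the helper name mappingsAreDistinct is hypothetical and not part of any offload API, and it assumes the coprocessor addresses have already been collected (for example, the micBuf values printed by the test program).

```cpp
#include <set>
#include <vector>

// Hypothetical helper (not from the offload runtime): returns true when
// every collected coprocessor address is unique, i.e. no two host chunks
// were mapped to the same coprocessor buffer.
bool mappingsAreDistinct(const std::vector<const void*>& micPtrs) {
  // std::set keeps one copy of each address, so a size mismatch
  // means at least two entries were identical.
  std::set<const void*> unique(micPtrs.begin(), micPtrs.end());
  return unique.size() == micPtrs.size();
}
```

Applied to the four sections of the output, this check would pass for the allocation, the global 'out' case, and the local 'in' case, and fail for the local 'out' case, where all four entries are 0x7f2006105000.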
This is a bug in the 15.0 compiler. The code works as is with the previous 14.0 compiler. There is some underlying difference with the handling of the local pointer that affects the OUT case.
With 15.0, you can obtain correct results with a workaround: add an additional IN clause with length(0) for the same pointer in the failing case, as follows:
#pragma offload target(mic:0) in(buf : length(0) alloc_if(0) free_if(0)) \
out(buf : length(N) alloc_if(0) free_if(0))
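For context, the workaround can be dropped into the failing loop from the original test program roughly as sketched here. The function name reuseWithWorkaround and its return value are hypothetical, introduced only for illustration; the sketch assumes that each chunk of N floats was allocated on the coprocessor earlier with alloc_if(1) free_if(0), as in the original code.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical wrapper around the failing loop with the workaround applied.
// Assumes data holds m chunks of N floats, each already allocated on the
// coprocessor with alloc_if(1) free_if(0).
// Returns the address each chunk reported from inside the offload region.
std::vector<void*> reuseWithWorkaround(float* data, int N, int m) {
  assert(data != 0 && N > 0 && m > 0);
  std::vector<void*> micPtrs;
  for (int i = 0; i < m; i++) {
    float* buf = &data[i*N];  // local pointer, as in the failing case
    void* micBuf = 0;
    // Workaround: a zero-length 'in' for the same pointer precedes 'out'.
    #pragma offload target(mic:0) in(buf : length(0) alloc_if(0) free_if(0)) \
                                  out(buf : length(N) alloc_if(0) free_if(0))
    {
      micBuf = &buf[0];
    }
    printf("Pointer on CPU=%p -> pointer on MIC=%p\n", (void*)buf, micBuf);
    micPtrs.push_back(micBuf);
  }
  return micPtrs;
}
```

If the workaround takes effect, the returned addresses should match the ones printed during allocation rather than all being identical. A compiler without offload support ignores the pragma and runs the block on the host, in which case each reported address is simply the host address of the chunk.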
I submitted this to Development (see internal tracking id below) and will post as I learn more.
(Internal tracking id: DPD200366742)
