Hello,
I would like to pre-allocate a number of buffers for later data transfers from CPU to MIC, using explicit offloading in C++.
It works nicely if each buffer corresponds to an explicit variable name, as in the double-buffering examples. However, I would like to have a configurable number of such buffers (more than 2), i.e. an array of buffers. The buffers are used for asynchronous processing on the MIC, and I need quite a few of them.
I do have a workaround: allocate a single very big buffer and cut it into pieces (using offsets and 'into' for the transfers). But the buffers do not need to be contiguous, and I'm afraid this extra constraint may make it hard to find a big enough block at runtime. So I would prefer to have several smaller buffers if possible.
The code below should illustrate the issue. In the first part, it works fine with 2 variable names. In the second part, with an array, I can't work out how to proceed (or is it simply not possible?). I tried various syntaxes without success and could not find one accepted by the compiler.
I would be glad if someone could help on this matter. Thanks in advance for any feedback on this!
cheers, Sylvain
#pragma offload_attribute (push,target(mic))
#include <stdio.h>
#pragma offload_attribute (pop)

#define ALLOC alloc_if(1) free_if(0)
#define FREE  alloc_if(0) free_if(1)
#define REUSE alloc_if(0) free_if(0)

int main() {
  int size=100;      // size of buffer
  char input[size];  // buffer for input data on the CPU
  char *ptr1=NULL;   // reference to MIC buffer 1
  char *ptr2=NULL;   // reference to MIC buffer 2

  // pre-allocate MIC buffers
  #pragma offload_transfer target(mic:0) nocopy(ptr1 : length(size) ALLOC)
  #pragma offload_transfer target(mic:0) nocopy(ptr2 : length(size) ALLOC)

  // test use of buffer 1
  snprintf(input,size,"valPtr1");
  #pragma offload target(mic:0) in(input[0:size] : REUSE into(ptr1[0:size]))
  {
    printf("MIC: %p = %s\n",ptr1,ptr1);
  }

  // test use of buffer 2
  snprintf(input,size,"valPtr2");
  #pragma offload target(mic:0) in(input[0:size] : REUSE into(ptr2[0:size]))
  {
    printf("MIC: %p = %s\n",ptr2,ptr2);
  }

  // try to do same as above, but with an array instead of fixed variable names ptr1,ptr2
  // so that number of elements can be increased and iterated
  // e.g. instead of ptr1 and ptr2, use ptrX[1], ptrX[2] ... ptrX[i]
  // compiler does not seem to complain for the allocation
  // but it crashes at runtime
  char *ptrX[2]={NULL,NULL};
  for (int i=0;i<2;i++) {
    #pragma offload_transfer target(mic:0) nocopy(ptrX[i] : length(size) ALLOC)
  }

  // and then, how to use the buffers ???
  /*
  for (int i=0;i<2;i++) {
    snprintf(input,size,"valPtrX%d",i);
    #pragma offload target(mic:0) in(input[0:size] : REUSE into((???)[0:size]))
    {
      printf("MIC: %p = %s\n",???,???);
    }
  }
  */

  return 0;
}
If you have not already seen the article Data transfer of an “array of pointers” using the Intel® Language Extensions for Offload (LEO) for the Intel® Xeon Phi™ coprocessor, I believe it offers a method to fit your interests. If not, then please let us know.
Hello, and thanks for your fast feedback.
I managed to use array indirection as in the document you recommended, starting in particular from the very last example in the reference manual (https://software.intel.com/en-us/node/524507), which describes the copy "into" with arrays.
I use a call such as:
#pragma offload target(mic) in (p[0:1] : extent(0:DATA_ELEMS) into(q[ix:1]) into_extent(0:DATA_ELEMS))
- p[0] points to my CPU input data buffer (p is an array of size 1)
- ix is the index of the destination MIC buffer selected for this transfer (q is an array of size N)
- each input and destination buffer is of size DATA_ELEMS
However, I am not sure how to declare and allocate the array q and corresponding destination buffers q[0]...q[N-1] on the MIC ONLY.
I tried a number of things but failed to get it working without q[] initialized also on the CPU.
To summarize what I'm looking to do:
1) once at init: pre-allocate N blocks of size S on the MIC only
2) iteratively at runtime: transfer data from one arbitrary CPU address (and length<=S) into one of these MIC buffers
Please let me know if you have any suggestion.
best regards,
Sylvain
The targetptr modifier is available to declare MIC-only buffers. When you allocate q on MIC, use the targetptr modifier. Then, the existing values in q on the CPU are ignored, MIC buffers allocated for q, and the values in q on the CPU are updated with addresses of MIC buffers. From this point on, the q values should not be directly used on the CPU, but only through the offload pragmas.
To transfer data into or out of q, use the targetptr modifier. Similarly, when deleting the MIC buffers when you are done with them, use the targetptr modifier.
Hello,
Many thanks for the hint, this is exactly what I needed! I could find documentation for the targetptr feature only in the ICC 16.0 documentation, although it seems to work perfectly fine with my version 15.0.3. I used the description found at: https://software.intel.com/en-us/node/583639
And for reference, I paste below a full working example of what I was looking to achieve.
Best regards,
Sylvain
// this is a working example of arbitrary CPU pointer to MIC pre-allocated buffers copy
#pragma offload_attribute (push,target(mic))
#include <stdio.h>
#pragma offload_attribute (pop)
#include <stdlib.h>

#define ALLOC alloc_if(1) free_if(0)
#define FREE  alloc_if(0) free_if(1)
#define REUSE alloc_if(0) free_if(0)

#define MIC_NBUF 5                  // number of buffers on MIC
#define CPU_NBUF 3                  // number of buffers on CPU
#define DATA_ELEMS 1000000          // number of items in each buffer (CPU and MIC)
#define ALIGN_COUNT (2*1024*1024)   // align boundary

int main() {
  __declspec(target(mic)) short int *p[1];         // an array variable of size 1 for input pointer data indirection
  __declspec(target(mic)) short int *q[MIC_NBUF];  // an array to hold the MIC buffers
  __declspec(target(mic)) int ix=0;                // index of current MIC buffer in use

  // create some input buffers on the CPU, filled with dummy data to be transferred
  short int *buf[CPU_NBUF];  // CPU buffers
  for (int i=0; i<CPU_NBUF; i++) {
    buf[i]=(short int *)_mm_malloc(sizeof(short int)*DATA_ELEMS,ALIGN_COUNT);
    for (int j=0;j<DATA_ELEMS;j++) {
      buf[i][j]=i*10+j%10;
    }
  }

  // we don't use q[] on the CPU, just fill it with NULL pointers
  for (int i=0; i<MIC_NBUF; i++) {
    q[i]=NULL;
  }

  // allocate q[0] q[1] ... q[MIC_NBUF-1] on the MIC ONLY (aligned)
  #pragma offload_transfer target(mic) nocopy (q[0:MIC_NBUF] : extent(0:DATA_ELEMS) ALLOC targetptr align(ALIGN_COUNT))

  // transfer from the CPU buffers to the MIC buffers round-robin
  for (int i=0;i<10;i++) {
    ix= i % MIC_NBUF;        // index of the MIC buffer to use as destination
    p[0]=buf[i%CPU_NBUF];    // pointer to the CPU buffer to use as source

    // copy DATA_ELEMS * 'short int' data pointed by p[0] on the CPU to pre-allocated buffer pointed by q[ix] on the MIC
    #pragma offload target(mic) in (ix) nocopy(q) in (p[0:1] : extent(0:DATA_ELEMS) into(q[ix:1]) into_extent(0:DATA_ELEMS) REUSE targetptr )
    {
      printf("MIC ix=%d ptr=%p value=%d\n",ix,q[ix],(int)q[ix][0]);
      for (int j=0; j<MIC_NBUF; j++) {
        printf("q[%d][0]=%d\n",j,q[j][0]);
        printf("q[%d][1]=%d\n",j,q[j][1]);
        printf("q[%d][2]=%d\n",j,q[j][2]);
      }
    }
  }

  return 0;
}
The functionality was made available in 15.0 for convenience of some early evaluation/testing before being officially announced in 16.0. Glad you found the solution you were looking for and thank you for sharing that for the benefit of others.
