I would like to use offload, but the length of the result is not known until the offload has finished. What is the correct way to handle this? Here is my idea. Will this work? Is there a more elegant way?
Georg
[cpp]
const char *pBufOut=NULL;
std::size_t lOutBufLen;
// compute result on MIC, alloc buffer, but return only length.
#pragma offload target(mic) out(lOutBufLen) nocopy(pBufOut: length(0) alloc_if(0) free_if(0))
{
// compute length of result
...
pBufOut=malloc(computedSize);
// copy contents of result to buffer
...
lOutBufLen=computedSize;
}
// create suitable buffer on host side
pBufOut=malloc(lOutBufLen);
// copy result to host side, deallocate memory on MIC, do nothing else...
#pragma offload target(mic) out(pBufOut: length(lOutBufLen) alloc_if(0) free_if(0))
{;}
[/cpp]
Hi Georg,
the solution that you outlined did not work on my system, probably because the host value of pBufOut remains NULL after the first offload, so the offload runtime library (RTL) cannot look up this array in its table of offloaded arrays. The only way I could make it work is by passing a second array and copying the contents into it on the coprocessor:
#include <stdio.h>
#include <stdlib.h>

int main() {
  size_t bufLen;
  char* pBufMIC = NULL;   // buffer allocated on the coprocessor
  char* pBufHost = NULL;  // buffer allocated on the host once the length is known

  // First offload: compute the result and its length on the coprocessor.
#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0))
  {
    bufLen = 60;
    pBufMIC = (char*) malloc(bufLen);
    for (int i = 0; i < bufLen; i++)
      pBufMIC[i] = (char)(48+i%10);
  }

  printf("On the host, bufLen = %zu\n", bufLen);
  pBufHost = (char*) malloc(bufLen);

  // Second offload: copy the result into the array that is transferred back.
#pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0)) out(pBufHost : length(bufLen))
  {
    pBufHost[0:bufLen] = pBufMIC[0:bufLen];
  }

  for (int i = 0; i < bufLen; i++) printf("%c", pBufHost[i]);
  printf("\n");
}

$ icpc foo1.cc
$ ./a.out
On the host, bufLen = 60
012345678901234567890123456789012345678901234567890123456789
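The example above never releases either buffer. A minimal sketch of the same two-offload pattern with cleanup could look like the following. The wrapper function `fetchResult` and the use of `memcpy` in place of array notation are assumptions made for this illustration and are not part of the original test. The sketch has not been verified on a coprocessor. When the code is built without offload support, the pragmas are ignored and everything runs on the host.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper (not from the original post): returns a result whose
// length is only known after the offloaded computation has finished.
std::string fetchResult() {
    size_t bufLen = 0;
    char* pBufMIC = NULL;   // target-side buffer; the pointer is never transferred
    char* pBufHost = NULL;  // host-side buffer, sized after the first offload

    // Offload 1: compute the result and its length on the target.
    // bufLen is a scalar, so it is copied in and out by default.
    #pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0))
    {
        bufLen = 60;
        pBufMIC = (char*) malloc(bufLen);
        for (size_t i = 0; i < bufLen; i++)
            pBufMIC[i] = (char)(48 + i % 10);
    }

    // The length is now known on the host, so the host buffer can be allocated.
    pBufHost = (char*) malloc(bufLen);

    // Offload 2: copy into the array that is transferred back,
    // then release the buffer that was malloc'ed on the target.
    #pragma offload target(mic:0) nocopy(pBufMIC : length(0) alloc_if(0) free_if(0)) out(pBufHost : length(bufLen))
    {
        memcpy(pBufHost, pBufMIC, bufLen);
        free(pBufMIC);
        pBufMIC = NULL;
    }

    // Hand the data to a std::string and release the host buffer.
    std::string result(pBufHost, bufLen);
    free(pBufHost);
    return result;
}
```

Calling `fetchResult()` from `main` and printing the returned string should give the same 60 characters as the program above, assuming the coprocessor-side pointer persists between the two offloads as it did in that test.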
There is, however, a much more elegant solution in the virtual-shared memory model:
#include <stdlib.h>
#include <iostream>

// Shared variables are visible to both the host and the coprocessor.
char* _Cilk_shared pBuf = NULL;
_Cilk_shared size_t bufLen;

_Cilk_shared void MyFunction() {
  bufLen = 60;
  pBuf = (char*) _Offload_shared_malloc(bufLen);
#ifdef __MIC__
  std::cout << "Initialized the array on the coprocessor\n" << std::flush;
#endif
  for (int i = 0; i < bufLen; i++)
    pBuf[i] = (char)(48+i%10);
}

int main() {
  _Cilk_offload MyFunction();
  std::cout << "Back on the host, lOutBufLen = " << bufLen << std::endl;
  for (int i = 0; i < bufLen; i++) std::cout << pBuf[i];
  std::cout << std::endl;
}

$ icpc foo2.cc
$ ./a.out
Initialized the array on the coprocessor
Back on the host, lOutBufLen = 60
012345678901234567890123456789012345678901234567890123456789
