Hello,
I am trying to write a very simple program in which I natively allocate memory on the coprocessor and then copy data from the host into that natively allocated memory, but I keep getting errors. Could anyone kindly advise what is going wrong in my code?
__attribute__((target(mic)))
unsigned long long numElems;
void
PerformNativeAllocation(short* ptr, short* temp)
{
    cout << " Perform Native allocation " << endl;
    #pragma offload target(mic:0) \
        nocopy(temp)
    {
        temp = (short*) malloc(numElems*sizeof(short));
        //free(temp);
    }
    #pragma offload target(mic:0) \
        in(ptr[0:numElems] : into(temp) alloc_if(0) free_if(0))
    {
        for (unsigned long long ii = 0; ii < numElems; ++ii)
        {
            temp[ii] *= 2;
        }
        free(temp);
    }
}
Thank you
AM
Development's guidance is: "Memory allocated by the user using malloc or some such API cannot participate in the data transfer pragmas. For the pragmas to be usable, the allocation must be done using the pragmas also."
There is an exception: if compelled, one can call malloc/memcpy in the offloaded code. However, this is inefficient, because the IN() variable still gets its own allocation in addition to the user's target-side malloc. An example demonstrating this appears under "Example of Local Pointer" on the "Effective Use of the Intel Compiler's Offload Features" page. Instead of using INTO, one uses malloc and memcpy in the offloaded code.
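A minimal sketch of that malloc/memcpy pattern follows. It is not the example from the Intel page; the function name `SumDoubledCopy` and its parameters are hypothetical, and it assumes the target-side buffer is only needed within a single offload region. With a compiler that does not recognize the offload pragmas, they are ignored and the same code simply runs on the host.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not from the thread): doubles a target-side copy of
// ptr[0:n] and returns the sum of the doubled values. ptr is left unchanged.
long long SumDoubledCopy(short* ptr, unsigned long long n)
{
    long long sum = 0;

    // in(ptr[0:n]) makes the offload runtime allocate and fill its own
    // target buffer; the malloc below is a second, user-managed allocation.
    #pragma offload target(mic:0) in(ptr[0:n]) inout(sum)
    {
        short* local = (short*) std::malloc(n * sizeof(short));
        if (local != NULL)
        {
            // Copy from the runtime-managed buffer into the user buffer.
            std::memcpy(local, ptr, n * sizeof(short));
            for (unsigned long long ii = 0; ii < n; ++ii)
            {
                local[ii] *= 2;
                sum += local[ii];
            }
            // Free in the same offload region that allocated it.
            std::free(local);
        }
    }
    return sum;
}
```

The duplicate buffer is the inefficiency mentioned above: the data lives twice on the target for the duration of the offload.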
The alternative is to use the pragma allocation and INTO as shown below.
void PerformNativeAllocation(short* ptr, short* temp)
{
    cout << " Perform Native allocation " << endl;

    // allocate temp on target only
    #pragma offload_transfer target(mic:0) nocopy(temp : length(numElems) alloc_if(1) free_if(0))

    // transfer ptr values into temp
    #pragma offload target(mic:0) \
        in(ptr[0:numElems] : into(temp) alloc_if(0) free_if(0))
    {
        for (unsigned long long ii = 0; ii < numElems; ++ii)
        {
            temp[ii] *= 2;
        }
    }

    // transfer values out and free target memory
    #pragma offload_transfer target(mic:0) out(temp[0:numElems] : into(ptr) alloc_if(0) free_if(1))
}
Hello Kevin,
Thank you very much for your reply and help. I have been allocating memory and transferring data to the MIC using the same approach you suggested; however, I was trying to see whether the initial memory allocation time with the "nocopy" clause can be reduced, and it appears that it cannot. Thank you for the heads up, though; this really saves me a lot of time.
Sincerely,
AM
OK. Maybe you have already tried "hiding" the initial allocation somewhat by making it asynchronous using the signal() clause, followed by either a subsequent offload_wait pragma, a wait() clause on the INTO transfer, or the _Offload_signaled() API?
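As a rough sketch of the signal()/wait() variant, assuming `temp` points to a host buffer of at least `n` elements and that its address is used as the signal tag: the function name `PerformAsyncAllocation` and the parameter `n` are hypothetical. The `#ifndef __INTEL_OFFLOAD` branches are also an assumption, added only so the sketch behaves sensibly when the pragmas are ignored by a compiler without offload support.

```cpp
#include <cstring>

// Hypothetical asynchronous variant of PerformNativeAllocation.
// On return, ptr[0:n] holds the doubled values.
void PerformAsyncAllocation(short* ptr, short* temp, unsigned long long n)
{
    // Start the target-side allocation without blocking the host.
    #pragma offload_transfer target(mic:0) \
        nocopy(temp : length(n) alloc_if(1) free_if(0)) signal(temp)

    // ... independent host work could overlap with the allocation here ...

#ifndef __INTEL_OFFLOAD
    // Assumed host fallback: mimic in(ptr : into(temp)) when pragmas are ignored.
    std::memcpy(temp, ptr, n * sizeof(short));
#endif

    // wait(temp) holds this offload back until the allocation has finished.
    #pragma offload target(mic:0) \
        in(ptr[0:n] : into(temp) alloc_if(0) free_if(0)) wait(temp)
    {
        for (unsigned long long ii = 0; ii < n; ++ii)
        {
            temp[ii] *= 2;
        }
    }

    // Transfer the results back into ptr and free the target memory.
    #pragma offload_transfer target(mic:0) \
        out(temp[0:n] : into(ptr) alloc_if(0) free_if(1))

#ifndef __INTEL_OFFLOAD
    // Assumed host fallback: mimic out(temp : into(ptr)).
    std::memcpy(ptr, temp, n * sizeof(short));
#endif
}
```

This only helps if the host has useful work to do between the signal and the wait; otherwise the allocation cost is merely moved, not hidden.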
Yes, I tried that too in one of my double-buffering toy programs, but I still see around 15-20 seconds of initial (one-time) allocation (offload) delay. Once the memory is allocated, the transfer is pretty fast. Thank you for the heads up, though, Kevin. I really appreciate your input and help.
AVM
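One way to check how much of that delay is one-time start-up rather than the allocation itself is to time an empty offload before anything is allocated. The sketch below is only an illustration; the helper name `TimeEmptyOffload` is hypothetical, and it assumes a C++11 compiler. Without offload support the pragma is ignored and the measured time is close to zero.

```cpp
#include <chrono>

// Hypothetical helper: returns the wall-clock seconds taken by one empty offload.
double TimeEmptyOffload()
{
    auto start = std::chrono::steady_clock::now();

    #pragma offload target(mic:0)
    {
        // Intentionally empty: only the offload overhead is measured.
    }

    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it twice and comparing the two results gives a first-offload versus later-offload figure, which can then be set against the time of the first nocopy allocation.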
Yes, the slow initial allocation is a known issue. It lies within the card's OS, and hopefully it will continue to decrease over time.
Thank you for your help and reply, Kevin.