Hi,
Is it possible to allocate memory on the device asynchronously, without a transfer? Here is my test case:
[cpp]
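#include <cstdio>      // printf
#include <cstdlib>     // malloc
#include <iostream>
#include <string>
#include <vector>
#include <sys/time.h>  // gettimeofday
#include <immintrin.h> // _mm_malloc
double getTime();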
void reportTime(std::string);
std::vector<double> timer;
__attribute__ ((target (mic))) float *m_er;
int main(int argc, char* argv[])
{
timer.push_back(getTime());
int size = 150000;
// Alloc memory on host and fill with some data
float* er = (float*)_mm_malloc(size*sizeof(float), 64);
for(unsigned i = 0; i < size; i++)
er[i] = i;
reportTime("Data generation");
// Allocate memory on persistent device pointer
int mysig;
#pragma offload target(mic:0) in(size) nocopy(m_er) signal(&mysig)
{
m_er = (float *)malloc(sizeof (float) * size);
}
// Do computation while memory is being allocated
reportTime("Device memory alloc");
// Copy data from host to device
#pragma offload_transfer target(mic:0) in(er[0:size] : into (m_er[0:size]) ) wait(&mysig)
reportTime("Data transfer");
std::cout << "Overall time: " << getTime() - timer.front() << std::endl;
return 0;
}
void reportTime(std::string s)
{
timer.push_back(getTime());
for(unsigned i = 0; i < timer.size()-1; i++)
timer.back() -= timer[i];
std::cout << s+": " << timer.back() << std::endl;
}
double getTime()
{
struct timeval t;
gettimeofday(&t, NULL);
return t.tv_sec + 1e-6*t.tv_usec;
}
[/cpp]
Running this in offload mode gives me the output:
Data generation: 0.000620127
Device memory alloc: 1.25236
Data transfer: 0.0046699
Overall time: 1.25766
If the allocation were performed asynchronously, most of the time would be spent in the data transfer section, since it waits for the allocation to finish before transferring. So is it not possible to malloc asynchronously on the device, or am I doing something wrong?
Thanks!
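For reference, the timing signature described above can be reproduced with a host-only analogy. This is a minimal sketch that does not use the offload runtime at all: `slow_task` is a hypothetical stand-in for the device allocation and `time_async_phases` is an illustrative helper, neither of which comes from the original code. If the launch really is asynchronous, the launch phase should be short and the wait phase should absorb the cost.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Seconds spent launching an asynchronous task and then waiting for it.
struct PhaseTimes {
    double launch_seconds;
    double wait_seconds;
};

// Hypothetical stand-in for a slow device-side allocation.
inline void slow_task(std::chrono::milliseconds cost) {
    std::this_thread::sleep_for(cost);
}

// Launch slow_task asynchronously, then wait for it, timing both phases.
inline PhaseTimes time_async_phases(std::chrono::milliseconds cost) {
    using clock = std::chrono::steady_clock;
    const auto t0 = clock::now();
    // std::launch::async forces the task onto its own thread.
    std::future<void> pending = std::async(std::launch::async, slow_task, cost);
    const auto t1 = clock::now();  // end of the "launch" phase
    pending.get();                 // analogous to wait(&mysig)
    const auto t2 = clock::now();  // end of the "wait" phase
    return {std::chrono::duration<double>(t1 - t0).count(),
            std::chrono::duration<double>(t2 - t1).count()};
}
```

Calling `time_async_phases(std::chrono::milliseconds(200))` should report a launch phase far shorter than the wait phase. The output posted above shows the opposite pattern, which is why the allocation looks blocking.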
In fact, the extra time taken at allocation was due to initialising the Phi: I hadn't set the OFFLOAD_INIT=on_start environment variable. Having set that, it looks like the malloc is happening asynchronously, but I am not sure how to properly transfer data from the host into this buffer on the device. The method I currently use, as shown in the above code, does not seem to work: if I try to printf a value from the array, it crashes. The output I now get is:
Data generation: 8.70228e-05
Device memory alloc: 0.000203133
Data transfer: 0.00738502
offload error: process on the device 0 was terminated by signal 11 (SIGSEGV)
This happens after adding a section just before the end (between reporting the data transfer time and reporting the total time) that does this:
[cpp]
// Validate
#pragma offload target(mic:0) nocopy(m_er)
{
printf("m_er[0]: %f [1000] %f [100000] %f", m_er[0],m_er[1000],m_er[100000]);
}
[/cpp]
In addition, if I attempt to use an array size over roughly 16.2 million, I get this output:
Data generation: 0.00222301
Device memory alloc: 0.000306129
offload error: address range partially overlaps with existing allocation
I assume my method of copying data from a host array into a persistent device array is incorrect, but I am not sure how else to do it.
Hi Tim,
The arrays used for data transfers to and from the host should be allocated by the compiler using the offload runtime, i.e. through the offload pragma. Other arrays can be managed explicitly, similar to the way you have done in your example.
I have never used the offload pragma to asynchronously allocate memory on the coprocessor. Could you try that and share the results with us? You can find more about persistent compiler-managed heap allocation on the following page: http://software.intel.com/en-us/articles/effective-use-of-the-intel-compilers-offload-features
Hi Sumedh,
Thanks for your response. The reason I was attempting it this way is that using the offload pragma with signals to allocate memory does not appear to be asynchronous, as illustrated by the code and results below. Is there a way of copying data directly from the host into an array that has been allocated on the coprocessor heap using malloc? I could not find anything in the documentation about asynchronous allocation (without transfer) that also showed copying data from the host directly into the allocated buffer afterward.
[cpp]
#define ALLOC alloc_if(1) free_if(0)
#define FREE alloc_if(0) free_if(1)
#define REUSE alloc_if(0) free_if(0)
int main(int argc, char* argv[])
{
timer.push_back(getTime());
int size = 15000000;
// Alloc memory on host and fill with some data
float* er = (float*)_mm_malloc(size*sizeof(float), 64);
for(unsigned i = 0; i < size; i++)
er[i] = i;
reportTime("Data generation");
int mysig;
#pragma offload_transfer target(mic:0) nocopy(er : length(size) ALLOC) signal(&mysig)
// Do computation while memory is being allocated
reportTime("Device memory alloc");
// Copy data from host to device
#pragma offload_transfer target(mic:0) in(er : length(size) REUSE) wait(&mysig)
reportTime("Data transfer");
// Validate
#pragma offload target(mic:0) nocopy(er)
{
printf("er[0]: %f [1000] %f [100000] %f\n", er[0],er[1000],er[100000]);
}
std::cout << "Overall time: " << getTime() - timer.front() << std::endl;
return 0;
}
[/cpp]
The timing functions shown in the original post are omitted. The results are:
SIZE=150000
Data generation: 6.48499e-05
Device memory alloc: 0.00554204
Data transfer: 0.00162315
Overall time: 0.00746894
er[0]: 0.000000 [1000] 1000.000000 [100000] 100000.000000
SIZE=15000000
Data generation: 0.0204239
Device memory alloc: 0.0736511
Data transfer: 0.0128999
Overall time: 0.107176
er[0]: 0.000000 [1000] 1000.000000 [100000] 100000.000000
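To put the figures above in proportion, the arithmetic can be written out as a small sketch. The helper names `phase_fraction` and `transfer_gb_per_s` are illustrative only and are not part of the posted program. The sketch only restates the posted timings as ratios and does not explain why the allocation phase takes as long as it does.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of the total run time spent in one phase.
inline double phase_fraction(double phase_seconds, double total_seconds) {
    assert(total_seconds > 0.0);
    return phase_seconds / total_seconds;
}

// Effective transfer rate in GB/s (1 GB = 1e9 bytes) for `count` floats.
inline double transfer_gb_per_s(std::size_t count, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(count) * sizeof(float) / seconds / 1e9;
}
```

With the SIZE=15000000 figures, `phase_fraction(0.0736511, 0.107176)` is about 0.69 and `transfer_gb_per_s(15000000, 0.0128999)` is about 4.65. So roughly two thirds of the run is spent in the "Device memory alloc" phase, which is the phase that was meant to overlap with other work.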
Hi Tim,
I am looking into this. Let me get back to you with what I find.
