Hi Craig,
I am not sure exactly what you are trying to do in this code. As I understand it, you are trying to get rid of the host allocation in the constructor. However, you transfer data from the host to the coprocessor by using the 'in' clause in the offload pragma in the add function.
If you are indeed trying to transfer data to and from the coprocessor, it makes sense to have a corresponding host-side array and hence your current code looks good.
If you only want to create coprocessor-side arrays, I would recommend using compiler-managed heap-allocated arrays, i.e. arrays allocated using the offload or offload_transfer pragma (please refer to this for more information). However, since you cannot use uninitialized pointers to allocate data on the coprocessor using the pragmas, you would need to initialize them with a unique value. For more information on why you cannot use uninitialized pointers, please take a look at this blog. My best guess is that your code should look similar to this:
[cpp]
gen_vector_mic(int length) : length(length)
{
//Created a dummy array with just 1 element
this->data = (T*)memalign(64, 1*sizeof(T));
T* p = this->data;
#pragma offload_transfer target(mic:0) nocopy(p:length(length) ALLOC)
{}
}
[/cpp]
Also, I notice that the variables are lacking decoration for the MIC architecture (__attribute__((target(mic))) or __declspec(target(mic))). In addition, the syntax of the offload pragma in the add function is incorrect. Please refer to this compiler reference for the correct syntax.
I hope this helps.
Yes that answers my question! I will also clean up the code!
Cheers,
It seems this only works for relatively small vectors. When the vectors get large, I get the following error. Why is this happening?
output:
cpelissi@borg01x081:~/scracth/programs> ./reproducer.x
number of bytes i 16*2^0
number of bytes i 16*2^1
number of bytes i 16*2^2
number of bytes i 16*2^3
number of bytes i 16*2^4
number of bytes i 16*2^5
number of bytes i 16*2^6
number of bytes i 16*2^7
number of bytes i 16*2^8
number of bytes i 16*2^9
offload error: address range partially overlaps with existing allocation
code:
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include "omp.h"
#include "offload.h"
#include "math.h"
#define REUSE length(0) alloc_if(0) free_if(0)
#define ALLOC alloc_if(1) free_if(0)
#define FREE alloc_if(0) free_if(1)
template<typename T>
struct gen_vector_mic
{
int length;
T* data;
gen_vector_mic(int length) : length(length)
{
this->data = (T*)memalign(64, 1*sizeof(T));
T* p = this->data;
#pragma offload_transfer target(mic) nocopy(p:length(length) ALLOC)
{}
}
~gen_vector_mic()
{
T* p = this->data;
#pragma offload target(mic) in(p: length(0) FREE)
{ }
}
};
typedef gen_vector_mic<double> vector_mic;
typedef gen_vector_mic<float> float_vector_mic;
int main(int argc, char** argv)
{
for(int i=0; i<20; ++i)
{
printf("2^%i\n", i);
int length = pow(2,i);
gen_vector_mic<double> v1(length);
gen_vector_mic<double> v2(length);
}
return 0;
}
You are allocating only one element on the host for each of the v1 and v2 vectors, but passing a much larger length in the offload clause. The address ranges derived from the two host pointers therefore overlap:
this->data = (T*)memalign(64, 1*sizeof(T));
data = 38309120 length = 32 <- here the address range is [38309120, 38309376)
allocate memory
data = 38308992 length = 32 <- here the address range is [38308992, 38309248)
offload error: address range partially overlaps with existing allocation
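To make the arithmetic above concrete, here is a minimal host-only sketch of the overlap check. It does not use the offload runtime at all; `ByteRange`, `make_range`, and `overlaps` are hypothetical names introduced only for illustration, and the assumption is that the runtime keys each allocation on the half-open byte range starting at the host pointer value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Half-open byte range [begin, end) derived from a host pointer value.
struct ByteRange {
    std::uint64_t begin;
    std::uint64_t end;
};

// Hypothetical helper: the range that a clause of the form p:length(count)
// would describe for elements of elem_size bytes.
inline ByteRange make_range(std::uint64_t host_address,
                            std::size_t count,
                            std::size_t elem_size) {
    assert(elem_size > 0);
    return ByteRange{host_address, host_address + count * elem_size};
}

// Two half-open ranges overlap when each one starts before the other ends.
inline bool overlaps(const ByteRange& a, const ByteRange& b) {
    return a.begin < b.end && b.begin < a.end;
}
```

With the two addresses printed above, 32 doubles of 8 bytes each give [38309120, 38309376) and [38308992, 38309248), and `overlaps` returns true because the second range ends after the first one begins. The two 1-element host allocations sit only 128 bytes apart, so any length above 16 doubles collides.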
So this brings me back to the original question. How can I create a "host pointer" that is associated with a coprocessor buffer, and that can be passed to host routines which use it to execute routines on the coprocessor, without allocating an "image" buffer on the host?
Use the "into" clause
#include <stdint.h>
#include <stdio.h>
#include <malloc.h>
int main(int yy, char *argv[])
{
int * mic_ptr;
int * ptr1 = malloc(100);
int * ptr2 = malloc(100);
int out_val, rez_OK;
*ptr1 = 10;
*ptr2 = 20;
#pragma offload_transfer target(mic:0) nocopy(mic_ptr : length(10) alloc_if(1) free_if(0))
#pragma offload target(mic:0) in(ptr1[0:1] : alloc_if(0) free_if(0) into(mic_ptr[4:1]) )
{
printf ("TARGET.1 mic_ptr[4] = %d \n", mic_ptr[4]);
}
free(ptr1);
ptr1 = NULL;
#pragma offload target(mic:0) in (ptr2[0:1] : free_if(0) into(mic_ptr[4:1])) \
out(mic_ptr[4:1] : free_if(1) into(out_val))
{
printf ("TARGET.2 mic_ptr[4] = %d \n", mic_ptr[4]);
}
rez_OK = (out_val == *ptr2);
printf("%s\n", rez_OK? "PASSED" : "FAILED");
return (rez_OK == 1);
}
Aren't you still having to allocate host memory? What I wanted was a class with a host pointer that is associated with a coprocessor buffer, without allocating a buffer on the host. It seems like this may not be possible. The response by Sumedh Naik (Intel) was what I was looking for, but unfortunately it didn't work. The reason I want this is that I have already developed a series of templated linear algebra routines for lattice Quantum Chromodynamics suitable for CPUs and GPUs, and I would like to add the Intel Xeon Phi. If I can create the same "host" vector routines, which take a vector class as described, then all the higher-level routines will work without writing any more code. For example:
template<typename T>
double cg_solver( T src_vec, T sol_vec, mat_mult<T vec_in, T vec_out>)
{
vector routines ...
}
where T would be vector_cpu, vector_gpu, and hopefully vector_mic
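To illustrate the kind of design I mean, here is a minimal sketch with a host-only backend. The members `dot` and `axpy` and the routine `residual_norm2` are hypothetical names made up for this example, not my actual library; the assumption is that vector_gpu and vector_mic would expose the same members and run them on their own device.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only backend. A vector_gpu or vector_mic would offer the
// same members but keep its buffer on the device.
struct vector_cpu {
    std::vector<double> data;

    explicit vector_cpu(std::size_t n, double value = 0.0) : data(n, value) {}

    std::size_t size() const { return data.size(); }

    // Inner product of this vector with other.
    double dot(const vector_cpu& other) const {
        assert(other.size() == size());
        double sum = 0.0;
        for (std::size_t i = 0; i < data.size(); ++i) sum += data[i] * other.data[i];
        return sum;
    }

    // this += alpha * x
    void axpy(double alpha, const vector_cpu& x) {
        assert(x.size() == size());
        for (std::size_t i = 0; i < data.size(); ++i) data[i] += alpha * x.data[i];
    }
};

// Higher-level routine written once against the shared interface.
// Returns the squared norm of b - ax; requires Vec to be copyable.
template <typename Vec>
double residual_norm2(const Vec& b, const Vec& ax) {
    Vec r = b;
    r.axpy(-1.0, ax);
    return r.dot(r);
}
```

The higher-level routine never touches a raw pointer, which is why I would like the MIC vector class to hide where its buffer lives.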
Hi Craig,
I am sorry for the late reply. I may finally have a solution for you. The following code worked for me.
[cpp]
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include "omp.h"
#include "offload.h"
#include "math.h"
#define REUSE length(0) alloc_if(0) free_if(0)
#define ALLOC alloc_if(1) free_if(0)
#define FREE alloc_if(0) free_if(1)
template<typename T>
struct gen_vector_mic
{
int length;
size_t data;
gen_vector_mic(int length) : length(length)
{
size_t data;
#pragma offload target(mic:0) out(data)
{
T *ptr=(T*)malloc(sizeof(T)*length);
data=size_t(ptr);
}
this->data=data;
}
~gen_vector_mic()
{
size_t data=this->data;
#pragma offload target(mic:0) in(data)
{
T *ptr=(T*)(data);
free(ptr);
}
}
};
typedef gen_vector_mic<double> vector_mic;
typedef gen_vector_mic<float> float_vector_mic;
int main(int argc, char** argv)
{
for(int i=0; i<20; ++i)
{
printf("2^%i\n", i);
int length = pow(2,i);
gen_vector_mic<double> v1(length);
gen_vector_mic<double> v2(length);
}
return 0;
}
[/cpp]
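The core idea in the code above is to carry the coprocessor address around on the host as a plain integer and only turn it back into a pointer inside an offload region. As a minimal host-only sketch of that round trip (no offload pragmas, so it only simulates the pattern), assuming hypothetical helper names `allocate_handle`, `as_pointer`, and `free_handle`, and using `std::uintptr_t` where the code above uses `size_t`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical handle type: an integer wide enough to hold a pointer value.
using buffer_handle = std::uintptr_t;

// Allocate n elements and return the address as an integer handle.
// In the post this body runs inside the offload region, so the address
// belongs to the coprocessor, not the host.
template <typename T>
buffer_handle allocate_handle(std::size_t n) {
    T* ptr = static_cast<T*>(std::malloc(n * sizeof(T)));
    return reinterpret_cast<buffer_handle>(ptr);
}

// Convert the handle back to a typed pointer. With a real coprocessor this
// is only valid inside an offload region on the device that allocated it.
template <typename T>
T* as_pointer(buffer_handle h) {
    return reinterpret_cast<T*>(h);
}

// Release the buffer behind the handle.
template <typename T>
void free_handle(buffer_handle h) {
    std::free(as_pointer<T>(h));
}
```

Because the handle is just a scalar, it can be passed with in() or out() like any other value, and no host-side buffer is needed. The trade-off is that the offload runtime knows nothing about this buffer.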
Sumedh, so instead of letting the #pragma offload allocate the memory you are specifically allocating it yourself. Another way would be to actually alloc on the host, make the offload_transfer nocopy() call to create the memory on the MIC, then [edit: realloc the memory on the host back to length 1]. Obviously not a good option if speed is necessary.
Both of these methods seem a bit *hacky*. I don't think it is unreasonable to desire a pointer on the host to link to allocated memory on the MIC without actually having something allocated on the host. Incidentally I also have the same problem when trying to create a memory pool on the MIC. I will let you know if I come up with a better solution.
Corey
Sumedh, can you please explain how you will transfer data between CPU and the array you allocate on MIC?
Hi Dhairya,
I gave your question some thought and have yet to come up with a solution.
For the Intel MIC architecture, you can transfer data to the coprocessor by using either #pragma offload or #pragma offload_transfer. However, there is one caveat: the array on the coprocessor must have been allocated by the offload runtime (through #pragma offload or #pragma offload_transfer). Hence, we cannot use these clauses to transfer data to the array allocated with the above method, because it was created by calling malloc directly on the coprocessor.
