I tried to compile a program that uses intrinsics such as "_mm512_load_ps(...)" for MIC with the icc compiler, but I ran into a problem with the code below.
……
#pragma offload_attribute (push,target(mic))
#include "immintrin.h"
#pragma offload_attribute (pop)
……
#pragma offload_attribute (push,target(mic))
......
_d_wt=_mm512_load_ps (&Random_matrix
_Xk=_mm512_add_ps(_mm512_set_1to16_ps(X),_d_wt);
......
#pragma offload_attribute (pop)
……
When I tried to compile the code, I got the following messages:
ThetaScheme.o: In function `current_solution':
ThetaScheme.c:(.text+0xd86): undefined reference to `_mm512_load_ps'
ThetaScheme.c:(.text+0xda1): undefined reference to `_mm512_set1_ps'
......
Do I need to link a special library or do some extra work to compile this kind of program?
Thank you !
The undefined references occur when linking the host-side binary, because these intrinsics do not exist for the host. Guard their use in the offloaded code with:
#ifdef __MIC__
<Phi intrinsic code here>
#else
<Host side equivalent code here>
#endif
If you have not already, look at the C sample under:
/opt/intel/composer_xe_2013/Samples/en_US/C++/mic_samples/intro_sampleC/sampleC006.c
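To make the guard concrete, here is a minimal sketch of one function with both paths. The helper name add_scalar and its parameters are hypothetical and not taken from the original program; the __MIC__ branch assumes n is a multiple of 16 and that both pointers are 64-byte aligned. In an offload build the function would also need to sit inside the offload_attribute push/pop region, as in the question.

```c
#include <stddef.h>

#ifdef __MIC__
#include <immintrin.h>   /* only pulled in for the coprocessor compilation */
#endif

/* Hypothetical helper: out[i] = in[i] + x for n floats.
   The __MIC__ path assumes n is a multiple of 16 and that
   in and out are both 64-byte aligned. */
void add_scalar(const float *in, float *out, size_t n, float x)
{
#ifdef __MIC__
    /* Coprocessor path: process 16 floats per iteration. */
    __m512 vx = _mm512_set1_ps(x);
    size_t k;
    for (k = 0; k < n; k += 16) {
        __m512 v = _mm512_load_ps((const void *)(in + k));
        _mm512_store_ps((void *)(out + k), _mm512_add_ps(v, vx));
    }
#else
    /* Host path: plain scalar loop, so the host link step
       never sees a 512-bit intrinsic. */
    size_t i;
    for (i = 0; i < n; i++)
        out[i] = in[i] + x;
#endif
}
```

Because the host compilation only ever sees the scalar branch, no reference to _mm512_load_ps or _mm512_set1_ps is left in the host object file.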
Kevin Davis (Intel) wrote:
The undefined references occur when linking for the host-side since those do not exist for the host. You would conditionalize the use in the offload code with:
#ifdef __MIC__
<Phi intrinsic code here>
#else
<Host side equivalent code here>
#endif
If you have not already, look at the C sample under:
/opt/intel/composer_xe_2013/Samples/en_US/C++/mic_samples/intro_sampleC/sampleC006.c
Thank you very much! The program now compiles successfully.
Thanks for the reply above, but I have a new problem. When I use "_mm512_store_ps(...)" (or other functions like it) to store data to an array, I get an error when I run the program:
offload error: process on the device 0 was terminated by signal 11
And the code:
#pragma offload_attribute (push,target(mic))
void function(...)
{
static float __attribute__((target(mic),aligned(64))) d_wt[BLOCKSIZE*THREADSNUM] ;
... ...
#pragma omp parallel for private(... ,start,_d_wt,...) schedule(dynamic,1)
{
...
_d_wt=_mm512_loadunpacklo_ps (_d_wt, (void*)(&Random_matrix
_d_wt=_mm512_loadunpackhi_ps (_d_wt, (void*)(&Random_matrix
//_d_wt=_mm512_load_ps (&Random_matrix
_mm512_packstorelo_ps((void*)(&d_wt[start]) , _d_wt );
_mm512_packstorehi_ps((void*)(&d_wt[start+16]), _d_wt );
// ------> start= 16 * omp_get_thread_num();
//_mm512_store_ps(d_wt,_d_wt);
...
}
... ...
}
I wonder why this happens. Thanks.
This often indicates an access outside the available memory, which can be caused by insufficient allocation for variables used in the offloaded code. Inspect those variables to ensure that enough memory is allocated and that they are decorated appropriately for access in offloaded code. Because it is not obvious from the code snippet, check that Random_matrix is declared appropriately for access within offloaded code.
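One way to narrow down such a fault is to validate each block before the vector load or store touches it. The following is only a debugging sketch: the helper name block_ok is hypothetical, and it assumes 16-float blocks and the 64-byte alignment that the aligned 512-bit load and store intrinsics expect.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugging helper: returns 1 when the 16-float block
   starting at base[start] lies entirely inside an array of n floats
   and begins on a 64-byte boundary, otherwise returns 0. */
int block_ok(const float *base, size_t n, size_t start)
{
    if (base == NULL || start > n || n - start < 16)
        return 0;                                   /* block would run past the array */
    return ((uintptr_t)(base + start) % 64) == 0;   /* alignment of the block start */
}
```

Calling it just before each load or store, and printing the index when it returns 0, shows whether the crash comes from an out-of-range index or from a misaligned address.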
Thank you. I have checked Random_matrix and it seems OK. I wrote a new program to show the problem. There are two cases. In case two, I use the usual scalar method to calculate, and there is no error. In case one, I use "_mm512_store_ps" to store data to the array, but I get the same error: "offload error: process on the device 0 was terminated by signal 11". I also tried to use "_mm512_store_ps((void*)(&B
#include <stdlib.h>
#include <stdio.h>
#define SIZE 1024
#define CASE1
//#define CASE2
#pragma offload_attribute (push,target(mic))
#include "immintrin.h"
void calculate(float* A,float* B)
{
static float __attribute__((target(mic),aligned(64))) X[SIZE] ;
int k,i;
__m512 _A;
__m512 _B;
__m512 _C;
#ifdef __MIC__
for(k=0;k<SIZE;k+=16)
{
_A=_mm512_load_ps((void*)(&A
_B=_mm512_load_ps((void*)(&B
_C=_mm512_add_ps(_A,_B);
#ifdef CASE1
_mm512_store_ps((void*)(&X
#endif
for(i=k;i<k+16;i++)
{
#ifdef CASE2
X[i]=A[i]+B[i];
#endif
printf("%2.f ",X[i]);
}
printf("\n");
}
#endif
}
#pragma offload_attribute (pop)
int main()
{
int i;
float *A;
float *B;
A=(float*)malloc(sizeof(float)*SIZE);
B=(float*)malloc(sizeof(float)*SIZE);
for(i=0;i<SIZE;i++)
{
A[i]=i;
B[i]=i;
}
#pragma offload_transfer target(mic:0)\
in(A:length(SIZE) alloc_if(1) free_if(0))\
in(B:length(SIZE) alloc_if(1) free_if(0))
#pragma offload target(mic:0) \
in(A:length(0) alloc_if(0) free_if(0))\
in(B:length(0) alloc_if(0) free_if(0))
calculate(A,B);
return 0;
}
It appears that A and B are not 64-byte aligned, which _mm512_load_ps requires; malloc does not guarantee that alignment. Try _mm_malloc:
A=(float*)_mm_malloc(sizeof(float)*SIZE,64);
B=(float*)_mm_malloc(sizeof(float)*SIZE,64);
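For checking the alignment of a buffer independently of the Intel compiler, the sketch below uses the standard C11 aligned_alloc rather than _mm_malloc, which is a deliberate substitution so that it builds with any C11 toolchain. The helper names alloc_floats_64 and is_aligned_64 are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for count floats on a 64-byte
   boundary. aligned_alloc expects the size to be a multiple of the
   alignment, so the byte count is rounded up to the next multiple of 64. */
float *alloc_floats_64(size_t count)
{
    size_t bytes = count * sizeof(float);
    size_t rounded = (bytes + 63) / 64 * 64;
    if (rounded == 0)
        rounded = 64;                 /* avoid a zero-size request */
    return (float *)aligned_alloc(64, rounded);
}

/* Returns 1 when p is 64-byte aligned, which is what the aligned
   512-bit load and store intrinsics expect. */
int is_aligned_64(const void *p)
{
    return ((uintptr_t)p % 64) == 0;
}
```

Memory obtained from aligned_alloc is released with free, whereas memory obtained from _mm_malloc must be released with _mm_free.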
Thank you very much!
Your advice solved the problem!