I tried to compile a program that uses intrinsics such as "_mm512_load_ps(...)" on MIC using the icc compiler, but I ran into a problem with the code below.
……
#pragma offload_attribute (push,target(mic))
#include "immintrin.h"
#pragma offload_attribute (pop)
……
#pragma offload_attribute (push,target(mic))
......
_d_wt=_mm512_load_ps (&Random_matrix
_Xk=_mm512_add_ps(_mm512_set_1to16_ps(X),_d_wt);
......
#pragma offload_attribute (pop)
……
When I tried to compile the code, I got the following errors:
ThetaScheme.o: In function `current_solution':
ThetaScheme.c:(.text+0xd86): undefined reference to `_mm512_load_ps'
ThetaScheme.c:(.text+0xda1): undefined reference to `_mm512_set1_ps'
......
Should I link a special library or do some extra work to compile this kind of program?
Thank you!
The undefined references occur when linking the host-side binary, because these intrinsics do not exist for the host. Guard their use in the offload code with a conditional:
#ifdef __MIC__
<Phi intrinsic code here>
#else
<Host side equivalent code here>
#endif
If you have not already, look at the C sample under:
/opt/intel/composer_xe_2013/Samples/en_US/C++/mic_samples/intro_sampleC/sampleC006.c
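A minimal sketch of that guard, in case it helps. The helper name add_scalar and its parameters are hypothetical, not taken from the original code. It assumes n is a multiple of 16 and that, on the coprocessor, both arrays are 64-byte aligned. In an offload build the function would sit between the offload_attribute push/pop pragmas shown in the question.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __MIC__
#include <immintrin.h>   /* 512-bit intrinsics, coprocessor build only */
#endif

/* Hypothetical helper: out[i] = in[i] + x for i in [0, n).
   Assumes n is a multiple of 16. On the coprocessor, in and out must be
   64-byte aligned because _mm512_load_ps/_mm512_store_ps require it. */
void add_scalar(const float *in, float *out, size_t n, float x)
{
    size_t i;
    assert(n % 16 == 0);
#ifdef __MIC__
    /* Coprocessor path: 16 floats per iteration. */
    __m512 vx = _mm512_set1_ps(x);
    for (i = 0; i < n; i += 16) {
        __m512 v = _mm512_load_ps((const void *)(in + i));
        _mm512_store_ps((void *)(out + i), _mm512_add_ps(v, vx));
    }
#else
    /* Host path: plain scalar loop, so the host link step never
       references the _mm512_* symbols. */
    for (i = 0; i < n; i++) {
        out[i] = in[i] + x;
    }
#endif
}
```

Because the host compilation only ever sees the scalar branch, the undefined references from the question should no longer appear at link time.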
Thanks a lot! The program now compiles successfully.
Thanks for the reply above, but I have a new problem. When I use "_mm512_store_ps(...)" (or other functions like it) to store data to an array, I get an error when I run the program:
offload error: process on the device 0 was terminated by signal 11
And the code:
#pragma offload_attribute (push,target(mic))
void function(...)
{
static float __attribute__((target(mic),aligned(64))) d_wt[BLOCKSIZE*THREADSNUM] ;
... ...
#pragma omp parallel for private(... ,start,_d_wt,...) schedule(dynamic,1)
{
...
_d_wt=_mm512_loadunpacklo_ps (_d_wt, (void*)(&Random_matrix
_d_wt=_mm512_loadunpackhi_ps (_d_wt, (void*)(&Random_matrix
//_d_wt=_mm512_load_ps (&Random_matrix
_mm512_packstorelo_ps((void*)(&d_wt[start]) , _d_wt );
_mm512_packstorehi_ps((void*)(&d_wt[start+16]), _d_wt );
// ------> start= 16 * omp_get_thread_num();
//_mm512_store_ps(d_wt,_d_wt);
...
}
... ...
}
I wonder why this happens. Thanks.
This often indicates an access outside available memory, which can be caused by insufficient allocation for variables used in the offloaded code. Inspect the variables used in the offloaded code to ensure that sufficient memory is allocated and that they are decorated for access in offloaded code. Because it is not obvious from the code snippet, check that Random_matrix is declared for access within offloaded code.
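One way to inspect the accesses is a small debug guard called before each 16-float load or store. This is only a sketch: the names VEC_WIDTH, vector_block_in_bounds and check_vector_block are hypothetical, and it assumes each vector access touches 16 consecutive floats starting at index start.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_WIDTH 16  /* floats per 512-bit vector */

/* Hypothetical helper: returns 1 if elements [start, start + VEC_WIDTH)
   all lie inside an array of len floats, 0 otherwise. */
int vector_block_in_bounds(size_t start, size_t len)
{
    return start <= len && len - start >= VEC_WIDTH;
}

/* Debug guard: aborts if the 16-float block would run past the array. */
void check_vector_block(size_t start, size_t len)
{
    assert(vector_block_in_bounds(start, len));
}
```

For the snippet above this would mean calling check_vector_block(start, BLOCKSIZE*THREADSNUM) before the stores into d_wt, and a similar check on the index used for Random_matrix.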
Thank you. I have checked Random_matrix and it seems OK. I wrote a new program to show the problem. There are two cases. In case two, I use the usual scalar method to calculate, and there is no error. In case one, I use "_mm512_store_ps" to store data to the array, but I get the same error: "offload error: process on the device 0 was terminated by signal 11". I also tried to use "_mm512_store_ps((void*)(&B
#include <stdlib.h>
#include <stdio.h>
#define SIZE 1024
#define CASE1
//#define CASE2
#pragma offload_attribute (push,target(mic))
#include "immintrin.h"
void calculate(float* A,float* B)
{
static float __attribute__((target(mic),aligned(64))) X[SIZE] ;
int k,i;
__m512 _A;
__m512 _B;
__m512 _C;
#ifdef __MIC__
for(k=0;k<SIZE;k+=16)
{
_A=_mm512_load_ps((void*)(&A
_B=_mm512_load_ps((void*)(&B
_C=_mm512_add_ps(_A,_B);
#ifdef CASE1
_mm512_store_ps((void*)(&X
#endif
for(i=k;i<k+16;i++)
{
#ifdef CASE2
X[i]=A[i]+B[i];
#endif
printf("%2.f ",X[i]);
}
printf("\n");
}
#endif
}
#pragma offload_attribute (pop)
int main()
{
int i;
float *A;
float *B;
A=(float*)malloc(sizeof(float)*SIZE);
B=(float*)malloc(sizeof(float)*SIZE);
for(i=0;i<SIZE;i++)
{
A[i]=i;
B[i]=i;
}
#pragma offload_transfer target(mic:0)\
in(A:length(SIZE) alloc_if(1) free_if(0))\
in(B:length(SIZE) alloc_if(1) free_if(0))
#pragma offload target(mic:0) \
in(A:length(0) alloc_if(0) free_if(0))\
in(B:length(0) alloc_if(0) free_if(0))
calculate(A,B);
return 1;
}
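It appears that A and B are not 64-byte aligned, which _mm512_load_ps and _mm512_store_ps require. Plain malloc does not guarantee that alignment. Try _mm_malloc: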
A=(float*)_mm_malloc(sizeof(float)*SIZE,64);
B=(float*)_mm_malloc(sizeof(float)*SIZE,64);
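To confirm alignment before calling the aligned intrinsics, a check along these lines may help. It is a sketch only: is_aligned and alloc_floats_64 are hypothetical names, and the C11 function aligned_alloc is used here as a portable stand-in for _mm_malloc so the example builds with any C11 compiler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if p is aligned to `alignment` bytes (must be a power of two). */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate room for n floats on a 64-byte boundary.
   aligned_alloc is a stand-in for _mm_malloc(bytes, 64). */
float *alloc_floats_64(size_t n)
{
    /* aligned_alloc expects the size to be a multiple of the alignment,
       so round the byte count up to the next multiple of 64. */
    size_t bytes = (n * sizeof(float) + 63) / 64 * 64;
    float *p = aligned_alloc(64, bytes);
    assert(p == NULL || is_aligned(p, 64));
    return p;
}
```

Memory from aligned_alloc is released with free, whereas memory from _mm_malloc must be released with _mm_free. Calling is_aligned(A, 64) on the malloc'ed pointers in the test program is a quick way to see whether alignment is the cause.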
Thank you very much!
Problem was solved with your advice!