Hi folks
When attempting to write a short offload section, I get the runtime error
address range partially overlaps with existing allocation
but I'm not sure where to start debugging this. Here's the code section in question (sorry, it seems to have lost some of its formatting):
#pragma omp parallel for default(none), private(i,k,term3,m,sum_Theta_Psi,n), shared(Theta_i, Psi, ln_Gamma_i, Q, molecules, maxGroupNum)
for (i=0; i<molecules; i++) {
    for (k=0; k<maxGroupNum; k++) {
        term3 = 0.0;
        for (m=0; m<maxGroupNum; m++) {
            sum_Theta_Psi = 0.0; // reuse var but not for 'i' component
#pragma vector always
            for (n=0; n<maxGroupNum; n++) {
                sum_Theta_Psi += Theta_i
            }
            term3 += Theta_i
        }
        // reset use of sum_Theta_Psi to be used for second term
        sum_Theta_Psi = 0.0;
#pragma vector always
        for (m=0; m<maxGroupNum; m++) {
            sum_Theta_Psi += Theta_i
        }
        ln_Gamma_i
    }
} //for i
// omp parallel for
} // pragma offload
YOurs, mIchael
I have encountered this error when I tried to create coprocessor-side arrays without the corresponding host arrays. However, it is difficult to tell exactly what is going wrong in your code. Could you share a little more of your code, including the offload pragma and the declarations/allocations of the data passed to the offload?
Hi Sumedh and co... I can't really share the full code since it is one a colleague is working on for publication (sorry) but here's the offload pragma:
#pragma offload target(mic) \
    in(Theta_i:length(molecules*maxGroupNum)) \
    in(Psi:length(maxGroupNum*maxGroupNum)) \
    in(Q:length(maxGroupNum)) \
    in(molecules, maxGroupNum) \
    inout(ln_Gamma_i:length(molecules*maxGroupNum))
with the arrays passed as float*/int* and float**
HTH, M
Do the actual allocations on the host match those specified in the offload?
Jim Dempsey
Hi Michael,
Sorry for the very late reply. I looked through your reproducer and noticed that you are trying to transfer data of type float** to the coprocessor. The offload pragma does not support complex data types that contain pointers or non-contiguous blocks of data. You could either modify your code to meet this requirement or use the virtual shared memory model. Please note that although shared memory makes porting easier, the overhead and synchronization it requires lead to suboptimal performance.
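As a minimal sketch of the first option (making the data contiguous), the helper below copies a rows x cols array held as separately allocated rows into one row-major block. The name `flatten_2d` and its signature are assumptions made for illustration; they are not from Michael's code or from the compiler's runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of any Intel API): copy a rows x cols
   array stored as separately allocated rows (float**) into a single
   contiguous, row-major block. Returns NULL if the allocation fails;
   the caller owns the result and must free() it. */
float *flatten_2d(float **src, size_t rows, size_t cols)
{
    assert(src != NULL && rows > 0 && cols > 0);

    float *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return NULL;

    /* Row i of the source lands at offset i * cols in the flat block. */
    for (size_t i = 0; i < rows; i++)
        memcpy(flat + i * cols, src[i], cols * sizeof *flat);

    return flat;
}
```

Element (i, j) is then `flat[i * cols + j]`, and the whole block can be described by a single length of `rows * cols` elements in an `in`/`out`/`inout` clause.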
Michael,
Sumedh is correct. You will want something like:
[cpp]
// Pointers to the start of each array's data on the host.
// Valid only if each array's data is one contiguous block.
float* Adata = &A[0][0];
float* Bdata = &B[0][0];
float* Cdata = &C[0][0];
#pragma offload target(mic) \
    in(Adata:length(size1*size2)) \
    in(Bdata:length(size2*size2)) \
    out(Cdata:length(size1*size2))
{
    // Coprocessor-side row pointers; these shadow the host A, B, C
    float **A, **B, **C; // rows x cols
    A = alloc_2d_float_offload(Adata, size1, size2); // create array of pointers without allocating data
    B = alloc_2d_float_offload(Bdata, size2, size2); // this function inside MIC
    C = alloc_2d_float_offload(Cdata, size1, size2);
    ...
    free(A); // but not Adata
    free(B); // but not Bdata
    free(C); // but not Cdata
}
[/cpp]
Jim Dempsey
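Jim's post does not show the body of alloc_2d_float_offload. A minimal sketch of what such a helper might look like is given below, under the hypothetical name `make_row_view`: it allocates only the array of row pointers and points each one into the existing contiguous block. With the Intel compiler the function would presumably also need to be marked for the coprocessor (for example with `__attribute__((target(mic)))`); that is left out here so the sketch compiles as plain C.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a helper like alloc_2d_float_offload:
   builds an array of row pointers over an existing contiguous,
   row-major rows x cols block. No element data is allocated or copied.
   Returns NULL if the allocation fails. The caller frees the returned
   pointer array with free(), but not the data it points into. */
float **make_row_view(float *data, size_t rows, size_t cols)
{
    assert(data != NULL && rows > 0 && cols > 0);

    float **view = malloc(rows * sizeof *view);
    if (view == NULL)
        return NULL;

    /* Row i starts i * cols elements into the flat block. */
    for (size_t i = 0; i < rows; i++)
        view[i] = data + i * cols;

    return view;
}
```

Because the view aliases the flat block, code inside the offload region can keep using `A[i][j]` indexing while the transfer itself only ever sees the contiguous data.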
