Felipe_S_

Offload into MIC (Xeon Phi) error iterating over loaded array


I have problems when offloading some data structures to my MIC.

I am offloading into MIC with the following directives:

    #pragma offload target(mic:mic_no) \
    inout(is_selected : length(query_sequences_count) ALLOC) \
    in(a : length(a_size) ALLOC) \
    in(a_disp : length(offload_db_count) ALLOC)

However if I try to execute inside the offloaded region:

    // loads the next 64 characters of a into datadb
    __m512i datadb __attribute__ ((aligned(64)));
    datadb = _mm512_load_epi32(a+iter_db+a_disp[j]);

This causes the following error:

Offload error:process on the device 0 was terminated by signal 11(SIGSEGV)

But if I instead copy the content of a into another array like this:

    char db[64];
    for(window_db_iter = 0; window_db_iter < 64; window_db_iter++)
        db[window_db_iter] = *(a+iter_db+a_disp[j]+window_db_iter);

    // Now this works fine
    datadb = _mm512_load_epi32(db);

I have checked that a is offloaded with the correct length (a_size is the size of a) and that a_disp is correct as well. Also, a+iter_db+a_disp[j] always stays within the bounds of the array. My guess is that the problem has to do with how the memory is copied onto the MIC. Any ideas?

Thanks!


Accepted Solutions
Rajiv_D_Intel

Most likely the address a+iter_db+a_disp is not 64-byte aligned.
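To make that check concrete: an aligned load such as _mm512_load_epi32 needs an address that is a multiple of 64. Below is a minimal sketch of how the address could be tested before the load. The helper names is_aligned_64 and load_address_is_aligned are hypothetical, and the sketch assumes a is a char buffer with iter_db and a_disp[j] counted in bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p sits on a 64-byte boundary, else 0. */
static int is_aligned_64(const void *p)
{
    return ((uintptr_t)p & 63u) == 0;
}

/* Hypothetical helper: checks the address used in the question,
 * a + iter_db + a_disp[j]. Assumes a is a char buffer and that both
 * offsets are byte counts. */
static int load_address_is_aligned(const char *a, size_t iter_db, size_t disp)
{
    return is_aligned_64(a + iter_db + disp);
}
```

Note that an aligned base pointer is not enough on its own: if iter_db + a_disp[j] is not a multiple of 64, the resulting address is still misaligned.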

That expression probably assumes that some variables on MIC are aligned. I suspect the corresponding variables on the CPU are not 64-byte aligned, else the compiler would have aligned the MIC variables also on a 64-byte boundary.

One way to force alignment on MIC is to add the "align(64)" modifier whenever you use the ALLOC modifier (which I assume expands to alloc_if(1) free_if(0)).

However, since the CPU data does not have matching alignment, the CPU and MIC variables will not have the same offset within a 64-byte boundary, and data transfer will be slightly slower.

The best thing to do is to align the CPU variable on a 64-byte boundary and to not use the "align(64)" modifier. Then, the MIC variable will also be 64-byte aligned, you should not get an alignment fault, and the mutual alignment within 64 bytes of CPU and MIC variables will make data transfer optimal.
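As a rough illustration of aligning the CPU variable, the sketch below allocates the host buffer on a 64-byte boundary using C11 aligned_alloc. The helper names alloc_aligned_64 and round_up_64 are hypothetical, and this is only one possible way to get an aligned host buffer, not necessarily what the original code should use.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: round n up to the next multiple of 64. */
static size_t round_up_64(size_t n)
{
    return (n + 63u) & ~(size_t)63u;
}

/* Hypothetical helper: allocate at least `size` bytes on a 64-byte
 * boundary. aligned_alloc (C11) expects the size to be a multiple of
 * the alignment, so the request is rounded up first.
 * Returns NULL on failure; release the buffer with free(). */
static char *alloc_aligned_64(size_t size)
{
    char *p = aligned_alloc(64, round_up_64(size));
    if (p != NULL)
        assert(((uintptr_t)p & 63u) == 0); /* sanity check on the boundary */
    return p;
}
```

A buffer obtained this way could be passed as a in the in(...) clause without the align(64) modifier. For the load in the question to stay aligned, the offsets would also need to be multiples of 64, for example by padding each entry so that every a_disp value comes out of round_up_64; that assumes the layout of the database array is under your control.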

Felipe_S_

Thanks, you were right. It works now with the align(64) modifier.
Next I'll try aligning the CPU variable myself so that the data transfer is faster.

Thanks again!
 
