Hi, I recently ran into a weird problem.
It happens in an offload region like the one below:
double *values = (double *)_mm_malloc(100 * sizeof(double), 64);
#pragma offload target(mic:0) in(a:length(100))
{
    ...
    int idx = 3;
    _mm512_mask_load_epi64(.. , writemask , &(values[idx]));
}
It compiles fine, but at run time it fails with the error "process on the device 0 was terminated by signal 11 (SIGSEGV)". I found that when idx is a multiple of 8 there is no such error. The Intel documentation says that the third argument of the _mm512_mask_load_epi64 function must be a 64-byte-aligned address. But I have already aligned values with _mm_malloc, so I expected it to be 64-byte-aligned no matter what idx is.
I have already answered this once in your other thread, but let's try again...
You point out that the documentation states that the third argument to _mm512_mask_load_epi64 must be 64-byte aligned, then show us your code, which looks like this:
_mm512_mask_load_epi64(.. , writemask , &(values[idx]));
- What is the third argument to the intrinsic? Is it "values", or "&(values[idx])" ?
- If "values" is 64 byte aligned, what offset from 64 byte alignment does the third argument have when idx==3?
- Is that offset zero?
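To make the arithmetic behind those questions concrete, here is a minimal sketch in plain C. The helper names offset_from_64 and element_offset_from_64 are made up for illustration, and the second one assumes that the base pointer is 64-byte aligned and that sizeof(double) is 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes by which p lies past the
   previous 64-byte boundary. A result of 0 means p is 64-byte aligned. */
static size_t offset_from_64(const void *p)
{
    return (size_t)((uintptr_t)p % 64u);
}

/* Hypothetical helper: the same quantity for &values[idx], assuming
   that values itself is 64-byte aligned. Each double occupies
   sizeof(double) bytes, so element idx starts idx * sizeof(double)
   bytes after the aligned base address. */
static size_t element_offset_from_64(size_t idx)
{
    return (idx * sizeof(double)) % 64u;
}
```

With 8-byte doubles, element_offset_from_64(3) is 24 and element_offset_from_64(8) is 0, which matches the observation that only multiples of 8 avoid the crash.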
Thanks a lot. I have figured out what I misunderstood (I had mistaken a bit for a byte :)). Your information is very helpful.
But now I am facing a new question: if I want to load the 3rd to the 10th elements of the values array into a __m512d vector using the _mm512_mask_load_pd function, how can I avoid the alignment problem?
"But now I am facing a new question: if I want to load the 3rd to the 10th elements of the values array into a __m512d vector using the _mm512_mask_load_pd function, how can I avoid the alignment problem?"
I think you are asking the wrong question, because that question has no answer (assuming that you really mean the more general issue of loading a chunk at any arbitrary offset, not just the case where idx==3, which you could solve by suitably mis-aligning the array).
I think the question you're trying to ask is "How can I efficiently load masked mis-aligned 64 bit integer values into a vector register?" (Which doesn't assert beforehand an impossible condition of using an instruction that can't do the job!)
Unfortunately I'm not a vector instruction set expert, but if you ask that question you're more likely to get a useful answer!
p.s. Looking at the code the compiler generates for this operation is probably a good way to start to answer it, since the compiler embodies a lot of knowledge and expertise.
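As a fallback that sidesteps the alignment requirement, and not as the efficient answer asked about above, one option is to stage the wanted elements in a 64-byte-aligned temporary and run the aligned load on that buffer instead. The sketch below is plain scalar C; the name masked_gather_to_aligned and the LANES constant are hypothetical, and it assumes the destination buffer has room for 8 doubles.

```c
#include <stddef.h>

#define LANES 8 /* number of doubles in one 512-bit vector */

/* Hypothetical helper: copy the lanes selected by mask from src, which
   may have any alignment, into dst. Lanes whose mask bit is clear keep
   their previous value in dst, mimicking a write-masked load. Only the
   selected elements of src are read, so a partial mask can be used near
   the end of an array. */
static void masked_gather_to_aligned(double *dst, const double *src,
                                     unsigned mask)
{
    for (int lane = 0; lane < LANES; ++lane) {
        if (mask & (1u << lane)) {
            dst[lane] = src[lane];
        }
    }
}
```

If dst points at a 64-byte-aligned buffer, calling masked_gather_to_aligned(dst, &values[3], 0xFF) leaves values[3] through values[10] at an address that an aligned load will accept. The extra copy costs time, so comparing it against the code the compiler generates is still worthwhile.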
