Is there such a thing?
I think the pack/unpack intrinsics are somewhere close, but I could not work out exactly what they do.
It seems so basic that I almost feel stupid asking, but I would really appreciate a pointer.
I would rather up-convert using a gather instruction, but AFAIK there is no up-conversion for gathering into an epi64 vector.
Any suggestions?
If order is not much of a concern, consider _mm512_maskz_shuffle_epi32. With this you could up-convert 8 of the 16 32-bit unsigned elements (every other one). In other words: input an array of 16 32-bit values, output the 8 even-indexed or the 8 odd-indexed values.
Propagating the sign would require a little more work.
Jim Dempsey
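For anyone who wants to try the even/odd idea above, here is a minimal sketch. It assumes an AVX-512F capable CPU and GCC or Clang (for `__attribute__((target))` and `__builtin_cpu_supports`); the helper names `split_even_odd` and `split_even_odd_avx512` are made up for illustration, and the scalar fallback is only there so the code still runs on machines without AVX-512.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: split 16 unsigned 32-bit values into the 8
   even-indexed and the 8 odd-indexed ones, zero-extended to 64 bits. */
__attribute__((target("avx512f")))
static void split_even_odd_avx512(const uint32_t in[16],
                                  uint64_t even[8], uint64_t odd[8])
{
    /* Keep dwords 0, 2, 4, ...; the zero mask clears the upper dword
       of every 64-bit element. */
    const __mmask16 low_dwords = 0x5555;
    __m512i v = _mm512_loadu_si512(in);

    /* Identity shuffle: even elements already sit in the low dwords. */
    __m512i e = _mm512_maskz_shuffle_epi32(low_dwords, v, _MM_PERM_DCBA);
    /* Swap adjacent dwords so the odd elements move to the low dwords. */
    __m512i o = _mm512_maskz_shuffle_epi32(low_dwords, v, _MM_PERM_CDAB);

    _mm512_storeu_si512(even, e);
    _mm512_storeu_si512(odd, o);
}

/* Dispatch at run time; plain scalar loop when AVX-512F is missing. */
void split_even_odd(const uint32_t in[16], uint64_t even[8], uint64_t odd[8])
{
    if (__builtin_cpu_supports("avx512f")) {
        split_even_odd_avx512(in, even, odd);
        return;
    }
    for (int i = 0; i < 8; ++i) {
        even[i] = in[2 * i];
        odd[i]  = in[2 * i + 1];
    }
}
```

Calling `split_even_odd(in, even, odd)` leaves `even[i] == in[2*i]` and `odd[i] == in[2*i+1]`. The order of the original array is not preserved within a single output vector, which is the trade-off mentioned in the post.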
Mr. Dempsey,
Thank you for the pointers. I decided to represent a vector of uint128_t's as 4 i32 vectors instead of 2 i64 vectors.
With i64 vectors it is a pain to load and store without going through memory as an intermediate store, and my code is quite latency critical.
I cannot believe they have every other function exposed to intrinsics but not this.
Still, as always, I appreciate your pointers.
Jun
Is there a reason you are not using 512-bit wide (KNC) or 256-bit wide (AVX/AVX2) vectors? You could load four, or two, of your 4-wide ui32 vectors at a time. Then use permute, swizzle and/or shuffle to position the appropriate 32-bit values in the low DWORD of the QWORDs of interest, and mask off the unnecessary bits. On KNC you have variants that take a __mmask16 k; on AVX you could preload a register with the appropriate AND mask. Because there is no sign propagation, you might get it down to 2 instructions (register/register) after loading four, or two, of your 4-wide ui32 vectors.
I wish to apologize for misleading you in #2: the intrinsic mentioned there is (will be) available on AVX-512.
Jim Dempsey
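A rough sketch of the 256-bit variant of this, assuming AVX2 and GCC or Clang. The function names are hypothetical. Note one substitution: instead of a shuffle, the odd elements are brought down with a 64-bit logical right shift, which zero-fills the upper half and therefore needs no separate mask. The in-order function uses `_mm256_cvtepu32_epi64`, which is available on AVX2; whether an equivalent is usable on KNC is not something this sketch assumes.

```c
#include <immintrin.h>
#include <stdint.h>

/* Even/odd split of 8 unsigned 32-bit values into two vectors of four
   zero-extended 64-bit values: one AND and one shift, both reg/reg. */
__attribute__((target("avx2")))
static void widen_even_odd_avx2(const uint32_t in[8],
                                uint64_t even[4], uint64_t odd[4])
{
    /* Preloaded AND mask that keeps the low dword of each qword. */
    const __m256i low_dword = _mm256_set1_epi64x(0x00000000FFFFFFFFLL);
    __m256i v = _mm256_loadu_si256((const __m256i *)in);

    __m256i e = _mm256_and_si256(v, low_dword); /* elements 0, 2, 4, 6 */
    __m256i o = _mm256_srli_epi64(v, 32);       /* elements 1, 3, 5, 7 */

    _mm256_storeu_si256((__m256i *)even, e);
    _mm256_storeu_si256((__m256i *)odd, o);
}

/* In-order zero extension of 4 unsigned 32-bit values. */
__attribute__((target("avx2")))
static void widen_in_order_avx2(const uint32_t in[4], uint64_t out[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm256_storeu_si256((__m256i *)out, _mm256_cvtepu32_epi64(v));
}

/* Run-time dispatch with scalar fallbacks for CPUs without AVX2. */
void widen_even_odd(const uint32_t in[8], uint64_t even[4], uint64_t odd[4])
{
    if (__builtin_cpu_supports("avx2")) {
        widen_even_odd_avx2(in, even, odd);
        return;
    }
    for (int i = 0; i < 4; ++i) {
        even[i] = in[2 * i];
        odd[i]  = in[2 * i + 1];
    }
}

void widen_in_order(const uint32_t in[4], uint64_t out[4])
{
    if (__builtin_cpu_supports("avx2")) {
        widen_in_order_avx2(in, out);
        return;
    }
    for (int i = 0; i < 4; ++i)
        out[i] = in[i];
}
```

The even/odd version handles eight inputs per load at the cost of a de-interleaved result; the in-order version keeps element order but consumes only four inputs per call.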
I do not know if you are using this: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3134,3255
Jim Dempsey
