Intrinsic to down-convert all 8 elements of i64 vectors to lower/higher 8 elements of i32 vector

Jun_Hyun_S_ · ‎09-22-2015

Is there such a thing?

I think the pack/unpack intrinsics are somewhere close, but I could not understand exactly what they do.

It seems fairly basic, so I almost feel stupid asking this, but I would really appreciate a pointer.

I would rather up-convert using a gather instruction, but AFAIK there is no up-conversion for gathering into an epi64 vector.

Any suggestions?

jimdempseyatthecove · ‎09-23-2015

If order is not as much of a concern, consider _mm512_maskz_shuffle_epi32. With this you could upconvert 8 of the 16 32-bit unsigned elements (every other one). IOW, input an array of 16 32-bit values, output the 8 even-indexed or the 8 odd-indexed values.

Promoting sign would require a little more work.
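To make the "every other element" idea concrete, here is a minimal sketch, not a tuned routine. It assumes AVX-512F is available at compile time and otherwise falls back to a scalar loop that defines the same result. The function name `widen_every_other_u32` and the array-based interface are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-extend the 8 even-indexed (odd == 0) or the
 * 8 odd-indexed (odd != 0) 32-bit elements of in[16] into out[8]. */
void widen_every_other_u32(const uint32_t in[16], uint64_t out[8], int odd)
{
    assert(in != 0 && out != 0);
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512((const void *)in);
    /* Mask 0x5555 keeps the low dword of every qword and zeroes the high
     * dword, which is the zero-extension.
     * _MM_PERM_DDBB (0xF5) moves dwords 1 and 3 of each 128-bit lane into
     * the kept positions; _MM_PERM_CCAA (0xA0) does the same for dwords
     * 0 and 2. */
    __m512i r = odd
        ? _mm512_maskz_shuffle_epi32(0x5555, v, _MM_PERM_DDBB)
        : _mm512_maskz_shuffle_epi32(0x5555, v, _MM_PERM_CCAA);
    _mm512_storeu_si512((void *)out, r);
#else
    /* Scalar reference with the same semantics. */
    for (int i = 0; i < 8; ++i)
        out[i] = in[2 * i + (odd ? 1 : 0)];
#endif
}
```

Calling it once with `odd = 0` and once with `odd = 1` covers all 16 inputs, at the cost of the outputs being split into an even set and an odd set rather than kept in their original order. For signed inputs, an arithmetic right shift such as `_mm512_srai_epi64(v, 32)` is one plausible starting point for the odd elements, though that is not worked out here.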

Jim Dempsey

Jun_Hyun_S_ · ‎09-23-2015

Mr. Dempsey,

Thank you for the pointers. I decided to represent a vector of uint128_t's as 4 i32 vectors instead of 2 i64 vectors.
With i64 vectors it is a pain to load and store without going through memory as an intermediate store, and my code is quite latency-critical.
I cannot believe they have every other function exposed as intrinsics but not this.
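For readers following along, the layout described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: the lane count of 16 (one 512-bit register of 32-bit limbs), the type name `u128x16`, and the helper functions are all hypothetical, and the actual code may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 16 /* assumption: 16 x 32-bit limbs fill one 512-bit register */

/* Hypothetical limb-sliced layout: limb[k][j] holds bits 32k..32k+31 of
 * the j-th 128-bit value, so each limb[k] maps onto one i32 vector. */
typedef struct {
    uint32_t limb[4][LANES];
} u128x16;

/* Store one 128-bit value, given as two 64-bit halves, into a lane. */
void u128x16_set(u128x16 *v, int lane, uint64_t hi, uint64_t lo)
{
    assert(lane >= 0 && lane < LANES);
    v->limb[0][lane] = (uint32_t)lo;
    v->limb[1][lane] = (uint32_t)(lo >> 32);
    v->limb[2][lane] = (uint32_t)hi;
    v->limb[3][lane] = (uint32_t)(hi >> 32);
}

/* Read one lane back as two 64-bit halves. */
void u128x16_get(const u128x16 *v, int lane, uint64_t *hi, uint64_t *lo)
{
    assert(lane >= 0 && lane < LANES);
    *lo = ((uint64_t)v->limb[1][lane] << 32) | v->limb[0][lane];
    *hi = ((uint64_t)v->limb[3][lane] << 32) | v->limb[2][lane];
}
```

With this layout every limb stays in a 32-bit lane, so no i32/i64 conversion is needed between loading and storing.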

Still, as always, I appreciate your pointers.

Jun

jimdempseyatthecove · ‎09-29-2015

Is there a reason you are not using 512-bit wide (KNC) or 256-bit wide (AVX/AVX2) vectors? You could load four, or two, of your 4-wide ui32 vectors at a time. Then use permute, swizzle and/or shuffle to position the appropriate 32-bit values into the low DWORD of the QWORDs of interest, then mask off the unnecessary bits. On KNC you have variants with __mmask16 k; on AVX you could preload a register with the appropriate AND mask. Because there is no sign propagation, you might get it down to 2 instructions (register/register) after the load of four, or two, of your 4-wide ui32 vectors.
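A minimal sketch of the 256-bit variant of this permute-then-mask approach follows. It assumes AVX2 is available at compile time and falls back to a scalar loop otherwise; the function name `widen_u32x8_to_u64x8` and the array interface are hypothetical. It loads two 4-wide ui32 vectors as one 256-bit register, then spends one permute and one AND per output register.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-extend 8 unsigned 32-bit values to 64 bits,
 * keeping the original element order. */
void widen_u32x8_to_u64x8(const uint32_t in[8], uint64_t out[8])
{
    assert(in != 0 && out != 0);
#if defined(__AVX2__)
    const __m256i v = _mm256_loadu_si256((const __m256i *)in);
    /* Place source element i into the low dword of qword i. The high
     * dword of each qword is a don't-care because it is masked off. */
    const __m256i idx_lo = _mm256_setr_epi32(0, 0, 1, 1, 2, 2, 3, 3);
    const __m256i idx_hi = _mm256_setr_epi32(4, 4, 5, 5, 6, 6, 7, 7);
    /* Preloaded AND mask: keep only the low 32 bits of every qword. */
    const __m256i low_dword = _mm256_set1_epi64x(0xFFFFFFFFLL);
    __m256i lo = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_lo), low_dword);
    __m256i hi = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_hi), low_dword);
    _mm256_storeu_si256((__m256i *)&out[0], lo);
    _mm256_storeu_si256((__m256i *)&out[4], hi);
#else
    /* Scalar reference with the same semantics. */
    for (int i = 0; i < 8; ++i)
        out[i] = in[i];
#endif
}
```

In a real kernel the index and mask registers would be set up once outside the loop, leaving the permute and the AND as the two register/register instructions per output vector. On AVX2, `_mm256_cvtepu32_epi64` is another option when the source is already a 128-bit register.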

I wish to apologize for the misleading suggestion in #2; the intrinsic mentioned there is (will be) available on AVX-512.

Jim Dempsey

jimdempseyatthecove · ‎09-29-2015

I do not know if you are using this: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3134,3255

Jim Dempsey

Intrinsic to down-convert all 8 elements of i64 vectors to lower/higher 8 elements of i32 vector