
Intrinsic to down-convert all 8 elements of i64 vectors to lower/higher 8 elements of i32 vector

Jun_Hyun_S_
Beginner
1,194 Views

Is there such a thing?

I think the pack/unpack intrinsics come close, but I could not work out exactly what they do.

It seems fairly basic, so I almost feel stupid asking, but I would really appreciate a pointer.

I would rather up-convert using a gather instruction, but AFAIK there is no up-conversion when gathering into an epi64 vector.

Any suggestions?

4 Replies
jimdempseyatthecove
Honored Contributor III

If element order is not much of a concern, consider _mm512_maskz_shuffle_epi32. With it you could up-convert 8 of the 16 32-bit unsigned elements (every other one). IOW, input an array of 16 32-bit values, output the 8 even-indexed or the 8 odd-indexed values.
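A minimal sketch of that idea, assuming an AVX-512F target. The helper name widen_alternate_u32 and the array-based interface are hypothetical, chosen only for illustration. The mask 0x5555 keeps the low DWORD of every QWORD and zeroes the high DWORD; the shuffle immediate picks which source DWORD lands there. Without AVX-512F the same result is computed by a scalar loop.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-extend the 8 even-indexed (odd == 0) or the
   8 odd-indexed (odd != 0) elements of src[16] into dst[8]. */
static void widen_alternate_u32(const uint32_t src[16], uint64_t dst[8], int odd)
{
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512((const void *)src);
    /* 0xE4 is the identity shuffle: even DWORDs stay in the low half of
       each QWORD. 0xF5 copies DWORDs 1 and 3 of each 128-bit lane into
       positions 0 and 2. Mask 0x5555 zeroes every odd DWORD position. */
    __m512i r = odd
        ? _mm512_maskz_shuffle_epi32((__mmask16)0x5555, v, (_MM_PERM_ENUM)0xF5)
        : _mm512_maskz_shuffle_epi32((__mmask16)0x5555, v, (_MM_PERM_ENUM)0xE4);
    _mm512_storeu_si512((void *)dst, r);
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < 8; ++i)
        dst[i] = (uint64_t)src[2 * i + (odd ? 1 : 0)];
#endif
}
```

Calling it once with odd == 0 and once with odd != 0 covers all 16 inputs, at the cost of the results being split into even and odd groups rather than low and high halves.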

Promoting sign would require a little more work.
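One possible way to get sign promotion, sketched under the assumption of an AVX-512F target: shift each even DWORD to the top of its QWORD, then shift back arithmetically so the sign bit is replicated. The helper name widen_even_i32 is hypothetical, and a scalar loop stands in when AVX-512F is not enabled.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: sign-extend the 8 even-indexed elements of
   src[16] into dst[8]. */
static void widen_even_i32(const int32_t src[16], int64_t dst[8])
{
#if defined(__AVX512F__)
    __m512i v = _mm512_loadu_si512((const void *)src);
    /* Logical left shift by 32 moves the even DWORD into the high half
       of its QWORD; arithmetic right shift by 32 brings it back down
       while filling the high half with copies of its sign bit. */
    __m512i r = _mm512_srai_epi64(_mm512_slli_epi64(v, 32), 32);
    _mm512_storeu_si512((void *)dst, r);
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < 8; ++i)
        dst[i] = (int64_t)src[2 * i];
#endif
}
```

For the odd-indexed elements the left shift is unnecessary, since an arithmetic right shift by 32 alone already does the job.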

Jim Dempsey

Jun_Hyun_S_
Beginner

Mr. Dempsey,

Thank you for the pointers. I decided to represent a vector of uint128_t's as 4 i32 vectors instead of 2 i64 vectors.
With i64 vectors it is a pain to load and store without going through memory as an intermediate step, and my code is quite latency critical.
I cannot believe they have every other function exposed to intrinsics but not this.

Still, as always, I appreciate your pointers.

Jun

jimdempseyatthecove
Honored Contributor III

Is there a reason you are not using 512-bit wide (KNC) or 256-bit wide (AVX/AVX2) vectors? You could load four, or two, of your 4-wide ui32 vectors at a time, then use permute, swizzle and/or shuffle to position the appropriate 32-bit values into the low DWORD of the QWORDs of interest, and finally mask off the unnecessary bits. On KNC you have variants that take a __mmask16 k; on AVX you could preload a register with the appropriate AND mask. Because there is no sign propagation, you might get it down to 2 register-to-register instructions after the load of four, or two, of your 4-wide ui32 vectors.
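A rough sketch of the 256-bit variant, assuming AVX2 (the cross-lane permute used here is not available in plain AVX). The helper name widen_u32x8 and the choice of permute indices are illustrative assumptions, not code from this thread. One load brings in two 4-wide ui32 vectors; each output then costs one permute plus one AND against a preloaded mask.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: zero-extend src[0..3] into lo[4] and src[4..7]
   into hi[4]. */
static void widen_u32x8(const uint32_t src[8], uint64_t lo[4], uint64_t hi[4])
{
#if defined(__AVX2__)
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    /* Preloaded AND mask: keep the low DWORD of each QWORD. */
    const __m256i keep_low = _mm256_set1_epi64x(0x00000000FFFFFFFFLL);
    /* Permute indices: the wanted DWORD goes to the low half of each
       QWORD; the high half is a don't-care that the AND clears. */
    const __m256i idx_lo = _mm256_setr_epi32(0, 0, 1, 0, 2, 0, 3, 0);
    const __m256i idx_hi = _mm256_setr_epi32(4, 0, 5, 0, 6, 0, 7, 0);
    __m256i a = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_lo), keep_low);
    __m256i b = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_hi), keep_low);
    _mm256_storeu_si256((__m256i *)lo, a);
    _mm256_storeu_si256((__m256i *)hi, b);
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < 4; ++i) {
        lo[i] = (uint64_t)src[i];
        hi[i] = (uint64_t)src[4 + i];
    }
#endif
}
```

Unlike the even/odd split, this keeps the original element order, at the price of a cross-lane permute.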

I wish to apologize for the misleading suggestion in #2: the intrinsic mentioned there is (will be) available on AVX-512.
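For reference, the down-conversion asked about in the thread title can be sketched on AVX-512F with the truncating convert _mm512_cvtepi64_epi32, followed by an insert to choose the lower or upper half of the 32-bit result vector. This assumes an AVX-512F target and does not apply to KNC; the helper name narrow_pair_u64 is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Hypothetical helper: truncate a[8] into dst[0..7] and b[8] into
   dst[8..15], keeping the low 32 bits of every 64-bit element. */
static void narrow_pair_u64(const uint64_t a[8], const uint64_t b[8],
                            uint32_t dst[16])
{
#if defined(__AVX512F__)
    /* Each convert yields a 256-bit vector of 8 truncated DWORDs. */
    __m256i lo = _mm512_cvtepi64_epi32(_mm512_loadu_si512((const void *)a));
    __m256i hi = _mm512_cvtepi64_epi32(_mm512_loadu_si512((const void *)b));
    /* Place lo in the lower 256 bits and hi in the upper 256 bits. */
    __m512i r = _mm512_inserti64x4(_mm512_castsi256_si512(lo), hi, 1);
    _mm512_storeu_si512((void *)dst, r);
#else
    /* Scalar fallback with the same result. */
    for (size_t i = 0; i < 8; ++i) {
        dst[i] = (uint32_t)a[i];
        dst[8 + i] = (uint32_t)b[i];
    }
#endif
}
```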

Jim Dempsey
