Is there such a thing?
I think the pack/unpack intrinsics are somewhere close, but I could not understand exactly what they do.
It seems so basic that I almost feel stupid asking this, but I would really appreciate a pointer.
I would rather up-convert using a gather instruction, but AFAIK there is no up-conversion for gathering into an epi64 vector.
Any suggestions?
If order is not much of a concern, consider _mm512_maskz_shuffle_epi32. With it you could up-convert 8 of the 16 32-bit unsigned elements (every other one). In other words: input a vector of 16 32-bit values, output the 8 even-indexed or the 8 odd-indexed values.
Promoting sign would require a little more work.
Jim Dempsey
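A minimal sketch of the every-other-element idea above, for AVX-512F rather than KNC. The function names (widen_every_other_u32, widen_avx512) are made up for illustration, and the per-function target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so the file builds without extra flags and falls back to scalar code on CPUs without AVX-512F. The zeroing mask 0x5555 keeps the low DWORD of each QWORD and clears the high DWORD, which is what makes the result a zero-extended 64-bit value.

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX-512F path. The target attribute (GCC/Clang extension) enables the
   intrinsics for this one function only. */
__attribute__((target("avx512f")))
static void widen_avx512(const uint32_t *src, uint64_t *dst, int odd)
{
    __m512i v = _mm512_loadu_si512((const void *)src);

    /* Bit i of the mask controls DWORD i: 0x5555 keeps DWORDs 0, 2, 4, ...
       (the low half of every QWORD) and zeroes the rest. */
    const __mmask16 keep_low = 0x5555;
    __m512i r;

    if (odd) {
        /* imm 0xF5: in each 128-bit lane, DWORD0 <- DWORD1, DWORD2 <- DWORD3. */
        r = _mm512_maskz_shuffle_epi32(keep_low, v, (_MM_PERM_ENUM)0xF5);
    } else {
        /* imm 0xE4 is the identity shuffle; only the zeroing mask does work. */
        r = _mm512_maskz_shuffle_epi32(keep_low, v, (_MM_PERM_ENUM)0xE4);
    }
    _mm512_storeu_si512((void *)dst, r);
}

/* Zero-extend the 8 even-indexed (odd == 0) or the 8 odd-indexed (odd != 0)
   elements of src[16] into dst[8]. Hypothetical helper, not a library API. */
void widen_every_other_u32(const uint32_t src[16], uint64_t dst[8], int odd)
{
    if (__builtin_cpu_supports("avx512f")) {
        widen_avx512(src, dst, odd);
    } else {
        /* Scalar fallback with the same result. */
        for (int i = 0; i < 8; ++i)
            dst[i] = src[2 * i + (odd ? 1 : 0)];
    }
}
```

Calling it once with odd == 0 and once with odd != 0 covers all 16 inputs, at the cost of the two result vectors being split by index parity rather than by position.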
Mr. Dempsey,
Thank you for the pointers. I decided to represent a vector of uint128_t's as 4 i32 vectors instead of 2 i64 vectors.
With i64 vectors it is a pain to load and store without going through memory as an intermediate store, and my code is quite latency critical.
I cannot believe they have every other function exposed to intrinsics but not this.
Still, as always, I appreciate your pointers.
Jun
Is there a reason you are not using 512-bit wide (KNC) or 256-bit wide (AVX/AVX2) vectors? You could load four, or two, of your 4-wide ui32 vectors at a time. Then use permute, swizzle and/or shuffle to position the appropriate 32-bit values in the low DWORD of the QWORDs of interest, then mask off the unnecessary bits. On KNC you have variants that take a __mmask16 k; on AVX you could preload a register with the appropriate AND mask. Because there is no sign propagation, you might get it down to 2 instructions (register/register) after the load of four, or two, of your 4-wide ui32 vectors.
I wish to apologize for the misleading suggestion in #2: the intrinsic mentioned there is (will be) available on AVX-512.
Jim Dempsey
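A rough sketch of the permute-then-mask approach described above, for the 256-bit AVX2 case (two 4-wide ui32 vectors loaded at once). The names widen8_avx2 and widen_u32x8_to_u64 are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions assumed here so the example can fall back to scalar code where AVX2 is missing. Each output vector costs one permute and one AND after the load.

```c
#include <immintrin.h>
#include <stdint.h>

/* AVX2 path: one permute plus one AND per output vector. */
__attribute__((target("avx2")))
static void widen8_avx2(const uint32_t *src, uint64_t *dst)
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);

    /* Preloaded AND mask: keep the low DWORD of every QWORD. */
    const __m256i lo_mask = _mm256_set1_epi64x(0x00000000FFFFFFFFLL);

    /* Index vectors place source element k in the low DWORD of QWORD k.
       The high-DWORD entries are don't-care because the AND clears them. */
    const __m256i idx_lo = _mm256_setr_epi32(0, 0, 1, 1, 2, 2, 3, 3);
    const __m256i idx_hi = _mm256_setr_epi32(4, 4, 5, 5, 6, 6, 7, 7);

    __m256i lo = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_lo), lo_mask);
    __m256i hi = _mm256_and_si256(_mm256_permutevar8x32_epi32(v, idx_hi), lo_mask);

    _mm256_storeu_si256((__m256i *)dst, lo);        /* elements 0..3 */
    _mm256_storeu_si256((__m256i *)(dst + 4), hi);  /* elements 4..7 */
}

/* Zero-extend src[8] (uint32) into dst[8] (uint64), preserving order.
   Hypothetical helper, not a library API. */
void widen_u32x8_to_u64(const uint32_t src[8], uint64_t dst[8])
{
    if (__builtin_cpu_supports("avx2")) {
        widen8_avx2(src, dst);
    } else {
        /* Scalar fallback with the same result. */
        for (int i = 0; i < 8; ++i)
            dst[i] = src[i];
    }
}
```

Unlike the every-other-element shuffle, the cross-lane permute keeps the elements in their original order, so no reordering is needed afterwards.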
I do not know if you are using this: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3134,3255
Jim Dempsey