Hi all,
It appears that _mm512_permutevar_epi32 can perform any data-reordering pattern specified by the given index vector. In that case, why do we need the _mm512_permute4f128_epi32 or _mm512_shuffle_epi32 instructions for inter-lane or intra-lane data reordering? IIRC, the Xeon Phi vector architecture contains lane muxes and element muxes to perform inter-lane and intra-lane reordering, respectively. So even with _mm512_permutevar_epi32, the input vector should still go through both types of muxes, is that right?
Actually, I have written a test program that reverses a given vector of 16 elements (from 0-15 to 15-0). One version uses a single _mm512_permutevar_epi32 instruction; the other uses two instructions, _mm512_permute4f128_epi32 and _mm512_shuffle_epi32, with _MM_PERM_ABCD. The former seems to outperform the latter by a factor of 2.
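To make the two versions concrete, here is a minimal scalar sketch in plain C of what I understand each approach to compute. It is not my actual test program and does not use the real intrinsics: the `model_*` and `reverse_*` helpers are hypothetical names, and `PERM_ABCD` is assumed to have the same bit pattern (0x1B) as `_MM_PERM_ABCD`, with each 2-bit field of the immediate selecting the source for one destination position. It only models the data movement and says nothing about latency or throughput.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding of _MM_PERM_ABCD: 2-bit selectors 0,1,2,3 from high to low. */
#define PERM_ABCD 0x1Bu

/* Scalar model of a full permute: dst[i] = src[idx[i] mod 16]. */
static void model_permutevar(int32_t dst[16], const int32_t idx[16],
                             const int32_t src[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = src[idx[i] & 15];
}

/* Scalar model of an inter-lane permute: each 2-bit field of imm
   picks which 128-bit lane (4 elements) lands in that destination lane. */
static void model_permute_lanes(int32_t dst[16], const int32_t src[16],
                                unsigned imm) {
    for (int lane = 0; lane < 4; lane++) {
        unsigned sel = (imm >> (2 * lane)) & 3u;
        for (int e = 0; e < 4; e++)
            dst[4 * lane + e] = src[4 * sel + e];
    }
}

/* Scalar model of an intra-lane shuffle: the same 4-element pattern
   is applied inside every 128-bit lane. */
static void model_shuffle_elems(int32_t dst[16], const int32_t src[16],
                                unsigned imm) {
    for (int lane = 0; lane < 4; lane++) {
        for (int e = 0; e < 4; e++) {
            unsigned sel = (imm >> (2 * e)) & 3u;
            dst[4 * lane + e] = src[4 * lane + sel];
        }
    }
}

/* Version 1: one general permute with the index vector 15, 14, ..., 0. */
static void reverse_one_step(int32_t dst[16], const int32_t src[16]) {
    int32_t idx[16];
    for (int i = 0; i < 16; i++)
        idx[i] = 15 - i;
    model_permutevar(dst, idx, src);
}

/* Version 2: reverse the lane order, then reverse elements within each lane. */
static void reverse_two_step(int32_t dst[16], const int32_t src[16]) {
    int32_t tmp[16];
    model_permute_lanes(tmp, src, PERM_ABCD);
    model_shuffle_elems(dst, tmp, PERM_ABCD);
}

/* Self-check: both versions should agree on the input 0..15. */
static void check_reversal_models(void) {
    int32_t src[16], a[16], b[16];
    for (int i = 0; i < 16; i++)
        src[i] = i;
    reverse_one_step(a, src);
    reverse_two_step(b, src);
    for (int i = 0; i < 16; i++)
        assert(a[i] == b[i]);
}
```

Calling `check_reversal_models()` from a small `main` confirms that the two-step sequence and the single general permute yield the same reversed vector, so under these assumptions the difference I measured comes from how the hardware executes them, not from what they compute.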
Thanks,
Kaixi