Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

AVX2 permute intrinsics and copying a source-vector element to multiple destination-vector elements

Nathan_Weeks

In the User and Reference Guide for the Intel C++ Compiler 15.0, the descriptions of the AVX2 intrinsics _mm256_permutevar8x32_epi32 and _mm256_permutevar8x32_ps state:

The intrinsic does NOT allow to copy the same element of the source vector to more than one element of the destination vector.

However, in the Intel 64 and IA-32 Architectures Software Developer's Manual, the descriptions of the corresponding AVX2 instructions (VPERMD and VPERMPS) state:

Note that this instruction permits a doubleword in the source operand to be copied to more than one doubleword location in the destination operand.

The cursory testing I did of _mm256_permutevar8x32_epi32 and _mm256_permutevar8x32_ps seemed to indicate that they do support copying the same element of the source vector to more than one element of the destination vector.
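A check along these lines could look like the following sketch. It is not the code from the original test; the helper names (`permute8x32_model`, `permute8x32_avx2`, `broadcast_matches_model`) and the sample values are made up for illustration. It assumes AVX2-capable hardware. The scalar model follows the VPERMD description in the Software Developer's Manual, where each destination doubleword is selected by the low 3 bits of the corresponding index.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of VPERMD per the SDM description: destination element i
   receives src[idx[i] & 7]. Nothing prevents two indices from being equal. */
static void permute8x32_model(const int32_t src[8], const int32_t idx[8],
                              int32_t dst[8])
{
    for (int i = 0; i < 8; ++i)
        dst[i] = src[idx[i] & 7];
}

/* Same operation through the intrinsic. The target attribute lets GCC and
   Clang build this function without -mavx2; the Intel compiler accepts the
   intrinsic without it. */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
__attribute__((target("avx2")))
#endif
static void permute8x32_avx2(const int32_t src[8], const int32_t idx[8],
                             int32_t dst[8])
{
    __m256i v = _mm256_loadu_si256((const __m256i *)src);
    __m256i k = _mm256_loadu_si256((const __m256i *)idx);
    _mm256_storeu_si256((__m256i *)dst, _mm256_permutevar8x32_epi32(v, k));
}

/* Copies source element k to all eight destination elements and returns 1
   if the intrinsic agrees with the scalar model. Requires an AVX2 CPU. */
static int broadcast_matches_model(int k)
{
    int32_t src[8] = {10, 11, 12, 13, 14, 15, 16, 17};
    int32_t idx[8], want[8], got[8];

    assert(k >= 0 && k < 8);
    for (int i = 0; i < 8; ++i)
        idx[i] = k;                 /* every lane selects the same element */

    permute8x32_model(src, idx, want);
    permute8x32_avx2(src, idx, got);
    return memcmp(want, got, sizeof want) == 0;
}
```

Calling `broadcast_matches_model(k)` for each `k` from 0 to 7 on an AVX2 machine exercises the case the User and Reference Guide says is not allowed; a return value of 1 means the intrinsic replicated the element as the SDM describes.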

Could someone from Intel either confirm or refute my suspicion that the User and Reference Guide for the Intel C++ Compiler 15.0 is incorrect in this case? Thanks!

KitturGanesh
Employee

Hi Nathan, that's a bug in the reference guide: the intrinsics provide the full functionality of the underlying instructions. I'll file the issue with the documentation group and update you as soon as a release with the fix is out.

_Kittur 
