Intel® ISA Extensions

Intel C++: _mm256_set1_ps suboptimal?


I'm in the process of porting a (huge) piece of code from SSE to AVX. Looking at the ASM generated by the compiler (Intel C++ Pro 11.1 build #38, IA32 / Windows), I have just noticed that _mm256_set1_ps spits out this convoluted sequence:

movss xmm0, DWORD PTR [edi+eax*4]
unpcklps xmm0, xmm0
movlhps xmm0, xmm0
vinsertf128 ymm1, ymm0, xmm0, 1

instead of the much simpler:

vbroadcastss ymm0, DWORD PTR [edi+eax*4]

Did I miss something, or is this simply something that should be improved in a forthcoming version of the compiler?
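To make the comparison concrete, here is a minimal sketch of what the four-instruction sequence computes and of the explicit alternative. It assumes GCC or Clang on x86 (the `target("avx")` attribute and `__builtin_cpu_supports` are GCC/Clang extensions; with Intel C++ or MSVC you would instead enable AVX through the compiler's own switch, e.g. /arch:AVX in MSVC, and drop the attribute). The helper names `splat_set1`, `splat_like_icc`, `splat_broadcast` and `check_splats` are hypothetical. `_mm256_broadcast_ss` takes a `const float*` and corresponds to `vbroadcastss ymm, m32`, so it is one way to ask for the single-instruction form instead of relying on how the compiler lowers `_mm256_set1_ps`.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

// Assumption: GCC or Clang on x86/x86-64. The target attribute enables AVX
// for these functions only, so no global -mavx flag is needed.
#define AVX_FN __attribute__((target("avx")))

// 1) Generic splat: the compiler chooses the instruction sequence.
AVX_FN std::array<float, 8> splat_set1(const float* p) {
    std::array<float, 8> out;
    _mm256_storeu_ps(out.data(), _mm256_set1_ps(*p));
    return out;
}

// 2) The four-step sequence from the post, written with intrinsics.
//    An optimizing compiler may still lower this differently.
AVX_FN std::array<float, 8> splat_like_icc(const float* p) {
    __m128 x = _mm_load_ss(p);       // movss:    [v, 0, 0, 0]
    x = _mm_unpacklo_ps(x, x);       // unpcklps: [v, v, 0, 0]
    x = _mm_movelh_ps(x, x);         // movlhps:  [v, v, v, v]
    // vinsertf128: copy the 128-bit splat into the upper half of a ymm.
    __m256 y = _mm256_insertf128_ps(_mm256_castps128_ps256(x), x, 1);
    std::array<float, 8> out;
    _mm256_storeu_ps(out.data(), y);
    return out;
}

// 3) Load-and-broadcast intrinsic, documented as vbroadcastss ymm, m32.
AVX_FN std::array<float, 8> splat_broadcast(const float* p) {
    std::array<float, 8> out;
    _mm256_storeu_ps(out.data(), _mm256_broadcast_ss(p));
    return out;
}

// Usage check: all three produce the same 8-lane splat of v.
// Call only on a CPU with AVX, e.g. guarded by __builtin_cpu_supports("avx").
inline void check_splats(float v) {
    const auto a = splat_set1(&v);
    const auto b = splat_like_icc(&v);
    const auto c = splat_broadcast(&v);
    for (int i = 0; i < 8; ++i) {
        assert(a[i] == v && b[i] == v && c[i] == v);
    }
}
```

One plausible reason for the generic lowering: in AVX (before AVX2), `vbroadcastss` only accepts a memory source operand, so a `_mm256_set1_ps` whose argument may already live in a register cannot always use it. When the value is being loaded from memory anyway, as with `[edi+eax*4]` here, calling `_mm256_broadcast_ss` directly removes the dependence on the compiler's choice.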
