Intel® ISA Extensions

Intel C++ : _mm256_set1_ps suboptimal ?

bronxzv

I'm in the process of porting a (huge) piece of code from SSE to AVX. Looking at the ASM generated by the compiler (Intel C++ Pro 11.1 build #38 IA32 / Windows), I have just noticed that _mm256_set1_ps emits this convoluted sequence:

movss xmm0, DWORD PTR [edi+eax*4]
unpcklps xmm0, xmm0
movlhps xmm0, xmm0
vinsertf128 ymm1, ymm0, xmm0, 1


instead of the much simpler:

vbroadcastss ymm0, DWORD PTR [edi+eax*4]


Did I miss something, or is this simply something that should be improved in a forthcoming version of the compiler?
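For anyone who wants to compare the two forms side by side, here is a minimal sketch in C. It assumes an x86 target and a GCC- or Clang-style compiler (the `target("avx")` attribute and `__builtin_cpu_supports` are not Intel C++ 11.1 features), and the helper names `splat_set1`, `splat_broadcast` and `splat_methods_agree` are made up for illustration. `_mm256_broadcast_ss` is the AVX intrinsic that corresponds to `vbroadcastss ymm, m32`, so it may be usable as an explicit alternative when the source value lives in memory; what a particular compiler actually emits for either intrinsic still has to be checked in the generated ASM.

```c
#include <immintrin.h>

/* Hypothetical helper: broadcast through the generic set1 intrinsic.
   The compiler is free to choose the instruction sequence here. */
__attribute__((target("avx")))
void splat_set1(const float *src, float *dst8)
{
    __m256 v = _mm256_set1_ps(*src);
    _mm256_storeu_ps(dst8, v);          /* unaligned store of 8 floats */
}

/* Hypothetical helper: broadcast through the intrinsic that maps to
   vbroadcastss with a 32-bit memory operand. */
__attribute__((target("avx")))
void splat_broadcast(const float *src, float *dst8)
{
    __m256 v = _mm256_broadcast_ss(src);
    _mm256_storeu_ps(dst8, v);          /* unaligned store of 8 floats */
}

/* Returns 1 if both helpers fill all 8 lanes with x, 0 if they do not,
   and -1 if the CPU does not report AVX support (GCC/Clang builtin). */
int splat_methods_agree(float x)
{
    float a[8], b[8];
    int i;

    if (!__builtin_cpu_supports("avx"))
        return -1;

    splat_set1(&x, a);
    splat_broadcast(&x, b);

    for (i = 0; i < 8; ++i)
        if (a[i] != x || b[i] != x)
            return 0;
    return 1;
}
```

Both helpers should produce the same eight lanes, so the only difference worth looking at is the instruction sequence in the disassembly of each function.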
