I'm in the process of porting a (huge) piece of code from SSE to AVX. Looking at the ASM generated by the compiler (Intel C++ Pro 11.1 build #38, IA32 / Windows), I just noticed that _mm256_set1_ps emits this convoluted sequence:
movss xmm0, DWORD PTR [edi+eax*4]
unpcklps xmm0, xmm0
movlhps xmm0, xmm0
vinsertf128 ymm1, ymm0, xmm0, 1
instead of the much simpler:
vbroadcastss ymm0, DWORD PTR [edi+eax*4]
Did I miss something, or is this simply something that should be improved in a forthcoming version of the compiler?
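For reference, here is a minimal sketch of the two ways to write the splat at the source level. It is not my actual code: the function names `splat_set1` and `splat_broadcast` and the `AVX_TARGET` macro are made up for illustration, and the macro is only there so the snippet also builds with GCC/Clang without an AVX command-line switch. `_mm256_set1_ps` leaves the instruction choice to the compiler, whereas `_mm256_broadcast_ss` is the intrinsic documented as corresponding to `vbroadcastss` with a memory operand.

```c
#include <immintrin.h>

/* Hypothetical helper macro: on GCC/Clang, enable AVX per function.
   With other compilers, AVX is assumed to be enabled on the command line. */
#if defined(__GNUC__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

/* Fill out[0..7] with *src via the generic set1 intrinsic;
   the compiler decides which instructions to generate. */
AVX_TARGET
void splat_set1(const float *src, float *out)
{
    __m256 v = _mm256_set1_ps(*src);
    _mm256_storeu_ps(out, v);
}

/* Fill out[0..7] with *src via the explicit broadcast-from-memory intrinsic. */
AVX_TARGET
void splat_broadcast(const float *src, float *out)
{
    __m256 v = _mm256_broadcast_ss(src);
    _mm256_storeu_ps(out, v);
}
```

Both functions should store the same eight values, so swapping one intrinsic for the other is a possible workaround where the source operand is already in memory. Whether it actually changes the generated code depends on the compiler version.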