How to reinterpret a __m128 value as __m256?

Matthias_Kretz · 01-25-2013

Hi,

I'm looking for a way to do what _mm256_cast??128_??256 does, but without the caveat that "the upper 128 bits are undefined". I execute a VEX-coded SSE instruction, which leaves the result in the lower 128 bits of the register and zeroes the upper 128 bits. Now I want to keep using this register with an AVX intrinsic, or simply store the whole 256 bits to memory. With the currently available intrinsics, the only safe way I see is [cpp]_mm256_insertf128_??(_mm256_cast??128_??256(x), _mm_setzero_??(), 1)[/cpp], which is major overkill for something that, in reality, doesn't need any extra instructions.
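For reference, the workaround above can be written out for the `ps` variant as follows. This is only a minimal sketch: the helper names `zero_extend_ps` and `zero_extend_store` and the `AVX_TARGET` macro are hypothetical, and the target attribute is an assumption that lets GCC and clang compile the AVX code without a global `-mavx` flag.

```cpp
#include <immintrin.h>

// Assumption: on GCC/clang, enable AVX per function instead of via -mavx.
#if defined(__GNUC__) || defined(__clang__)
#define AVX_TARGET __attribute__((target("avx")))
#else
#define AVX_TARGET
#endif

// Hypothetical helper: widen x to 256 bits with a well-defined zero upper half.
// The cast leaves bits 128..255 undefined, so the insert overwrites them.
AVX_TARGET static inline __m256 zero_extend_ps(__m128 x) {
    return _mm256_insertf128_ps(_mm256_castps128_ps256(x), _mm_setzero_ps(), 1);
}

// Hypothetical helper: load 4 floats, zero-extend, and store all 8 lanes.
AVX_TARGET void zero_extend_store(float* out, const float* in) {
    _mm256_storeu_ps(out, zero_extend_ps(_mm_loadu_ps(in)));
}
```

Whether the compiler folds the insert away depends on the compiler and optimization level, so the generated assembly is worth checking.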

From my tests, the cast intrinsic does what I want with clang, GCC, and ICC. MSVC, however, prefers to implement the cast as a 128-bit store followed by a 256-bit load (stupid compiler). And even if I got lucky with MSVC too, I'd rather not depend on undefined behavior. Do you have any idea how to do this? A compiler-specific solution would also be interesting, e.g. an inline asm statement that does what I need...
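As an illustration of what a compiler-specific variant might look like, here is a sketch using GNU-style inline asm (GCC and clang only, AT&T syntax; it does not apply to MSVC). It is an assumption, not a confirmed solution from this thread: it spends one VEX-coded `vmovaps` between xmm registers, which zeroes the upper 128 bits of the destination ymm register, so it is not free. The helper names `zero_extend_asm` and `zero_extend_asm_store` are hypothetical.

```cpp
#include <immintrin.h>

#if !defined(__GNUC__)
#error "GNU-style inline asm sketch: GCC or clang only"
#endif

// Hypothetical helper: zero-extend x to 256 bits with a single VEX move.
__attribute__((target("avx")))
static inline __m256 zero_extend_asm(__m128 x) {
    __m256 r;
    // The %x modifier prints the xmm name of each operand's register.
    // A VEX-encoded 128-bit write clears bits 128..255 of the destination.
    __asm__("vmovaps %x1, %x0" : "=x"(r) : "x"(x));
    return r;
}

// Hypothetical helper: load 4 floats, zero-extend, and store all 8 lanes.
__attribute__((target("avx")))
void zero_extend_asm_store(float* out, const float* in) {
    _mm256_storeu_ps(out, zero_extend_asm(_mm_loadu_ps(in)));
}
```

The asm statement is opaque to the optimizer, so the compiler cannot merge the move with the instruction that produced `x`.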

SergeyKostrov · 01-25-2013

>>...But MSVC...stupid compiler... Sorry, but I think such statements do not look good.

Matthias_Kretz · 01-25-2013

Yes, sorry. I'm rather frustrated by all the quirks, bugs, and missing optimizations in MSVC...