Solved: Reinterpret casting from __m256i to __m256

sergiomatiz · 06-10-2022

Hi,

I am trying to reinterpret-cast a __m256i to a __m256 using the cast intrinsics. However, the underlying bits appear to change after the cast, which I did not expect. This is the code snippet:

#include <iostream>
#include <xmmintrin.h>
#include <immintrin.h>
#include <x86intrin.h>
#include <inttypes.h>
#include <iomanip>
#include <bit>
 
using namespace std;
  
int main()
{
    // uint32 value representing the number 0.5430 in float
    // hexadecimal value is 0x3f0b0000

    __m256i input_uint32 = _mm256_set_epi32(
        1057685504U, 1057685504U, 1057685504U, 1057685504U,
        1057685504U, 1057685504U, 1057685504U, 1057685504U);
    
    __m128i y1_128 = _mm256_extracti128_si256(input_uint32, 0);
    uint32_t u1 = _mm_extract_epi32(y1_128, 0);

    // Prints 3f0b0000
    cout << hex << u1 << endl;              
  
    // Expression below seems to be doing casting and not reinterpret casting
    __m256 input_float32 = _mm256_castsi256_ps(input_uint32);
 
    __m128 t1_128 = _mm256_extractf32x4_ps(input_float32, 0);

    float f1 = _mm_extract_ps(t1_128, 0);
    uint32_t f1_as_uint32 = reinterpret_cast<uint32_t &>(f1);

    // Prints 4e7c2c00 (instead of 3f0b0000)
    cout << hex << f1_as_uint32 << endl;
    // Prints 1.05769e+09 (it does cast instead of reinterpret cast)
    cout << f1 << endl;

    // By doing normal reinterpret casting
    uint32_t data_uint32 = 1057685504U;
    float expected_result = reinterpret_cast<float &>(data_uint32);
    // Prints 0.5430
    cout << expected_result << endl;
   
    return 0;
}

My understanding is that '_mm256_castsi256_ps' acts like reinterpret_cast, but it seems to perform a value conversion instead. Am I missing something? Is there another intrinsic that behaves like reinterpret_cast?

I would appreciate any information you can provide on this issue.

Sergey_M_Intel1 · 06-14-2022

The _mm256_castsi256_ps intrinsic (like the other *cast* intrinsics) does not convert anything and normally generates no code. These intrinsics exist only to keep C++ compilers happy by making the type change explicit. What is wrong in your code is an extra conversion in the lines below: _mm_extract_ps returns an "int" holding the raw bits, not a "float", so assigning its result to a float converts the integer value:

float f1 = _mm_extract_ps(t1_128, 0);

uint32_t f1_as_uint32 = reinterpret_cast<uint32_t &>(f1);

You should just write:

uint32_t f1_as_uint32 = _mm_extract_ps(t1_128, 0);

FWIW, here is a useful link where you can learn about various intrinsics: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
