cast __m512 to __m512d

Patrick_S_ · ‎03-10-2014

Hey all,

simple question:

How does the cast operation _mm512_castps_pd work?

A __m512 holds 16 floats, i.e. 16 elements, whereas a __m512d holds only 8 doubles. So what happens if I use the following instructions?

[cpp]

__m512 a_ = _mm512_set1_ps( 2.0 );

__m512d b_ = _mm512_castps_pd( a_ );

[/cpp]

Is it possible to load data from memory with _mm512_load_ps and then do a "cast operation" from float to double precision into two __m512d registers?

Thanks

Patrick

Patrick_S_ · ‎03-10-2014

In case this specific cast is not possible, how can I load data from a 64-byte aligned float array into a __m512d register? I want to perform my FLOPs in double precision, but store/load the data in single precision. I have tried _mm512_extload_pd, but there is no corresponding _MM_UPCONV_PD_ENUM value.

Sylvain_C_ · ‎03-11-2014

Cast intrinsics are the equivalent of a C++ reinterpret_cast. They do not correspond to any actual assembly instruction: all they do is inhibit C's type checking. So _mm512_castps_pd reinterprets the binary representation of each pair of floats as a double.

What you need is a conversion: _mm512_cvtpslo_pd (and _mm512_cvtpd_pslo).

Since there is no _mm512_cvtpshi_pd instruction, you will have to use some swizzle or permute operation to extract the high-order part of your float vector.
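To make the low/high split concrete, here is a minimal scalar sketch of the result you are after. It does not use any intrinsics; widen_ps_to_pd is a hypothetical helper name, and the plain arrays stand in for one __m512 and two __m512d registers. The lo8 output corresponds to what _mm512_cvtpslo_pd gives you directly; the hi8 output is what you would get by first moving the upper 8 floats into the low positions and then converting.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar model (not an intrinsic): widen 16 floats into two
// groups of 8 doubles. Each element is converted by value, so 2.0f becomes
// exactly 2.0.
void widen_ps_to_pd(const float* src16, double* lo8, double* hi8) {
    assert(src16 != nullptr && lo8 != nullptr && hi8 != nullptr);
    for (std::size_t i = 0; i < 8; ++i) {
        lo8[i] = static_cast<double>(src16[i]);      // elements 0..7
        hi8[i] = static_cast<double>(src16[i + 8]);  // elements 8..15
    }
}
```

Every float is exactly representable as a double, so the widening itself loses nothing; rounding only comes into play when you convert back to single precision for the store.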

Kevin_D_Intel · ‎03-11-2014

Echoing Sylvain's reply, the guidance I received from our intrinsic developer is:

512-bit vectors are represented in a C/C++ program by one of the following types: __m512, __m512i and __m512d.

There is a set of “cast” intrinsics, of which _mm512_castps_pd is one, that do nothing except allow a 512-bit vector to be treated as one of these types.
These intrinsics do not change any values in the vector. So, if you write:

__m512 a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );

then the vectors a_ and b_ will be bitwise identical, but a_ treats the 512 bits as 16 single-precision floating-point values, while b_ treats the same 512 bits as 8 double-precision floating-point values.
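For illustration, the effect on the snippet above can be modeled in portable C++ without any intrinsics. This is only a sketch, assuming IEEE-754 floats and doubles; reinterpret_ps_as_pd and first_double_of_broadcast_two are hypothetical names, with std::memcpy standing in for the bit-preserving cast.

```cpp
#include <cassert>
#include <cstring>

static_assert(sizeof(float) == 4 && sizeof(double) == 8,
              "this sketch assumes 32-bit float and 64-bit double");

// Hypothetical scalar stand-in for _mm512_castps_pd: copies the 64 bytes of
// 16 floats into 8 doubles without changing a single bit.
void reinterpret_ps_as_pd(const float* src16, double* dst8) {
    assert(src16 != nullptr && dst8 != nullptr);
    std::memcpy(dst8, src16, 16 * sizeof(float));
}

// Fills 16 floats with 2.0f (like _mm512_set1_ps), reinterprets them, and
// returns element 0 of the resulting 8-double view.
double first_double_of_broadcast_two() {
    float a[16];
    for (float& x : a) x = 2.0f;
    double b[8];
    reinterpret_ps_as_pd(a, b);
    return b[0];
}
```

The float 2.0f has the bit pattern 0x40000000, so each pair of floats forms the 64-bit pattern 0x4000000040000000. Read as a double, that is 2 + 2^-21 (roughly 2.00000048), not 2.0, which is why the cast is not a substitute for a conversion.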

Additional note: if you want a real conversion of the vector elements from float to double, then the following intrinsic should be used on KNC:

extern __m512d _mm512_cvtpslo_pd(__m512);

This intrinsic returns 8 double-precision elements (the low 8 single-precision elements of the source vector, converted to double precision).

Patrick_S_ · ‎03-11-2014

Thanks for that information. The instruction _mm512_cvtpslo_pd works fine!