
cast __m512 to __m512d

Patrick_S_
New Contributor I
2,204 Views

Hey all,

Simple question: how does the cast operation _mm512_castps_pd work?

A __m512 data type holds 16 floats, i.e. 16 elements. In contrast, a __m512d data type can hold only 8 elements, so what happens if I use the following intrinsics?

[cpp]
__m512  a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );
[/cpp]

Is it possible to load data from memory with _mm512_load_ps and then do a "cast operation" from single to double precision into two __m512d registers?

Thanks

Patrick

Patrick_S_
New Contributor I

If this specific cast is not possible, how can I load data from a 64-byte aligned float array into a __m512d register? I want to perform my FLOPs in double precision, but store and load the data in single precision. I have tried _mm512_extload_pd, but there is no corresponding _MM_UPCONV_PD_ENUM value.

Sylvain_C_
Beginner

Cast intrinsics are the equivalent of a C++ reinterpret_cast. They do not correspond to any actual assembly instruction: all they do is inhibit C's type checking. So _mm512_castps_pd reinterprets the binary representation of each pair of floats as a double.

What you need is a conversion: _mm512_cvtpslo_pd (and _mm512_cvtpd_pslo).

Since there is no _mm512_cvtpshi_pd intrinsic, you will have to use a swizzle or permute operation to move the high-order half of your float vector into the low-order half before converting it.
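The two-step idea above (convert the low 8 floats, then bring the high 8 floats down and convert again) can be sketched in plain, portable C++. This is only a scalar model of the data movement, not KNC code: the helper names widen_half and widen16 are hypothetical, and the swizzle/permute step is modeled as a simple index offset of 8 elements.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: widen one half of a 16-float block to 8 doubles.
// half == 0 models converting the low 8 lanes directly;
// half == 1 models first moving the high 8 lanes down, then converting.
void widen_half(const float* src16, int half, double* dst8) {
    assert(half == 0 || half == 1);
    const float* src = src16 + 8 * half;  // stands in for the swizzle/permute
    for (std::size_t i = 0; i < 8; ++i) {
        dst8[i] = static_cast<double>(src[i]);  // value-preserving conversion
    }
}

// Hypothetical helper: split 16 floats into two blocks of 8 doubles,
// i.e. the contents of the two __m512d registers asked about in the question.
void widen16(const float* src16, double* lo8, double* hi8) {
    widen_half(src16, 0, lo8);
    widen_half(src16, 1, hi8);
}
```

Every float is exactly representable as a double, so the widening step loses no information; only the conversion back to single precision for storage rounds.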

Kevin_D_Intel
Employee

Echoing Sylvain's reply, the guidance I received from our intrinsics developer is:

512-bit vectors are represented in a C/C++ program by one of the following types: __m512, __m512i and __m512d.

There is a set of "cast" intrinsics, of which _mm512_castps_pd is one, that do nothing except allow a 512-bit vector to be treated as one of these types.
These intrinsics do not change any values in the vector. So, if you write:

[cpp]
__m512  a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );
[/cpp]

then the vectors a_ and b_ will be bitwise identical, but a_ will treat the 512 bits as 16 single-precision floating-point values, while b_ will treat the same 512 bits as 8 double-precision floating-point values.
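To make the difference between reinterpreting bits and converting values concrete, here is a minimal scalar sketch in portable C++ rather than with the KNC intrinsics themselves. The helper names reinterpret_pair, convert_one and check_cast_vs_convert are made up for illustration, and the sketch assumes IEEE-754 float and double, as on typical x86 targets.

```cpp
#include <cassert>
#include <cstring>

// Scalar model of a cast: the 8 bytes of two adjacent floats are reused,
// unchanged, as the 8 bytes of one double.
double reinterpret_pair(float lo, float hi) {
    static_assert(sizeof(double) == 2 * sizeof(float), "layout assumption");
    const float pair[2] = {lo, hi};
    double out;
    std::memcpy(&out, pair, sizeof out);  // copies bits, converts nothing
    return out;
}

// Scalar model of a conversion: the value is kept, the bit pattern changes.
double convert_one(float x) {
    return static_cast<double>(x);
}

// Self-check mirroring the 2.0 example from the thread.
void check_cast_vs_convert() {
    assert(convert_one(2.0f) == 2.0);             // conversion keeps the value
    assert(reinterpret_pair(2.0f, 2.0f) != 2.0);  // reinterpretation does not
}
```

With both floats equal to 2.0, the reinterpreted double comes out slightly above 2.0 (about 2.0000005), because the bits of the second float land in the mantissa of the double. That is why b_ in the example above does not hold eight copies of 2.0.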

Additional note: if a user wants an actual conversion of the vector elements from float to double, then the following intrinsic should be used on KNC:

extern __m512d  _mm512_cvtpslo_pd(__m512);

This intrinsic returns 8 double-precision elements (the low 8 single-precision elements of the source vector are converted to double precision).

Patrick_S_
New Contributor I

Thanks for that information. The intrinsic _mm512_cvtpslo_pd works fine!
