Hey all,
simple question:
How does the cast operation _mm512_castps_pd work?
A __m512 data type holds 16 floats, i.e. 16 elements, whereas a __m512d data type holds only 8 doubles. So what happens if I use the following instructions?
[cpp]
__m512 a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );
[/cpp]
Is it possible to load data from memory with _mm512_load_ps and then do a "cast operation" from single to double precision into two __m512d registers?
Thanks
Patrick
If this specific cast is not possible, how can I load data from a 64-byte aligned float array into a __m512d register? I want to perform my FLOPs in double precision, but store and load the data in single precision. I have tried _mm512_extload_pd, but there is no corresponding _MM_UPCONV_PD_ENUM value for float input.
Cast intrinsics are the equivalent of a C++ reinterpret_cast. They do not correspond to any actual assembly instruction: all they do is inhibit C's type checking. So _mm512_castps_pd reinterprets the binary representation of each pair of floats as a double.
What you need is a conversion: _mm512_cvtpslo_pd (and _mm512_cvtpd_pslo).
Since there is no _mm512_cvtpshi_pd instruction, you will have to use some swizzle or permute operation to extract the high-order part of your float vector.
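To make the "convert the low half, then move the high half down and convert again" idea concrete, here is a minimal scalar sketch. It does not use the real intrinsics: `Vec16f`, `Vec8d`, `move_high_to_low_model`, `widen_low_model` and `widen_all_model` are hypothetical names that only model the data movement with plain arrays. Which swizzle or permute intrinsic performs the move on the actual hardware is left to the intrinsics reference.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical scalar stand-ins for the vector types (not Intel API).
using Vec16f = std::array<float, 16>;  // models __m512:  16 x float
using Vec8d  = std::array<double, 8>;  // models __m512d:  8 x double

// Models a permute that moves elements 8..15 into positions 0..7.
// The upper half of the result is left unchanged; it is not used afterwards.
inline Vec16f move_high_to_low_model(const Vec16f& a) {
    Vec16f out = a;
    for (std::size_t i = 0; i < 8; ++i) {
        out[i] = a[i + 8];
    }
    return out;
}

// Models a low-half conversion: widen elements 0..7 by value.
inline Vec8d widen_low_model(const Vec16f& a) {
    Vec8d out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<double>(a[i]);
    }
    return out;
}

// Widens all 16 floats into two double vectors: first = low half, second = high half.
inline std::pair<Vec8d, Vec8d> widen_all_model(const Vec16f& a) {
    return {widen_low_model(a), widen_low_model(move_high_to_low_model(a))};
}

// Usage check: element i holds i + 0.5, which is exact in both float and double.
inline void widen_all_example() {
    Vec16f a{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        a[i] = static_cast<float>(i) + 0.5f;
    }
    const std::pair<Vec8d, Vec8d> halves = widen_all_model(a);
    assert(halves.first[0] == 0.5);
    assert(halves.second[0] == 8.5);
}
```

The point of the sketch is only the ordering: one conversion for the low eight elements, one data move, then the same conversion again for what used to be the high eight.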
Echoing Sylvain's reply, the guidance I received from our intrinsics developer is:
512-bit vectors are represented in a C/C++ program by one of the following types: __m512, __m512i and __m512d.
There is a set of “cast” intrinsics, of which _mm512_castps_pd is one, that do nothing except allow a 512-bit vector to be treated as another of these types.
These intrinsics do not change any values in the vector. So, if you write:
__m512 a_ = _mm512_set1_ps( 2.0 );
__m512d b_ = _mm512_castps_pd( a_ );
then the vectors a_ and b_ will be bitwise identical, but the vector a_ will treat the 512 bits as 16 single precision floating point values, while the vector b_ will treat the same 512 bits as 8 double precision floating point values.
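The difference between reinterpreting bits and converting values can be checked without the hardware. The following is a scalar sketch only, assuming IEEE 754 float and double; `Vec16f`, `Vec8d`, `cast_model`, `convert_low_model` and `bits_of` are hypothetical names that model the behaviour with plain arrays and are not part of the intrinsics API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar stand-ins for the vector types (not Intel API).
using Vec16f = std::array<float, 16>;  // models __m512:  16 x 32-bit float
using Vec8d  = std::array<double, 8>;  // models __m512d:  8 x 64-bit double
static_assert(sizeof(Vec16f) == sizeof(Vec8d), "both models must hold the same number of bytes");

// Models the cast: the bytes are copied unchanged, only the type differs.
inline Vec8d cast_model(const Vec16f& a) {
    Vec8d out{};
    std::memcpy(out.data(), a.data(), sizeof(out));
    return out;
}

// Models the conversion: the low 8 floats are widened by value.
inline Vec8d convert_low_model(const Vec16f& a) {
    Vec8d out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<double>(a[i]);
    }
    return out;
}

// Returns the raw 64-bit pattern of a double.
inline std::uint64_t bits_of(double d) {
    std::uint64_t u = 0;
    std::memcpy(&u, &d, sizeof(u));
    return u;
}

// Usage check with the value from the question (2.0f in every element).
inline void cast_vs_convert_example() {
    Vec16f a{};
    a.fill(2.0f);
    const Vec8d reinterpreted = cast_model(a);
    const Vec8d converted = convert_low_model(a);
    // Two adjacent 0x40000000 float patterns form one 64-bit pattern.
    assert(bits_of(reinterpreted[0]) == 0x4000000040000000ULL);
    assert(reinterpreted[0] != 2.0);  // slightly above 2.0, not equal to it
    assert(converted[0] == 2.0);      // the conversion preserves the value
}
```

With 2.0f in every element, each reinterpreted double carries the bit pattern of two floats glued together, so it reads as a value slightly above 2.0 rather than exactly 2.0. Only the conversion preserves the numeric value.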
Additional note: if a user wants a real conversion of the vector elements from float to double, then the following intrinsic should be used on KNC:
extern __m512d _mm512_cvtpslo_pd(__m512);
This intrinsic returns 8 double precision elements (the low 8 single precision elements of the source vector are converted to double precision).
Thanks for that information. The instruction _mm512_cvtpslo_pd works fine!
