Hello.
To a lot of programmers, this has looked like a design flaw or a bug for many years. But the problem is still there.
Why does _mm_extract_ps return an int? The intrinsics follow a clear naming convention: the _ps suffix is for float and _epi32 is for int. We have _mm_extract_epi32, which maps to the pextrd instruction and returns an int. Yet _mm_extract_ps, which maps to extractps, also returns an int! Why? Will somebody fix it some day?
I want to write code like
template <int i> float get() const noexcept { return _mm_extract_ps(xmm_, i); }
and not like
template <int i> float get() const noexcept {
    int v = _mm_extract_ps(xmm_, i);
    float f;
    memcpy(&f, &v, sizeof(v)); // standard-recommended cross-compiler type punning for C++
    return f;
}
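To make the complaint concrete, here is a minimal sketch (function and macro names are hypothetical) contrasting the two versions above. The int returned by _mm_extract_ps holds the float's bit pattern, so letting it convert implicitly to float turns 1.0f into 1065353216.0f; only the memcpy reinterpretation gives back the lane's value. It assumes GCC or Clang, where the SSE4.1 target must be enabled for the calling function (or the whole file built with -msse4.1); MSVC needs no attribute.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_extract_ps (also pulls in SSE/SSE2 headers)
#include <cstring>      // std::memcpy

// Hypothetical helper macro: GCC/Clang refuse to inline SSE4.1 intrinsics into
// a function compiled for a lower target, so enable SSE4.1 per function.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// Naive version: the int is the float's bit pattern, and the int -> float
// conversion converts that pattern numerically (1.0f becomes 1065353216.0f).
template <int i>
SSE41_TARGET float get_naive(__m128 v) noexcept {
    return static_cast<float>(_mm_extract_ps(v, i));
}

// Punned version: copy the 32 bits into a float, reinterpreting them.
template <int i>
SSE41_TARGET float get_punned(__m128 v) noexcept {
    int bits = _mm_extract_ps(v, i);
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

With a vector built by _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f), get_naive<0> yields 1065353216.0f (0x3F800000), while get_punned<2> yields 3.0f.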
P.S. Can somebody also explain why we need both the extractps and pextrd instructions when, technically, they do the same thing? As far as I can tell, neither one changes any flags or performs any checks. As it stands, I can't see how _mm_extract_ps differs from:
int _mm_extract_ps(__m128 xmm, int i) { return _mm_extract_epi32(_mm_castps_si128(xmm), i); }
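The bit-level part of this equivalence can be checked with a short sketch (names are hypothetical). One detail: both intrinsics expect the lane index as a compile-time constant, so a plain runtime int i parameter as written above generally won't compile; the sketch makes the index a template parameter. It assumes GCC or Clang with the same per-function SSE4.1 attribute as an illustrative helper; it says nothing about performance differences between the two instructions.

```cpp
#include <smmintrin.h>  // SSE4.1: _mm_extract_ps, _mm_extract_epi32

// Hypothetical helper macro: enable SSE4.1 for these functions on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// EXTRACTPS: returns the raw 32 bits of float lane i as an int.
template <int i>
SSE41_TARGET int extract_ps_bits(__m128 v) noexcept {
    return _mm_extract_ps(v, i);
}

// PEXTRD on the same register, reinterpreted as four 32-bit integers.
// _mm_castps_si128 is a free reinterpretation; no conversion happens.
template <int i>
SSE41_TARGET int extract_epi32_bits(__m128 v) noexcept {
    return _mm_extract_epi32(_mm_castps_si128(v), i);
}
```

For any vector and lane, the two functions return the same int; for 1.5f that is 0x3FC00000.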
Best regards, Vyacheslav
Assuming you're writing 64-bit code, floats are kept in XMM registers anyway.
So what you really want is a vector-register shuffle that moves the floating-point value into the lowest lane of the register, and then to use that register as a scalar.
See doug65536's answer here:
So something like:
template <int i> float get() const noexcept { return _mm_cvtss_f32(_mm_shuffle_ps(xmm_, xmm_, _MM_SHUFFLE(0, 0, 0, i))); }
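A standalone version of the shuffle approach, as a minimal sketch (get_lane_shuffle is a hypothetical name). It needs only SSE intrinsics, no SSE4.1, and returns a float directly. _MM_SHUFFLE(0, 0, 0, i) selects lane i for the lowest position (the other lanes are don't-care), and _mm_cvtss_f32 then reads that lowest lane as a scalar.

```cpp
#include <xmmintrin.h>  // SSE: _mm_shuffle_ps, _mm_cvtss_f32, _MM_SHUFFLE

// Move lane i into lane 0 with a shuffle, then read lane 0 as a float.
template <int i>
float get_lane_shuffle(__m128 v) noexcept {
    static_assert(i >= 0 && i < 4, "lane index must be 0..3");
    return _mm_cvtss_f32(_mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, i)));
}
```

For example, get_lane_shuffle<2>(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)) gives 3.0f. The intrinsics themselves do not depend on 64-bit mode; the 64-bit remark above only concerns where the compiler keeps scalar floats.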
Sorry, but please don't make such assumptions. I need this SIMD code to work on both x86 and x64, across compilers and platforms (Windows, Linux, macOS).
Thank you for the link anyway. I found that _MM_EXTRACT_FLOAT is the official solution, which is interesting and a bit funny. To me it still looks like bad design, and I still wonder what the reason for it is.
I also don't think that using port 5 is a good idea. Maybe a shift-based solution is simpler and faster for the CPU:
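For reference, a minimal sketch of how _MM_EXTRACT_FLOAT is used. In the GCC, Clang and MSVC headers I am aware of, it is a statement-like macro taking (destination, vector, index) that assigns lane index to the destination variable instead of returning a value, which is why it has to be wrapped to be used in an expression. get_lane_macro and SSE41_TARGET are hypothetical names; the attribute is a precaution for GCC/Clang builds without -msse4.1.

```cpp
#include <smmintrin.h>  // SSE4.1 header that defines _MM_EXTRACT_FLOAT

// Hypothetical helper macro: enable SSE4.1 for this function on GCC/Clang.
#if defined(__GNUC__) || defined(__clang__)
#define SSE41_TARGET __attribute__((target("sse4.1")))
#else
#define SSE41_TARGET
#endif

// The macro writes lane i of v into f; wrap it to get a float-returning call.
template <int i>
SSE41_TARGET float get_lane_macro(__m128 v) noexcept {
    float f = 0.0f;
    _MM_EXTRACT_FLOAT(f, v, i);
    return f;
}
```

Usage: get_lane_macro<3>(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)) should give 4.0f.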
template<int i> [[nodiscard]] float __vectorcall _mm_get_ps(__m128 v) { return _mm_cvtss_f32(_mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(v), i * 4))); }
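A portable sketch of the same shift idea (get_lane_shift is a hypothetical name): it drops the MSVC-specific __vectorcall so it builds unchanged with GCC and Clang, needs only SSE2, and checks the lane index at compile time. Shifting the 128-bit register right by 4*i bytes moves lane i into lane 0. Whether this is actually cheaper than the shuffle is worth measuring: byte shifts such as PSRLDQ are also executed by the shuffle unit on many Intel cores.

```cpp
#include <emmintrin.h>  // SSE2: _mm_srli_si128, _mm_castps_si128, _mm_castsi128_ps

// Shift right by 4*i bytes so lane i lands in lane 0, then read it as a float.
template <int i>
float get_lane_shift(__m128 v) noexcept {
    static_assert(i >= 0 && i < 4, "lane index must be 0..3");
    return _mm_cvtss_f32(
        _mm_castsi128_ps(_mm_srli_si128(_mm_castps_si128(v), i * 4)));
}
```

For example, get_lane_shift<1>(_mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f)) gives 2.0f.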