While using the shuffle operations in micvec.h (provided by Intel Composer 13.1), I found that the shuffle() template function does not compile. Its definition is:
template <_MM_PERM_ENUM p128, _MM_PERM_ENUM p32> F32vec16 shuffle() { return (F32vec16)_mm512_mask_permute4f128_epi32(_mm512_shuffle_epi32(vec, p32), p128); }
The header evidently calls _mm512_mask_permute4f128_epi32 (which takes 4 arguments) where _mm512_permute4f128_epi32 (which takes 2 arguments) was intended. I am not sure whether this has been fixed in a newer version.
Interesting note. Nothing has changed regarding this in the current (14.0) compiler or in the upcoming release, and I see a couple of other instances of similar usage. I will ask our intrinsics developer for an explanation.
Development confirmed this is a defect (see internal tracking id below) and will fix it in a future release. I will update this thread with the status of the fix as I learn more. Thank you for reporting this.
(Internal tracking id: DPD200256178)
Development is targeting a fix for the next update to Composer XE 2013 SP1, tentatively in the August time frame. I will keep you updated as availability becomes clearer.
In the meantime, they offered a workaround: instead of calling the shuffle() methods directly, use the underlying intrinsics.
For example, instead of using:
v = v.shuffle<_MM_PERM_BBCC,_MM_PERM_ABCD>();
the user can define the correct intrinsic sequence for doing the shuffle, like this:
#define F32_SHUF128x32(v, perm128, perm32) \
    _mm512_castsi512_ps(_mm512_permute4f128_epi32( \
        _mm512_shuffle_epi32(_mm512_castps_si512((v)), perm32), \
        perm128 \
    ))
and then use it as follows:
v = F32_SHUF128x32(v, _MM_PERM_BBCC,_MM_PERM_ABCD);
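For anyone who wants to check what the workaround macro should produce without a coprocessor at hand, here is a minimal scalar sketch of the two-step shuffle. It is not from micvec.h or from Development: the names Vec16, shuffle_model and the PERM_* constants are hypothetical, and it assumes the usual _MM_PERM_ENUM encoding (A=0, B=1, C=2, D=3, two bits per selector, rightmost letter selecting destination slot 0, so _MM_PERM_DCBA is the identity).

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of the two-step shuffle (hypothetical helper,
// not part of micvec.h). Element 0 is the lowest 32-bit element.
using Vec16 = std::array<float, 16>;

// Assumed _MM_PERM_ENUM encoding: A=0, B=1, C=2, D=3, two bits per letter,
// rightmost letter in the lowest bits (selects destination slot 0).
constexpr unsigned PERM_DCBA = 0xE4;  // identity under this assumption
constexpr unsigned PERM_ABCD = 0x1B;  // reverse order
constexpr unsigned PERM_BBCC = 0x5A;

inline Vec16 shuffle_model(const Vec16& v, unsigned perm128, unsigned perm32) {
    // Step 1: permute the four 32-bit elements inside each 128-bit lane,
    // mirroring the inner _mm512_shuffle_epi32(..., perm32) call.
    Vec16 tmp{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        for (std::size_t i = 0; i < 4; ++i) {
            const std::size_t src = (perm32 >> (2 * i)) & 3u;
            tmp[4 * lane + i] = v[4 * lane + src];
        }
    }
    // Step 2: permute whole 128-bit lanes, mirroring the outer
    // _mm512_permute4f128_epi32(..., perm128) call.
    Vec16 out{};
    for (std::size_t lane = 0; lane < 4; ++lane) {
        const std::size_t src_lane = (perm128 >> (2 * lane)) & 3u;
        for (std::size_t i = 0; i < 4; ++i) {
            out[4 * lane + i] = tmp[4 * src_lane + i];
        }
    }
    return out;
}
```

Under that assumed encoding, calling shuffle_model(v, PERM_BBCC, PERM_ABCD) on the elements 0 through 15 gives 11,10,9,8, 11,10,9,8, 7,6,5,4, 7,6,5,4, which can be compared against the macro's result on real hardware.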