Software Archive
Read-only legacy content

Report a bug found in micvec.h

Kaixi_H_

While using the shuffle operations in micvec.h (provided with Intel Composer 13.1), I found that the shuffle() function template cannot be compiled. Its definition is:

template <_MM_PERM_ENUM p128, _MM_PERM_ENUM p32>
F32vec16 shuffle()
{
    return (F32vec16)_mm512_mask_permute4f128_epi32(_mm512_shuffle_epi32(vec, p32), p128);
}

The header mistakenly calls _mm512_mask_permute4f128_epi32 (which takes 4 arguments) as if it were _mm512_permute4f128_epi32 (which takes 2 arguments). I am not sure whether this has been fixed in a newer version.
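To make the intended semantics concrete, here is a portable scalar model in C of the two intrinsics the header meant to chain: _mm512_shuffle_epi32 (the same shuffle applied inside each 128-bit lane) followed by _mm512_permute4f128_epi32 (a permutation of whole 128-bit lanes). This is only a sketch, not Intel's implementation: the helper names (perm_select, shuffle_epi32_ref, permute4f128_ref, shuffle_128x32_ref) are hypothetical, and it assumes the usual _MM_PERM_ENUM encoding, in which each destination element is selected by two bits of the 8-bit immediate, element 0 in the lowest bits (so _MM_PERM_DCBA, 0xE4, is the identity).

```c
#include <assert.h>

/* Source index (0..3) for destination element i, read from an 8-bit
   _MM_PERM_ENUM-style immediate (assumed encoding: two bits per
   element, element 0 in the lowest bits; 0xE4 is the identity). */
static unsigned perm_select(unsigned imm, unsigned i)
{
    assert(imm <= 0xFFu && i < 4u);
    return (imm >> (2u * i)) & 3u;
}

/* Scalar model of _mm512_shuffle_epi32(a, imm): the four 32-bit
   elements inside each 128-bit lane are permuted with one pattern. */
static void shuffle_epi32_ref(const int a[16], int out[16], unsigned imm)
{
    for (unsigned lane = 0; lane < 4; ++lane)
        for (unsigned i = 0; i < 4; ++i)
            out[4 * lane + i] = a[4 * lane + perm_select(imm, i)];
}

/* Scalar model of _mm512_permute4f128_epi32(a, imm): whole 128-bit
   lanes are moved; elements inside a lane keep their order. */
static void permute4f128_ref(const int a[16], int out[16], unsigned imm)
{
    for (unsigned lane = 0; lane < 4; ++lane)
        for (unsigned i = 0; i < 4; ++i)
            out[4 * lane + i] = a[4 * perm_select(imm, lane) + i];
}

/* What shuffle<p128, p32>() is meant to compute: the in-lane shuffle
   by p32 first, then the lane permutation by p128. */
static void shuffle_128x32_ref(const int a[16], int out[16],
                               unsigned p128, unsigned p32)
{
    int tmp[16];
    shuffle_epi32_ref(a, tmp, p32);
    permute4f128_ref(tmp, out, p128);
}
```

Note that the unmasked lane permutation needs only the source vector and the immediate, which is why the two-argument _mm512_permute4f128_epi32 is the call that fits this expression.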

Kevin_D_Intel

Interesting find. Nothing has changed regarding this in the current release (the 14.0 compiler) or in the coming one, and I see a couple of other instances of similar usage. I will ask our intrinsics developer for an explanation.

Kevin_D_Intel

Development confirmed this is a defect (see the internal tracking ID below) and will fix it in a future release. I will update this thread on the status of the fix as I learn more. Thank you for reporting this.

(Internal tracking id: DPD200256178)

Kevin_D_Intel

Development is targeting a fix for the next update to Composer XE 2013 SP1, tentatively in the August time frame. I will keep you updated as the availability of the fix becomes clearer.

They also offered a workaround: do not call the shuffle() methods directly; use the equivalent sequence of intrinsics instead.

For example, instead of using:

   v = v.shuffle<_MM_PERM_BBCC,_MM_PERM_ABCD>();

you can define a macro with the correct intrinsic sequence for the shuffle, like this:

#define F32_SHUF128x32(v, perm128, perm32) \
    _mm512_castsi512_ps(_mm512_permute4f128_epi32( \
       _mm512_shuffle_epi32(_mm512_castps_si512((v)), perm32), \
       perm128 \
    ))

and then use it as follows:

   v = F32_SHUF128x32(v, _MM_PERM_BBCC,_MM_PERM_ABCD);
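As a check on what the F32_SHUF128x32 macro computes, its two steps collapse into one index mapping: output element 4*i + j comes from input element 4*sel(perm128, i) + sel(perm32, j), where sel extracts the two-bit selector for a position. The C sketch below is a portable scalar model, not the MIC code path. The function shuf128x32_ref and the constants PERM_BBCC (0x5A) and PERM_ABCD (0x1B) are hypothetical stand-ins that assume the usual _MM_PERM_ENUM encoding (A=0 through D=3, with the leftmost letter selecting the highest element).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for _MM_PERM_BBCC and _MM_PERM_ABCD, assuming
   A=0 .. D=3 and the leftmost letter driving the highest element. */
enum { PERM_BBCC = 0x5A, PERM_ABCD = 0x1B };

/* Scalar model of F32_SHUF128x32(v, perm128, perm32): the in-lane
   shuffle by perm32 followed by the lane permutation by perm128,
   written as a single mapping
   out[4*i + j] = v[4*sel(perm128, i) + sel(perm32, j)]. */
static void shuf128x32_ref(const float v[16], float out[16],
                           unsigned perm128, unsigned perm32)
{
    assert(perm128 <= 0xFFu && perm32 <= 0xFFu);
    for (size_t i = 0; i < 4; ++i) {
        unsigned src_lane = (perm128 >> (2 * i)) & 3u;   /* which lane feeds lane i */
        for (size_t j = 0; j < 4; ++j) {
            unsigned src_elem = (perm32 >> (2 * j)) & 3u; /* element within that lane */
            out[4 * i + j] = v[4 * src_lane + src_elem];
        }
    }
}
```

With v holding 0, 1, ..., 15, shuf128x32_ref(v, out, PERM_BBCC, PERM_ABCD) yields 11 10 9 8 11 10 9 8 7 6 5 4 7 6 5 4: ABCD reverses the elements inside every lane, then BBCC places source lanes 2, 2, 1, 1 into destination lanes 0 through 3.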
