description for __m128i _mm_sad_epu8 (__m128i a, __m128i b) is not correct,

andysem · ‎01-30-2013

Hi,

I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using Linux version):

1. When the window is maximized, the search field is stretched vertically while still being a one-line edit box. It should probably be sized accordingly.

2. __m256 _mm256_undefined_si256 () should return __m256i.

3. In some intrinsics' descriptions, like _mm_adds_epi8, the operation is described in terms of SignedSaturate, while others, e.g. _mm256_adds_epi16, use SaturateToSignedWord. The same applies to the operations with unsigned saturation. The vector elements are also described differently. A more consistent description would be nice.

4. _mm_alignr_epi8 has two descriptions.

5. I'm not sure the _mm_ceil_pd signature and description are correct. They say the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
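Items 3 and 5 can be checked directly against the compiler. The sketch below assumes an x86-64 target and GCC or Clang (the target("sse4.1") function attribute is a GCC/Clang extension); the helper names adds_epi8_lane0 and ceil_pd_pair are hypothetical. It shows that _mm_adds_epi8 clamps each signed 8-bit sum to [-128, 127], and that _mm_ceil_pd loads, rounds and returns two double-precision values through __m128d.

```c
#include <emmintrin.h>  /* SSE2: _mm_adds_epi8, _mm_loadu_pd, _mm_storeu_pd */
#include <smmintrin.h>  /* SSE4.1: _mm_ceil_pd */
#include <stdint.h>

/* Hypothetical helper: signed saturating 8-bit add of two scalars,
   computed with _mm_adds_epi8 and read back from element 0. */
static int8_t adds_epi8_lane0(int8_t x, int8_t y)
{
    int8_t out[16];
    __m128i r = _mm_adds_epi8(_mm_set1_epi8(x), _mm_set1_epi8(y));
    _mm_storeu_si128((__m128i *)out, r);
    return out[0];  /* all 16 elements hold the same saturated result */
}

/* Hypothetical helper: round two doubles up with _mm_ceil_pd.
   The intrinsic takes and returns __m128d, i.e. double precision. */
__attribute__((target("sse4.1")))
static void ceil_pd_pair(const double in[2], double out[2])
{
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_ceil_pd(v));
}
```

With these helpers, adds_epi8_lane0(100, 100) returns 127 instead of wrapping to -56, and ceil_pd_pair on {1.25, -1.75} yields {2.0, -1.0}.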

I didn't read all instructions so there may be more issues. I'll post if I find anything else.

PS: This is not a bug per se, but some intrinsics are missing the Latency & Throughput information. This mostly affects newer instructions, but the info is still useful and I hope it will be added.

andysem · ‎04-14-2014

http://software.intel.com/sites/landingpage/IntrinsicsGuide/ works for me now. That said there were site outages a few days ago (the forum was completely inaccessible for a day or two for me), maybe the problems are still happening from time to time.

Bernard · ‎04-15-2014

andysem wrote:

http://software.intel.com/sites/landingpage/IntrinsicsGuide/ works for me now. That said there were site outages a few days ago (the forum was completely inaccessible for a day or two for me), maybe the problems are still happening from time to time.

Works for me also.

Patrick_K_Intel · ‎04-16-2014

Sorry about that, there were some server changes that caused some intermittent issues, but it should be working fine now.

Xiong_Z_ · ‎06-12-2014

The description for __m128i _mm_sad_epu8 (__m128i a, __m128i b) is not correct:

Description

Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce two unsigned 16-bit integers (not four, as the guide currently says), and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in dst.
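The corrected count is easy to confirm: a 128-bit vector holds 16 bytes, so there are two groups of 8 differences and therefore two sums. A minimal sketch assuming SSE2 on a little-endian x86 target; sad_epu8_lanes and sad_epu8_ref are hypothetical helper names, the second being a plain scalar reference for comparison.

```c
#include <emmintrin.h>  /* SSE2: _mm_sad_epu8 */
#include <stdint.h>
#include <stdlib.h>     /* abs */

/* Runs _mm_sad_epu8 on two 16-byte buffers and returns both 64-bit
   elements of the result. Each sum sits in bits [15:0]; bits [63:16] are zero. */
static void sad_epu8_lanes(const uint8_t a[16], const uint8_t b[16], uint64_t lanes[2])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)lanes, _mm_sad_epu8(va, vb));
}

/* Scalar reference: sum of |a[i] - b[i]| over bytes 0..7 and over bytes 8..15. */
static void sad_epu8_ref(const uint8_t a[16], const uint8_t b[16], uint16_t sums[2])
{
    for (int half = 0; half < 2; half++) {
        unsigned s = 0;
        for (int i = 0; i < 8; i++)
            s += (unsigned)abs((int)a[half * 8 + i] - (int)b[half * 8 + i]);
        sums[half] = (uint16_t)s;
    }
}
```

For example, with every byte of a equal to 255 and every byte of b equal to 0, both 64-bit elements come out as 2040 (8 * 255), with the upper bits zero.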

Kevin_D_Intel · ‎06-16-2014

User Jeremias M. reported here: https://software.intel.com/en-us/forums/topic/516476#comment-1791398 that when filtering results to KNC only, the search returns _mm512_mask_set1_epi32 as a valid KNC intrinsic. That is currently incorrect, though it may become true in a future release, as discussed in the cited thread.

Patrick_K_Intel · ‎06-17-2014

Thank you for your feedback. I've updated the Intrinsics Guide to resolve the issues with _mm_sad_epu8 and _mm512_mask_set1_epi32, as well as a few other issues with KNC intrinsics.

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

Jeremias_M_ · ‎06-18-2014

Hi,

I was using the function _mm512_mask_reduce_gmax_pd, and when I looked up the equivalent integer functions in the guide, they appeared only under the AVX-512 instructions.

So I checked the zmmintrin.h header and saw the functions implemented there. Then I tested some of them (_mm512_mask_reduce_max_epi32 (__mmask16 k, __m512i a) and _mm512_reduce_max_epi32 (__m512i a)), and they worked.

I believe the functions below may be supported on KNC too.

int _mm512_reduce_max_epi32 (__m512i a)

__int64 _mm512_reduce_max_epi64 (__m512i a)

unsigned int _mm512_reduce_max_epu32 (__m512i a)

unsigned __int64 _mm512_reduce_max_epu64 (__m512i a)

double _mm512_reduce_max_pd (__m512d a)

float _mm512_reduce_max_ps (__m512 a)
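Since KNC or AVX-512 hardware may not be at hand, here is a hypothetical scalar model of what the epi32 reductions compute. It only illustrates the semantics and is not taken from zmmintrin.h; the function names are illustrative. The masked variant assumes that inactive elements are treated as INT32_MIN, so an all-zero mask returns INT32_MIN.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm512_reduce_max_epi32:
   the maximum of the 16 signed 32-bit elements. */
static int32_t reduce_max_epi32_model(const int32_t a[16])
{
    int32_t m = a[0];
    for (int j = 1; j < 16; j++)
        if (a[j] > m)
            m = a[j];
    return m;
}

/* Hypothetical scalar model of _mm512_mask_reduce_max_epi32.
   Assumption: elements whose bit in k is clear count as INT32_MIN. */
static int32_t mask_reduce_max_epi32_model(uint16_t k, const int32_t a[16])
{
    int32_t m = INT32_MIN;
    for (int j = 0; j < 16; j++)
        if (((k >> j) & 1u) && a[j] > m)
            m = a[j];
    return m;
}
```

The _epu32, _epi64, _epu64, _pd and _ps variants follow the same pattern with the element type changed.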

Patrick_K_Intel · ‎06-20-2014

You are correct, all the _reduce_ intrinsics are supported on KNC. I've updated the Intrinsics Guide to resolve this issue.

andysem · ‎07-07-2014

The _mm_test_all_ones intrinsic lists multiple different timing values for the same CPUs.

andysem · ‎10-05-2014

The Intel intrinsics guide page doesn't load for me, or loads really slowly (about a minute or so). It shows the intrinsics categories on the left and "Loading" in the center and hangs that way. I'm using Firefox 32.0.3 on Linux.

On a related note, will there be an offline standalone release? The browser version is not always convenient for me.

Dobratz__Glenn · ‎10-30-2014

I find the opening screen of the guide very hard to read. It would be much more readable if only the function names were shown at the top level instead of the full function prototypes. The prototypes just create a lot of visual noise that obscures the function names. Since the prototype is easily visible when a function is displayed, IMHO the extra click needed to see it is outweighed by the improved readability.

Dobratz__Glenn · ‎10-30-2014

It would be helpful if the description of the intrinsics also had a link to the corresponding instruction's description in the Intel Processor Instruction Set manual, so we can easily get the dirty details on the generated instruction.

andysem · ‎10-31-2014

Glenn D. wrote:

It would be much more readable if only the function name were used at the top level instead of the full function prototypes.

I disagree. The prototype is useful for me because I often don't remember the exact signature or arguments of the intrinsic, and all I have to do is just type it in the search field.

andysem · ‎11-03-2014

andysem wrote:

Please specify that _mm_madd_epi16 and _mm256_madd_epi16 perform signed multiplication.

Was this forgotten? This information is still missing in 3.3.1.
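The signedness matters as soon as an input has its top bit set. A minimal SSE2 sketch (madd_epi16 is a hypothetical wrapper name) computing out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1]: with a[0] = -1 (0xFFFF) and b[0] = 2 the first product is -2, whereas an unsigned interpretation would give 131070.

```c
#include <emmintrin.h>  /* SSE2: _mm_madd_epi16 */
#include <stdint.h>

/* Hypothetical wrapper around _mm_madd_epi16 (PMADDWD):
   out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1], with all 16-bit inputs
   treated as signed and the sums produced as signed 32-bit integers. */
static void madd_epi16(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(va, vb));
}
```

Feeding a = {-1, 2, 3, 4, ...} and b = {2, 3, -5, 6, ...} gives out[0] = 4 and out[1] = 9. The one input combination that overflows, all four words of a group equal to -32768, wraps to INT32_MIN, as documented for PMADDWD.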

Patrick_K_Intel · ‎11-03-2014

andysem wrote:

Was this forgotten? This information is still missing in 3.3.1.

I guess so, I'll be sure to include this in the next update.

Yukimasa__Sugizaki · ‎12-12-2014

Hi.

There are invalid constant names in the Operation sections of _mm512_{,mask_}extload_* (according to zmmintrin.h):

_MM_BROADCAST1X16 should be _MM_BROADCAST_1X16.
_MM_BROADCAST4X16 should be _MM_BROADCAST_4X16.
_MM_BROADCAST1X8 should be _MM_BROADCAST_1X8.
_MM_BROADCAST4X8 should be _MM_BROADCAST_4X8.

Regards,
Sugizaki.

andysem · ‎12-17-2014

Please mention in the descriptions that _mm_maskmoveu_si128 and _mm_maskmove_si64 generate non-temporal memory stores.
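A minimal SSE2 sketch of a byte-masked store (masked_store_16 is a hypothetical name): only bytes whose mask byte has its most significant bit set are written. Because MASKMOVDQU uses a non-temporal hint, the sketch follows it with _mm_sfence(); whether a fence is actually needed depends on how the destination is consumed afterwards, so treat that as a conservative assumption.

```c
#include <emmintrin.h>  /* SSE2: _mm_maskmoveu_si128; also pulls in _mm_sfence */
#include <stdint.h>

/* Hypothetical helper: writes src[i] to dst[i] only where mask[i] has
   bit 7 set; all other bytes of dst are left untouched.
   The store is non-temporal, so an sfence orders it before later stores
   (assumed necessary here, e.g. before publishing dst to another thread). */
static void masked_store_16(uint8_t dst[16], const uint8_t src[16], const uint8_t mask[16])
{
    __m128i vsrc  = _mm_loadu_si128((const __m128i *)src);
    __m128i vmask = _mm_loadu_si128((const __m128i *)mask);
    _mm_maskmoveu_si128(vsrc, vmask, (char *)dst);
    _mm_sfence();
}
```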

Patrick_K_Intel · ‎12-18-2014

Thanks guys, I've made these corrections.

bronxzv · ‎01-01-2015

there is a series of errors in the Intrinsics Guide for the description of intrinsics mapping to instructions with an immediate operand

operands of the imm8 type (8-bit) are declared as int (32-bit) intrinsic arguments, so I'd advise always using a notation such as imm[7:0] in the Intrinsics Guide

for example the description of _mm256_blend_epi16 at the moment makes some users think that they can use a 16-bit mask

(see https://software.intel.com/en-us/forums/topic/537849)
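The point about imm8 can be made concrete with a hypothetical scalar model of _mm256_blend_epi16 (blend_epi16_256_model is an illustrative name, not an Intel API). Element j takes b[j] when bit j % 8 of the immediate is set, so the same 8-bit pattern is reused for both 128-bit lanes and nothing above bit 7 is ever consulted.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_blend_epi16(a, b, imm8).
   Only imm8[7:0] matters: bit (j % 8) selects element j, so the
   same 8-bit pattern applies to both 128-bit lanes. */
static void blend_epi16_256_model(const int16_t a[16], const int16_t b[16],
                                  int imm8, int16_t dst[16])
{
    for (int j = 0; j < 16; j++)
        dst[j] = ((imm8 >> (j % 8)) & 1) ? b[j] : a[j];
}
```

Passing 0x0F0F in the hope of a 16-bit mask gives exactly the same result as 0x0F: elements 0-3 and 8-11 come from b, the rest from a.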

Patrick_K_Intel · ‎01-12-2015

Thanks for reporting this issue. I have updated the documentation around immediate parameters to clarify this better.

bronxzv · ‎01-13-2015

Patrick Konsor (Intel) wrote:
I have updated the documentation around immediate parameters to clarify this better.

the description for _mm256_blend_epi16 looks the same as before in the online Intrinsics Guide, I suppose your changes aren't published yet, right?

Bugs in Intrinsics Guide
