Bugs in Intrinsics Guide - Page 3

andysem · ‎01-30-2013

Hi,

I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using Linux version):

1. When the window is maximized, the search field is stretched vertically while still being a one-line edit box. It should probably be sized accordingly.

2. __m256 _mm256_undefined_si256 () should return __m256i.

3. In some instruction descriptions, like _mm_adds_epi8, the operation is described in terms of SignedSaturate while, e.g., _mm256_adds_epi16 is described with SaturateToSignedWord. This applies to other operations with unsigned saturation as well. Also, the vector elements are described differently. A more consistent description would be nice.

4. _mm_alignr_epi8 has two descriptions.

5. I'm not sure the _mm_ceil_pd signature and description are correct. It says the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
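
For readers unfamiliar with the saturation mentioned in item 3, here is a minimal sketch of what the signed and unsigned saturating adds do, whatever name the guide gives the operation. It uses only SSE2 intrinsics; the helper names adds_epi8_lane0 and adds_epu8_lane0 are made up for this illustration and are not part of any Intel header.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: broadcast x and y to all 16 signed 8-bit lanes,
   add them with signed saturation, and return lane 0. */
static int8_t adds_epi8_lane0(int8_t x, int8_t y) {
    __m128i a = _mm_set1_epi8((char)x);
    __m128i b = _mm_set1_epi8((char)y);
    int8_t out[16];
    _mm_storeu_si128((__m128i *)out, _mm_adds_epi8(a, b));
    return out[0];
}

/* Hypothetical helper: same idea with unsigned saturation. */
static uint8_t adds_epu8_lane0(uint8_t x, uint8_t y) {
    __m128i a = _mm_set1_epi8((char)x);
    __m128i b = _mm_set1_epi8((char)y);
    uint8_t out[16];
    _mm_storeu_si128((__m128i *)out, _mm_adds_epu8(a, b));
    return out[0];
}
```

With these helpers, adds_epi8_lane0(100, 100) clamps to 127 instead of wrapping, and adds_epu8_lane0(200, 100) clamps to 255.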

I didn't read all instructions so there may be more issues. I'll post if I find anything else.

PS: This is not a bug per se but some instructions are missing the Latency & Throughput information. This mostly relates to newer instructions but still this info is useful and I hope it will be added.

Patrick_K_Intel · ‎12-02-2013

Yes, you are correct, this will be resolved in the next release, which will be later this month.

andysem · ‎12-19-2013

I see there is a major update to the Intrinsics Guide. Nice job, thanks!

There is an error in the tooltip that pops up when I hover the pointer over the non-VEX instructions. Regardless of the instruction, the tooltip always says:

This intrinsic may generate the VEX-encoded instruction vpunpcklwd. If the instruction is not VEX encoded, punpcklwd may cause performance penalties if mixed with 256-bit or 512-bit instructions.

I suppose, the text should either be more generic or mention the corresponding instructions.

BTW: Is there a downloadable (standalone) version of the guide?

Patrick_K_Intel · ‎12-19-2013

That is indeed an issue. The fix is on its way up right now, should be visible soon.

The standalone version is not available at the moment, but hopefully we'll have it ready early next year.

bronxzv · ‎01-04-2014

I just noticed a few errors in the Haswell throughput for these instructions:

VBLENDVPS/PD : should be 2 instead of 1

VMULPD/PS : should be 0.5 instead of 1

andysem · ‎01-08-2014

When I select AVX-512F in the filters, the SVML intrinsics are also listed. This doesn't happen when I select AVX-512 though.

andysem · ‎01-27-2014

Some intrinsics are missing timing information that was present in the 3.0.1 (the last standalone) version. For example, _mm_alignr_epi8 and _mm_alignr_pi8.

Is there any news on the standalone version?

Patrick_K_Intel · ‎01-28-2014

Regarding the VBLENDVPD/PS and VMULPD/PS throughputs on Haswell, you're correct, those will be updated momentarily.

Regarding SVML intrinsics showing under AVX-512F, that was resolved a while ago, but may not have been universally visible. It should be visible soon.

Regarding missing performance data for _mm_alignr_epi8 and _mm_alignr_pi8, _mm_alignr_epi8 was indeed a mistake and will be added shortly. In the process of validating all the intrinsics performance data across multiple sources, some of the data was identified as invalid and thus removed, which was the case for _mm_alignr_pi8.

Diego_Caballero · ‎02-08-2014

Hi,

_mm512_permutevar_epi32 / _mm512_mask_permutevar_epi32 and _mm512_alignr_epi32 are missing for KNC in the last version.

Is there any plan to include latency and throughput for KNC ISA?

Thank you!

Kevin_D_Intel · ‎03-03-2014

User Patrick S. wrote about other mistakes here: http://software.intel.com/en-us/forums/topic/500971#comment-1779043. He wrote:

Patrick S. wrote:
I have also found some mistakes:

in the intrinsics guide:

http://software.intel.com/sites/landingpage/IntrinsicsGuide/

the instruction _mm512_alignr_epi32 is not listed under "KNC". It is only listed under AVX-512, but KNC supports the alignr instruction.

The same for:

_mm512_mask_alignr_epi32/epi64
_mm512_load_ps/pd
_mm512_store_ps/pd
_mm512_fmadd_ps/pd
_mm512_fnmadd_ps/pd
_mm512_fmsub_ps/pd
_mm512_fnmsub_ps/pd

also all cast instructions like _mm512_castpd_ps are not listed under "KNC".

I guess that there are a lot more mistakes, but these are the ones I remember.

Diego_Caballero · ‎03-04-2014

Hi,

I think there is an error in the description of the algorithm of the intrinsics '_mm512_*_extpackstorelo_*' (or maybe I'm missing something):

The condition

IF (storeAddr % 64) == 0 BREAK

should be something like

IF ((addr + storeOffset * downSize) % 64) == 0 BREAK

Otherwise, the first aligned element (hi) will be written by the 'lo' intrinsic and it shouldn't according to my understanding.
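
To make the difference between the two conditions concrete, here is a rough scalar model. It is not the guide's pseudocode: it only counts how many elements the 'lo' store would write, and it assumes the BREAK check runs right after each element is stored and the offset is incremented. The function names and the parameter n (the number of selected elements) are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical model of the condition as quoted from the guide:
   stop once the address that was just written is 64-byte aligned. */
static int lo_count_quoted(size_t addr, size_t down_size, int n) {
    int store_offset = 0;
    for (int j = 0; j < n; j++) {
        size_t store_addr = addr + (size_t)store_offset * down_size;
        /* element j would be written to store_addr here */
        store_offset++;
        if (store_addr % 64 == 0) break;
    }
    return store_offset;
}

/* Hypothetical model of the proposed condition:
   stop once the NEXT address to be written is 64-byte aligned. */
static int lo_count_proposed(size_t addr, size_t down_size, int n) {
    int store_offset = 0;
    for (int j = 0; j < n; j++) {
        /* element j would be written to addr + store_offset * down_size */
        store_offset++;
        if ((addr + (size_t)store_offset * down_size) % 64 == 0) break;
    }
    return store_offset;
}
```

Under these assumptions, with addr = 48, 4-byte elements and 16 selected elements, the quoted condition writes 5 elements (the fifth lands at offset 64, which should belong to the 'hi' intrinsic), while the proposed condition writes 4 and stops at the boundary.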

Please, let me know if I'm wrong.

Thanks.

Patrick_K_Intel · ‎03-04-2014

There appear to be a number of issues with KNC intrinsics, including several missing intrinsics (specifically when the name matches an AVX-512 intrinsic), and intrinsics that should be cross listed as both AVX-512 and KNC but are only listed under AVX-512. I am in the process of reviewing all KNC intrinsics and will release an update that should resolve all these issues shortly.

Patrick_S_ · ‎03-04-2014

The function _mm512_fmadd233_epi32 is listed in the Intrinsics guide as a = b*c. I guess that is also a typo.

btw I really like the Intrinsics guide! Would it be possible to add a button for choosing the data type (integer, floating point), as in the software "Intel Intrinsics Guide - v.3.01"?

Another idea for improvement would be to add an "advanced search", e.g. searching for functions with a particular output data type (int, double and so forth). That search option would have saved me a lot of time.

Patrick_K_Intel · ‎03-19-2014

I've just updated the Intrinsics Guide (v3.1.5). This should resolve all the KNC issues, as well as the issue with fmadd233 and extpackstorelo.

http://software.intel.com/sites/landingpage/IntrinsicsGuide/

andysem · ‎03-19-2014

_mm_sub_epi16 intrinsic is documented to correspond to phsubw instruction, while it should be psubw. The timing data is also given for phsubw instead of psubw.
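
A small sketch of why the distinction matters: _mm_sub_epi16 subtracts lane by lane, whereas a horizontal subtract such as phsubw works on adjacent pairs within each operand. The wrapper name sub_epi16_lanes is made up for this example; only SSE2 is used.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical wrapper: out[i] = a[i] - b[i] for all eight 16-bit lanes. */
static void sub_epi16_lanes(const int16_t a[8], const int16_t b[8],
                            int16_t out[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_sub_epi16(va, vb));
}
```

For a = {10, 20, ..., 80} and b = {1, 2, ..., 8}, lane 0 of the result is 10 - 1 = 9. A horizontal subtract on the same inputs would instead produce a[0] - a[1] = -10 in lane 0.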

Vladimir_Sedach · ‎03-20-2014

No compiler version info. For example, _mm_erfcinv_ps appeared in ICC 14.

Patrick_K_Intel · ‎03-20-2014

I've resolved the issue with _mm_sub_epi16, the update should appear soon. I've also added the new intrinsics for xsavec, xsaves, and xrstors.

Stefan_M_Intel · ‎04-02-2014

Great tool, some shortcomings

_mm_xor_si128() says "bitwisw OR"
All "abs" intrinsics could document the behaviour for the value -2^(N-1), where N is the bit width of the corresponding epi type
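
To show the edge case in question, here is a sketch that emulates a 16-bit absolute value with SSE2 only, as max(a, 0 - a). This is an emulation, not the SSSE3 abs intrinsic itself; my understanding is that the hardware instruction returns the same bit pattern for the most negative input, but that is an assumption here. The helper name abs16_lane0 is made up.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: SSE2 emulation of a 16-bit absolute value,
   computed as max(a, 0 - a) in every lane; returns lane 0. */
static int16_t abs16_lane0(int16_t x) {
    __m128i a = _mm_set1_epi16(x);
    /* 0 - a wraps around for a == -32768, so neg is -32768 again */
    __m128i neg = _mm_sub_epi16(_mm_setzero_si128(), a);
    int16_t out[8];
    _mm_storeu_si128((__m128i *)out, _mm_max_epi16(a, neg));
    return out[0];
}
```

Here abs16_lane0(-5) gives 5, but abs16_lane0(-32768) gives -32768 again, because +32768 is not representable in a signed 16-bit lane.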

Stefan_M_Intel · ‎04-03-2014

Hello,

I currently use data version 3.1.6 very actively and had trouble with compiling the four intrinsics *_bslli_si128() and *_bsrli_si128(). With gcc, they only compile when I remove the b. I do not (yet) use Intel compiler, but the SW developer manual also lists those four intrinsics without b.

Intel C/C++ Compiler Intrinsic Equivalent

(V)PSLLDQ: __m128i _mm_slli_si128 ( __m128i a, int imm)

VPSLLDQ: __m256i _mm256_slli_si256 ( __m256i a, const int imm)

Intel C/C++ Compiler Intrinsic Equivalents

(V)PSRLDQ: __m128i _mm_srli_si128 ( __m128i a, int imm)

VPSRLDQ: __m256i _mm256_srli_si256 ( __m256i a, const int imm)

andysem · ‎04-08-2014

Please, specify that _mm_madd_epi16 and _mm256_madd_epi16 perform signed multiplication.
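
A quick sketch showing that the multiplication is signed: each pair of adjacent 16-bit lanes is multiplied as signed values and the two 32-bit products are added. The helper name madd_epi16_lane0 is made up for this example; only SSE2 is used.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical helper: put (a0, a1) and (b0, b1) in the two lowest
   16-bit lanes and return the lowest 32-bit lane of _mm_madd_epi16,
   i.e. a0*b0 + a1*b1. */
static int32_t madd_epi16_lane0(int16_t a0, int16_t a1,
                                int16_t b0, int16_t b1) {
    /* _mm_set_epi16 takes the highest lane first, so the last
       argument ends up in lane 0 */
    __m128i a = _mm_set_epi16(0, 0, 0, 0, 0, 0, a1, a0);
    __m128i b = _mm_set_epi16(0, 0, 0, 0, 0, 0, b1, b0);
    return _mm_cvtsi128_si32(_mm_madd_epi16(a, b));
}
```

For example, madd_epi16_lane0(-1, 3, 2, 4) is (-1)*2 + 3*4 = 10. If the inputs were treated as unsigned, -1 would be read as 65535 and the result would be very different.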

Patrick_K_Intel · ‎04-09-2014

Stefan M. wrote:

Hello,

I currently use data version 3.1.6 very actively and had trouble with compiling the four intrinsics *_bslli_si128() and *_bsrli_si128(). With gcc, they only compile when I remove the b. I do not (yet) use Intel compiler, but the SW developer manual also lists those four intrinsics without b.

Intel C/C++ Compiler Intrinsic Equivalent

(V)PSLLDQ: __m128i _mm_slli_si128 ( __m128i a, int imm)

VPSLLDQ: __m256i _mm256_slli_si256 ( __m256i a, const int imm)

Intel C/C++ Compiler Intrinsic Equivalents

(V)PSRLDQ: __m128i _mm_srli_si128 ( __m128i a, int imm)

VPSRLDQ: __m256i _mm256_srli_si256 ( __m256i a, const int imm)

You can use either name, they perform the same functionality, although the "b" names may not be supported by GCC at this point.
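
For anyone hitting the same compile error, a minimal sketch using the name without the "b", which is the one listed in the SW developer manual excerpt above. The shift count is in bytes, not bits, and must be a compile-time constant. The wrapper name shift_left_4_bytes is made up for this example.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Hypothetical wrapper: shift the whole 128-bit value left by 4 bytes,
   filling the low bytes with zeros. */
static void shift_left_4_bytes(const uint8_t in[16], uint8_t out[16]) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_slli_si128(v, 4));
}
```

With in = {1, 2, ..., 16}, bytes 0 to 3 of out are zero, out[4] is 1 and out[15] is 12.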

Eugen_V_ · ‎04-14-2014

The Intrinsics Guide is not working.

The link to https://software.intel.com/en-us/sites/landingpage/IntrinsicsGuide is broken,

while https://software.intel.com/sites/landingpage/IntrinsicsGuide/ opens but says "Error Loading Data".

While debugging I found that "https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.1.6.xml" is not accessible,

but https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.1.6.xml works.