Hi,
I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using Linux version):
1. When the window is maximized, the search field is stretched vertically while still being a one-line edit box. It should probably be sized accordingly.
2. __m256 _mm256_undefined_si256 () should return __m256i.
3. In some intrinsic descriptions, such as _mm_adds_epi8, the operation is described in terms of SignedSaturate, while others, e.g. _mm256_adds_epi16, are described with SaturateToSignedWord. The same applies to operations with unsigned saturation. The vector elements are also described differently. A more consistent description would be nice.
4. _mm_alignr_epi8 has two descriptions.
5. I'm not sure the _mm_ceil_pd signature and description are correct. It says the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
I didn't read all instructions so there may be more issues. I'll post if I find anything else.
PS: This is not a bug per se, but some instructions are missing the Latency & Throughput information. This mostly affects newer instructions, but the information is still useful and I hope it will be added.
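To illustrate item 3: SignedSaturate and SaturateToSignedWord appear to name the same clamping step at different element widths. The following is a minimal scalar sketch of one lane of a signed saturating add, not the guide's own pseudocode. The helper names (saturate_i8, saturate_i16, adds_i8, adds_i16, adds_self_check) are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a wide intermediate result to the signed 8-bit range. */
static int8_t saturate_i8(int32_t x) {
    if (x > INT8_MAX) return INT8_MAX;
    if (x < INT8_MIN) return INT8_MIN;
    return (int8_t)x;
}

/* Clamp a wide intermediate result to the signed 16-bit range. */
static int16_t saturate_i16(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}

/* Scalar model of one 8-bit lane: add in a wider type, then clamp. */
int8_t adds_i8(int8_t a, int8_t b) {
    return saturate_i8((int32_t)a + (int32_t)b);
}

/* Scalar model of one 16-bit lane: same operation, wider element. */
int16_t adds_i16(int16_t a, int16_t b) {
    return saturate_i16((int32_t)a + (int32_t)b);
}

/* Small self-check showing the clamping at both widths. */
void adds_self_check(void) {
    assert(adds_i8(100, 100) == 127);
    assert(adds_i8(-100, -100) == -128);
    assert(adds_i8(1, 2) == 3);
    assert(adds_i16(30000, 30000) == 32767);
    assert(adds_i16(-30000, -30000) == -32768);
}
```

Both widths follow the same pattern, which is why a single naming scheme in the descriptions would be easier to read.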
I see there is a major update to the Intrinsics Guide. Nice job, thanks!
There is an error in the tooltip that pops up when I hover the pointer over the non-VEX instructions. Regardless of the instruction, the tooltip always says:
This intrinsic may generate the VEX-encoded instruction vpunpcklwd. If the instruction is not VEX encoded, punpcklwd may cause performance penalties if mixed with 256-bit or 512-bit instructions.
I suppose the text should either be more generic or mention the instructions that correspond to the intrinsic being viewed.
BTW: Is there a downloadable (standalone) version of the guide?
Some intrinsics are missing timing information that was present in the 3.0.1 (the last standalone) version. For example, _mm_alignr_epi8 and _mm_alignr_pi8.
Is there any news on the standalone version?
Regarding the VBLENDVPD/PS and VMULPD/PS throughputs on Haswell, you're correct, those will be updated momentarily.
Regarding SVML intrinsics showing under AVX-512F, that was resolved a while ago, but may not have been universally visible. It should be visible soon.
Regarding missing performance data for _mm_alignr_epi8 and _mm_alignr_pi8: the omission for _mm_alignr_epi8 was indeed a mistake, and the data will be added shortly. In the process of validating all the intrinsics performance data across multiple sources, some of the data was identified as invalid and thus removed, which was the case for _mm_alignr_pi8.
Hi,
_mm512_permutevar_epi32 / _mm512_mask_permutevar_epi32 and _mm512_alignr_epi32 are missing for KNC in the last version.
Is there any plan to include latency and throughput for KNC ISA?
Thank you!
User Patrick S. wrote about other mistakes here: http://software.intel.com/en-us/forums/topic/500971#comment-1779043. He wrote:
Patrick S. wrote:
I have also found some mistakes in the intrinsics guide:
http://software.intel.com/sites/landingpage/IntrinsicsGuide/
The instruction _mm512_alignr_epi32 is not listed under "KNC". It is only listed under AVX-512, but KNC supports the alignr instruction.
The same for:
_mm512_mask_alignr_epi32/epi64
_mm512_load_ps/pd
_mm512_store_ps/pd
_mm512_fmadd_ps/pd
_mm512_fnmadd_ps/pd
_mm512_fmsub_ps/pd
_mm512_fnmsub_ps/pd
Also, all cast instructions like _mm512_castpd_ps are not listed under "KNC".
I guess that there are a lot more mistakes, but these are the ones I remember.
Hi,
I think there is an error in the description of the algorithm of the intrinsics '_mm512_*_extpackstorelo_*' (or maybe I'm missing something):
The condition
IF (storeAddr % 64) == 0 BREAK
should be something like
IF ((addr + storeOffset * downSize) % 64) == 0 BREAK
Otherwise, the first aligned element (hi) will be written by the 'lo' intrinsic and it shouldn't according to my understanding.
Please, let me know if I'm wrong.
Thanks.
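The proposed condition can be checked with a small scalar model. This is only a sketch under stated assumptions: all mask bits are set, storeOffset is incremented right after each element is written, and the function name lo_store_count and its parameters are hypothetical. It counts how many elements the 'lo' part would write before the next 64-byte boundary is reached.

```c
#include <assert.h>
#include <stdint.h>

/* Count the elements written by the "lo" half when the loop breaks on
 * ((addr + storeOffset * downSize) % 64) == 0.
 * Assumptions: every mask bit is set, and storeOffset is incremented
 * immediately after each element is stored. */
unsigned lo_store_count(uintptr_t addr, unsigned down_size, unsigned num_elems) {
    unsigned store_offset = 0;
    assert(down_size > 0);
    for (unsigned j = 0; j < num_elems; j++) {
        /* Element j would be written at addr + store_offset * down_size. */
        store_offset++;
        /* Stop once the NEXT element would start on a 64-byte boundary. */
        if ((addr + (uintptr_t)store_offset * down_size) % 64 == 0)
            break;
    }
    return store_offset;
}
```

With addr = 48 and 4-byte elements, the model writes the elements at offsets 48, 52, 56 and 60 and then stops, so the element at the aligned address 64 is left for the 'hi' intrinsic.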
There appear to be a number of issues with KNC intrinsics, including several missing intrinsics (specifically when the name matches an AVX-512 intrinsic), and intrinsics that should be cross listed as both AVX-512 and KNC but are only listed under AVX-512. I am in the process of reviewing all KNC intrinsics and will release an update that should resolve all these issues shortly.
The function _mm512_fmadd233_epi32 is listed in the Intrinsics guide as a = b*c. I guess that is also a typo.
BTW, I really like the Intrinsics Guide! Would it be possible to add a button for choosing the data type (integer, floating point), like in the software "Intel Intrinsics Guide - v.3.01"?
Another idea for improvement would be to add an "advanced search", e.g. a search for functions with a specific output data type (int, double and so forth). That search option would have saved me a lot of time.
I've just updated the Intrinsics Guide (v3.1.5). This should resolve all the KNC issues, as well as the issue with fmadd233 and extpackstorelo.
http://software.intel.com/sites/landingpage/IntrinsicsGuide/
Hello,
I currently use data version 3.1.6 very actively and had trouble compiling the four intrinsics *_bslli_si128() and *_bsrli_si128(). With gcc, they only compile when I remove the b. I do not (yet) use the Intel compiler, but the SW developer manual also lists those four intrinsics without the b.
Intel C/C++ Compiler Intrinsic Equivalent
(V)PSLLDQ: __m128i _mm_slli_si128 ( __m128i a, int imm)
VPSLLDQ: __m256i _mm256_slli_si256 ( __m256i a, const int imm)
Intel C/C++ Compiler Intrinsic Equivalents
(V)PSRLDQ: __m128i _mm_srli_si128 ( __m128i a, int imm)
VPSRLDQ: __m256i _mm256_srli_si256 ( __m256i a, const int imm)
You can use either name, they perform the same functionality, although the "b" names may not be supported by GCC at this point.
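For reference, here is a scalar sketch of the 128-bit byte shift that both spellings name (PSLLDQ / PSRLDQ). It is a plain C model, not compiler code. The function names byte_shift_left_128 and byte_shift_right_128 are hypothetical, and the 128-bit value is viewed as 16 bytes with byte 0 least significant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of a 128-bit left shift by imm bytes (PSLLDQ): bytes move toward
 * higher indices, zeros are shifted in, and imm > 15 gives all zeros. */
void byte_shift_left_128(uint8_t out[16], const uint8_t in[16], unsigned imm) {
    uint8_t tmp[16];
    memset(tmp, 0, sizeof tmp);
    if (imm < 16)
        memcpy(tmp + imm, in, 16 - imm);
    memcpy(out, tmp, sizeof tmp);   /* the temporary allows out == in */
}

/* Model of a 128-bit right shift by imm bytes (PSRLDQ): bytes move toward
 * lower indices, zeros are shifted in, and imm > 15 gives all zeros. */
void byte_shift_right_128(uint8_t out[16], const uint8_t in[16], unsigned imm) {
    uint8_t tmp[16];
    memset(tmp, 0, sizeof tmp);
    if (imm < 16)
        memcpy(tmp, in + imm, 16 - imm);
    memcpy(out, tmp, sizeof tmp);
}

/* Small self-check: shifting bytes 1..16 by 4 positions each way. */
void byte_shift_self_check(void) {
    uint8_t in[16], out[16];
    for (int i = 0; i < 16; i++) in[i] = (uint8_t)(i + 1);
    byte_shift_left_128(out, in, 4);
    assert(out[3] == 0 && out[4] == 1 && out[15] == 12);
    byte_shift_right_128(out, in, 4);
    assert(out[0] == 5 && out[11] == 16 && out[12] == 0);
}
```

The 256-bit forms apply the same shift to each 128-bit half independently, so the model above covers one half at a time.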
The Intrinsics Guide is not working.
The link to https://software.intel.com/en-us/sites/landingpage/IntrinsicsGuide is broken,
while https://software.intel.com/sites/landingpage/IntrinsicsGuide/ opens but says "Error Loading Data".
While debugging, I found that "https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.1.6.xml" is not accessible,
but https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/data-3.1.6.xml works.