andysem · ‎01-30-2013

Hi,

I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using Linux version):

1. When the window is maximized, the search field is stretched vertically while still being a one-line edit box. It should probably be sized accordingly.

2. __m256 _mm256_undefined_si256 () should return __m256i.

3. In some instruction descriptions, like _mm_adds_epi8, the operation is described in terms of SignedSaturate while, e.g., _mm256_adds_epi16 is described with SaturateToSignedWord. This applies to other operations with unsigned saturation as well. Also, the vector elements are described differently. A more consistent description would be nice.

4. _mm_alignr_epi8 has two descriptions.

5. I'm not sure the _mm_ceil_pd signature and description are correct. It says the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?

I didn't read all instructions so there may be more issues. I'll post if I find anything else.

PS: This is not a bug per se but some instructions are missing the Latency & Throughput information. This mostly relates to newer instructions but still this info is useful and I hope it will be added.

firelight · ‎08-17-2018

Shift clarification:

When describing shift operations, it would help to clarify the unit of the shift amount by using the word "byte" or "bit", as in the documentation at https://software.intel.com/en-us/node/524238.

For example:

__m128i _mm_srli_epi32 (__m128i a, int imm8)

...

Shift packed 32-bit integers in a right by imm8 bytes while shifting in zeros, and store the results in dst.

Thank you and good day.

andysem · ‎08-21-2018

Groarke, Philippe wrote:

__m128i _mm_srli_epi32 (__m128i a, int imm8)

...

Shift packed 32-bit integers in a right by imm8 bytes while shifting in zeros, and store the results in dst.

_mm_srli_epi32 operates on bits. By default, shift and rotate operations work in terms of bits. _mm_srli_si128/_mm_slli_si128 and equivalents for larger vectors are exceptions.

Gabriel_M_1 · ‎09-05-2018

Wrong return type in the `rdtsc` intrinsic

I've found an issue with the description of the `rdtsc` intrinsic.

This instruction is used to read the value of the CPU's timestamp counter, a 64-bit monotonically-increasing unsigned integer. However, the docs say the return type is a signed `__int64`. Both GCC and Clang properly expose this intrinsic as returning an unsigned long long, not a signed one. I believe this should be fixed.

Eden_S_Intel · ‎09-10-2018

Hi,

The AVX512-VNNI instruction vpdpbusd doesn't show up in the guide, as far as I can see.

Kearney__Jim · ‎10-19-2018

The Guide shows various permute*epi8 intrinsics as available in "AVX512_VBMI + AVX512VL", but aren't they only in VBMI?

andysem · ‎10-22-2018

Kearney, Jim wrote:

The Guide shows various permute*epi8 intrinsics as available in "AVX512_VBMI + AVX512VL", but aren't they only in VBMI?

Instructions that operate on 128 or 256-bit vectors require AVX-512VL in addition to AVX-512VBMI. 512-bit vectors only require AVX-512VBMI.

Kearney__Jim · ‎10-22-2018

andysem wrote:

Kearney, Jim wrote:

The Guide shows various permute*epi8 intrinsics as available in "AVX512_VBMI + AVX512VL", but aren't they only in VBMI?

Instructions that operate on 128 or 256-bit vectors require AVX-512VL in addition to AVX-512VBMI. 512-bit vectors only require AVX-512VBMI.

Sorry, I phrased that poorly.

I meant, the presentation in the Guide acts as though "+" means "or". If one enables only AVX-512VL under Technologies, the _mm_ and _mm256_ variants are shown as available. I would expect to have to check AVX-512_VBMI as well ("and").

Armstrong__Brian · ‎10-23-2018

Hi,

Does _mm_loadl_epi64 (movq xmm, m64) have alignment requirements? The documentation seems to suggest that _mm_load_si128 does have a 16-byte alignment requirement, while _mm_loadu_si128 does not have any requirements. It doesn't mention the alignment requirements for _mm_loadl_epi64 as far as I can tell.

Thanks!

andysem · ‎10-24-2018

Armstrong, Brian wrote:

Does _mm_loadl_epi64 (movq xmm, m64) have alignment requirements?

SDM Volume 1, Section 4.1.1 Alignment of Words, Doublewords, Quadwords, and Double Quadwords summarizes memory alignment requirements for most instructions. In particular, it says that words, double words and quadwords need not be aligned.

Jethro_B_ · ‎12-16-2018

I don't think the currently-defined intrinsics for SGX (_encls_u32, _enclu_u32, _enclv_u32) are appropriate. The leafs are so different that they should really be considered different instructions. No other instructions that have defined intrinsics have this issue.

Different leafs do completely different things
Different leafs have completely different operands
Different leafs may have different hardware support (in terms of CPU features)

The operation is of course documented in the SDM, but I've summarized all the leafs here https://github.com/fortanix/rust-sgx/issues/15#issuecomment-447738899

andysem · ‎02-01-2019

For _mm256_testc_si256 and other similar intrinsics, the Intrinsics Guide lists the corresponding instruction as vptest and CPUID flags as AVX. vptest is an AVX2 instruction. I'm assuming, for AVX targets the compiler should translate the intrinsic to either vtestps or vtestpd instruction, but those instructions are not listed.

andysem · ‎03-04-2019

For _mm_cvtpd_ps it would be useful to say in the description and pseudo-code that the upper half of the resulting register is filled with zero. For _mm_cvtps_pd it would be nice to mention in the description that it converts only the lower 2 elements of the vector.

It's been a while since the last update of the Intrinsics Guide. When will there be a new update?

James_C_Intel2 · ‎04-05-2019

The description of tpause has a few problems

the asm syntax is weird, with spurious commas
there is no statement about what the result of the intrinsic is. The implementation shows that it is rflags.cf, but you should say that explicitly.

Patrick_K_Intel · ‎04-05-2019

That's good feedback, thank you James, I'll update accordingly.

andysem, I try to batch several updates together and coordinate with any new intrinsics being announced; I'm still looking for the right date to do a release.

Gabriel_M_1 · ‎04-19-2019

The _rdtsc intrinsic has a return type of __int64 (signed integer), which is incorrect, since the time is read from the CPU's timestamp counter, an unsigned integer.

Existing C compilers (such as gcc) already return an unsigned long long type.

This is an issue for Rust developers when implementing Intel's intrinsics, because we use the intrinsic guide as a reference for the return type and parameters of the intrinsics.

Stefan_M_Intel1 · ‎05-29-2019

Intrinsics Guide Data Version: 3.4.4

Looks as if for the two non-masked operations _mm512_popcnt_epi8() and _mm512_popcnt_epi16(), the POPCNT() operator is missing in the pseudo-code.

Stefan_M_Intel1 · ‎05-29-2019

Re-wording proposal for _mm512_bitshuffle_epi64_mask()

Per 64-bit element in b and its 8 associated 8-bit elements in c: Gather 8 bits from 64-bit element in b at bit positions controlled by the 8 8-bit elements of c, and store the result in the associated Byte of mask k. There are 8 such operations done in parallel per 64-bit lane, each producing one Byte in k.

Stefan_M_Intel1 · ‎05-29-2019

The pseudo-code of all _mm512_dpbusd_epi32() variants looks incorrect.

The for-loop is defined with j, but operator[] is using expressions in i.

I suspect that the expressions shall read

tmp2 := a.byte[4*j+1] * b.byte[4*j+1] // i.e. all in j and also for b it shall be 4*j

Stefan_M_Intel1 · ‎05-29-2019

For the two non-masked operations _mm512_popcnt_epi32() and _mm512_popcnt_epi64(), the POPCNT() operator is missing in the pseudo-code.

Akhundzhanov__Alexan · ‎08-03-2019

Hi, I found a mistake in the Operation description of every addsub, fmaddsub, subadd, and fmsubadd instruction.

The IF statement inside the loop reads IF (j % 1 == 0), but it should be IF (j % 2 == 0).

Lopez__Emilio · ‎08-04-2019

Hi,

There is a small typo in _mm256_hsub_ps: it says "Horizontally add adjacent pairs [...]" where it should say "subtract".

Thanks!

Bugs in Intrinsics Guide