Intel® ISA Extensions
Use hardware-based isolation and memory encryption to provide more code protection in your solutions.

Bugs in Intrinsics Guide

andysem
New Contributor III
64,084 Views

Hi,

I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using Linux version):

1. When the window is maximized, the search field is stretched vertically while still being a one-line edit box. It should probably be sized accordingly.

2. __m256 _mm256_undefined_si256 () should return __m256i.

3. In some instruction descriptions, such as _mm_adds_epi8, the operation is described in terms of SignedSaturate, while others, e.g. _mm256_adds_epi16, use SaturateToSignedWord. The same applies to operations with unsigned saturation. The vector elements are also described differently. A more consistent description would be nice.

4. _mm_alignr_epi8 has two descriptions.

5. I'm not sure the _mm_ceil_pd signature and description are correct. The guide says the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
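
To make item 3 concrete, here is a minimal scalar sketch of what both spellings in the guide's pseudocode (SignedSaturate and SaturateToSignedWord) appear to describe: the sum is computed in a wider type and then clamped to the element range. The helper names below are hypothetical and are not taken from the guide or from any header.

```c
#include <stdint.h>

/* Clamp a wider intermediate result to the int8_t range. */
int8_t saturate_i8(int value) {
    if (value > INT8_MAX) return INT8_MAX;
    if (value < INT8_MIN) return INT8_MIN;
    return (int8_t)value;
}

/* Clamp a wider intermediate result to the int16_t range. */
int16_t saturate_i16(int32_t value) {
    if (value > INT16_MAX) return INT16_MAX;
    if (value < INT16_MIN) return INT16_MIN;
    return (int16_t)value;
}

/* Scalar model of one 8-bit element of a signed saturating add
   (hypothetical helper, for illustration only). */
int8_t adds_epi8_lane(int8_t a, int8_t b) {
    return saturate_i8((int)a + (int)b);
}

/* Scalar model of one 16-bit element of a signed saturating add
   (hypothetical helper, for illustration only). */
int16_t adds_epi16_lane(int16_t a, int16_t b) {
    return saturate_i16((int32_t)a + (int32_t)b);
}
```

Under this model the 8-bit and 16-bit variants differ only in element width and clamp bounds, which is why a single naming scheme in the descriptions would be enough.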

I didn't read all instructions so there may be more issues. I'll post if I find anything else.

PS: This is not a bug per se but some instructions are missing the Latency & Throughput information. This mostly relates to newer instructions but still this info is useful and I hope it will be added.

0 Kudos
221 Replies
Patrick_K_Intel
4,420 Views

The latency and throughput data is for instructions, not intrinsics, so if an instruction exists on an earlier architecture then that data will be shown for all intrinsics that share that instruction. I'll look into improving this.

06_45/46 are additional Haswell models.

0 Kudos
andysem
New Contributor III
4,420 Views

But the latency/throughput can differ depending on the instruction operand width, can't it? And these intrinsics operate on zmm registers, which are not available on Haswell and earlier architectures. BTW, latency/throughput data for AVX-512 instructions is presented for ymm operands, not zmm.

0 Kudos
Patrick_K_Intel
4,420 Views

We do not include latency/throughput data for unreleased micro-architectures, so none of the latency/throughput data is currently applicable to AVX-512 intrinsics (or anything newer than AVX2/FMA). In the future we will add this data, and it will be marked with zmm operands where appropriate. Any latency/throughput data shown for an AVX-512 intrinsic refers to the 128-bit or 256-bit version of the instruction that corresponds to that intrinsic. I will look into limiting the latency/throughput data that is shown to supported architectures.

0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
>>... In the future we will add this data, and it will be marked with zmm (where appropriate) operands. Any latency/throughput
>>data shown for an AVX-512 intrinsic...

Please also ask the Intel C++ compiler team to update the comment in the zmmintrin.h header file about '...512-bit compiler intrinsics...'. As you can see, it lacks the AVX prefix; ideally it should read: '...AVX 512-bit compiler intrinsics...'
0 Kudos
levicki
Valued Contributor I
4,420 Views

What about latency and throughput for intrinsics that map to more than one instruction at compiler's discretion (depending on /arch switch)?

0 Kudos
Patrick_K_Intel
4,420 Views

Well that would depend on the specific instructions the compiler chose. You can look up the latency/throughput of specific instructions in the Optimization Manual: http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

0 Kudos
andysem
New Contributor III
4,420 Views

Another bug: the monitor/mwait instructions are said to be detectable with the SSE3 cpuid flag. This is not correct: there is a dedicated flag for these instructions (cpuid function 1, ecx bit 3).
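
A minimal sketch of the check described above, assuming a GCC or Clang toolchain on x86, where `<cpuid.h>` provides `__get_cpuid`. The function names are hypothetical. On other compilers or targets the query function simply reports that it cannot tell (MSVC would need `__cpuid` from `<intrin.h>` instead).

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header; an assumption about the toolchain */
#endif

/* CPUID function 1, ECX bit 0: SSE3. */
int ecx_has_sse3(uint32_t ecx) {
    return (int)(ecx & 1u);
}

/* CPUID function 1, ECX bit 3: MONITOR/MWAIT (the dedicated flag). */
int ecx_has_monitor(uint32_t ecx) {
    return (int)((ecx >> 3) & 1u);
}

/* Returns 1 if MONITOR/MWAIT is reported, 0 if it is not,
   and -1 if CPUID function 1 cannot be queried in this build. */
int cpu_has_monitor(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return -1;
    }
    return ecx_has_monitor(ecx);
#else
    return -1;
#endif
}
```

Keeping the bit tests separate from the query makes the point of the report visible: the SSE3 bit and the MONITOR bit are independent, so testing one says nothing about the other.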

0 Kudos
Patrick_K_Intel
4,420 Views

You are correct. This will be corrected in the next release.

0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
Patrick, I wonder if a link to the latest version of the Intel Intrinsics Guide could be provided, for example in a sticky post? Thanks in advance.
0 Kudos
Patrick_K_Intel
4,420 Views

The latest version is always available here:

http://software.intel.com/en-us/articles/intel-intrinsics-guide

0 Kudos
Filippo_Bistaffa
Beginner
4,420 Views

Am I missing something, or do "_mm_bsrli_si128" and "_mm_srli_si128" have the same description? What's the difference between them?

0 Kudos
Patrick_K_Intel
4,420 Views

These intrinsics are identical; they are just two different names for exactly the same functionality.
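
For readers unsure what that shared functionality is, here is a minimal scalar sketch of a 128-bit right shift by whole bytes with zero fill, which is how the guide describes _mm_srli_si128. The function names are hypothetical. The optional SSE2 part assumes a compiler that defines `__SSE2__` and ships `<emmintrin.h>`; it uses only _mm_srli_si128, since _mm_bsrli_si128 is not present in every header set.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model: src[0] is the least significant byte. Bytes move toward
   index 0 and zeros are shifted in at the top. src and dst must not overlap. */
void byte_shift_right_128(const uint8_t src[16], unsigned count, uint8_t dst[16]) {
    memset(dst, 0, 16);
    if (count > 15) {
        return;  /* shifting by 16 or more bytes leaves all zeros */
    }
    memcpy(dst, src + count, 16 - count);
}

#if defined(__SSE2__)
#include <emmintrin.h>

/* Compare the model against the real intrinsic for a fixed shift of 3 bytes.
   Returns 1 when both agree. The shift count must be a compile-time constant. */
int model_matches_srli_si128_by_3(const uint8_t src[16]) {
    uint8_t expected[16];
    uint8_t actual[16];
    byte_shift_right_128(src, 3, expected);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)actual, _mm_srli_si128(v, 3));
    return memcmp(expected, actual, 16) == 0;
}
#endif
```

If the headers in use do provide _mm_bsrli_si128, substituting it for _mm_srli_si128 in the check should give the same result, given the statement above that the two names are equivalent.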

0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
This is a short follow-up.

>>The latest version is always available here:
>>
>>http://software.intel.com/en-us/articles/intel-intrinsics-guide

Patrick, why not add the link to this sticky thread?

Forum Topic: Links to instruction documentation
Web-link: http://software.intel.com/en-us/forums/topic/285900
0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
>>..."_mm_bsrli_si128" and "_mm_srli_si128" have the same description?

_mm_bsrli_si128 - by the way, I don't see that function in Microsoft's version of the emmintrin.h header file, and I don't see it in intrin.h either

_mm_srli_si128 - shifts right
0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
Please also take a look at:

Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 2 (2A, 2B & 2C): Instruction Set Reference, A-Z
Order Number: 325383-047US
June 2013
Page: 886
0 Kudos
Patrick_K_Intel
4,420 Views

Sergey Kostrov wrote:

>>..."_mm_bsrli_si128" and "_mm_srli_si128" have the same description?

_mm_bsrli_si128 - by the way, I don't see that function in Microsoft's version of emmintrin.h header file and I don't see it intrin.h as well

_mm_srli_si128 - shifts right

I believe the bsrli intrinsic was recently added to the Intel compiler headers.

0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
Patrick, the question was: ...What's the difference between them?... If this is a new function, please provide more details. Thanks.
0 Kudos
andysem
New Contributor III
4,420 Views

Sergey, I think Patrick has already stated that the two intrinsics are equivalent.

0 Kudos
SergeyKostrov
Valued Contributor II
4,420 Views
>>...I think Patrick has already stated that the two intrinsics are equivalent...

I don't see the logic in creating another intrinsic function that does exactly the same processing under a different name.
0 Kudos
andysem
New Contributor III
4,365 Views

In the _mm256_permute2x128_si256 description it says:

dst[255:128] := SELECT4(b[255:0], b[255:0], imm[7:4])

 

I believe the first argument to SELECT4 should be a[255:0]?
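
A minimal scalar sketch of the operation with the corrected argument, assuming the usual SELECT4 semantics: control bits 1:0 pick a low, a high, b low, or b high, and control bit 3 zeroes the result. The type and function names are hypothetical.

```c
#include <stdint.h>

/* A 256-bit value as two 128-bit halves; half[0] holds bits 127:0. */
typedef struct {
    uint64_t half[2][2];
} v256;

/* SELECT4(a, b, ctrl): ctrl[1:0] = 0 -> a low, 1 -> a high, 2 -> b low,
   3 -> b high; ctrl[3] set -> zero. */
void select4(const v256 *a, const v256 *b, unsigned ctrl, uint64_t out[2]) {
    const v256 *src = (ctrl & 2u) ? b : a;
    unsigned which = ctrl & 1u;
    if (ctrl & 8u) {
        out[0] = 0;
        out[1] = 0;
        return;
    }
    out[0] = src->half[which][0];
    out[1] = src->half[which][1];
}

/* dst[127:0]   := SELECT4(a, b, imm[3:0])
   dst[255:128] := SELECT4(a, b, imm[7:4])   <- first argument is a, not b */
v256 permute2x128_model(const v256 *a, const v256 *b, unsigned imm8) {
    v256 dst;
    select4(a, b, imm8 & 0xFu, dst.half[0]);
    select4(a, b, (imm8 >> 4) & 0xFu, dst.half[1]);
    return dst;
}
```

With imm8 = 0x01 the upper half of the result comes from a[127:0], which would be impossible if both source arguments of the second SELECT4 were really b.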

 

0 Kudos
Bernard
Valued Contributor I
4,365 Views

Yep it seems so.

0 Kudos