Bugs in Intrinsics Guide

andysem · 01-30-2013

Hi,

I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using the Linux version):

1. When the window is maximized, the search field is stretched vertically even though it is still a one-line edit box. It should probably keep the height of a single line.

2. __m256 _mm256_undefined_si256 () should return __m256i.

3. Some instruction descriptions, such as _mm_adds_epi8, describe the operation in terms of SignedSaturate, while others, e.g. _mm256_adds_epi16, use SaturateToSignedWord. The same applies to operations with unsigned saturation. The vector elements are also described differently. A more consistent description would be nice.
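For reference, SignedSaturate and SaturateToSignedWord both describe the same per-element clamp, just at different widths. A minimal scalar sketch in plain C99, using hypothetical helper names (these are not Intel functions), of one element of _mm_adds_epi8, _mm256_adds_epi16 and the unsigned _mm_adds_epu8:

```c
#include <stdint.h>

/* Hypothetical helpers, not part of any intrinsics header.
   Each one clamps a wider intermediate result into the destination range. */
static int8_t saturate_to_int8(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

static int16_t saturate_to_int16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

static uint8_t saturate_to_uint8(int32_t v)
{
    if (v > UINT8_MAX) return UINT8_MAX;
    if (v < 0) return 0;
    return (uint8_t)v;
}

/* One element of _mm_adds_epi8: widen, add, then clamp to [-128, 127]. */
static int8_t adds_epi8_elem(int8_t a, int8_t b)
{
    return saturate_to_int8((int32_t)a + b);
}

/* One element of _mm256_adds_epi16: widen, add, then clamp to [-32768, 32767]. */
static int16_t adds_epi16_elem(int16_t a, int16_t b)
{
    return saturate_to_int16((int32_t)a + b);
}

/* One element of _mm_adds_epu8: widen, add, then clamp to [0, 255]. */
static uint8_t adds_epu8_elem(uint8_t a, uint8_t b)
{
    return saturate_to_uint8((int32_t)a + b);
}
```

Written this way, only the clamp bounds differ between the signed byte, signed word and unsigned byte cases, which is why a single naming scheme in the guide would read more naturally.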

4. _mm_alignr_epi8 has two descriptions.

5. I'm not sure the _mm_ceil_pd signature and description are correct. They say the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
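The double-precision behaviour is easy to check directly. A small sketch, assuming GCC or Clang on x86-64 and a CPU with SSE4.1 (the target attribute enables SSE4.1 for this one function without a global -msse4.1 flag); the helper name is hypothetical:

```c
#include <smmintrin.h> /* SSE4.1: _mm_ceil_pd; also pulls in the SSE2 header */

/* Hypothetical helper: rounds two doubles up using _mm_ceil_pd.
   __m128d in, __m128d out: two 64-bit doubles, not four floats. */
__attribute__((target("sse4.1")))
static void ceil_two_doubles(double lo, double hi, double out[2])
{
    __m128d v = _mm_set_pd(hi, lo);  /* element 0 = lo, element 1 = hi */
    __m128d r = _mm_ceil_pd(v);      /* lane-wise ceiling in double precision */
    _mm_storeu_pd(out, r);           /* unaligned store of both lanes */
}
```

The single-precision counterpart is _mm_ceil_ps, which takes and returns __m128.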

I didn't read all the instructions, so there may be more issues. I'll post if I find anything else.

PS: This is not a bug per se, but some instructions are missing latency and throughput information. This mostly affects newer instructions, but the information is still useful and I hope it will be added.

SergeyKostrov · 01-30-2013

Thanks for the feedback! It would be nice to also report these errors on the online HTML documentation pages where you found them. As far as I know, there is a special button for providing feedback.

>>...
>>PS: This is not a bug per se but some instructions are missing the Latency & Throughput information. This mostly relates to
>>newer instructions...

This is a known issue and it has come up several times over the last couple of months. Unfortunately, even some older instructions are missing this data.

Best regards,
Sergey

Patrick_K_Intel · 01-31-2013

Thanks for the feedback, most of this will be addressed in the next release.

1. I'm not able to replicate this issue with maximizing the window on Linux. What distro are you using? What version of Java?

2. This will be resolved in the next release.

3. All the descriptions and operations have been updated for the next release, so they should now be much more consistent.

4. This will be resolved in the next release.

5. This will be resolved in the next release.

I have not added any additional latency and throughput data yet, but I may get to this soon.

SergeyKostrov · 01-31-2013

>>...I have not added any additional latency and throughput data yet, but I may get to this soon.

Thanks for the update, and please keep everybody informed!

andysem · 02-01-2013

@Sergey Kostrov

> It would be nice to duplicate these errors online on doc-html-pages where you found issues or problems. As far as I know there is a special button to provide a feedback.

I don't quite understand which pages you mean. Could you provide a link?

@Patrick Konsor

> 1. I'm not able to replicate this issue with maximizing the window on Linux. What distro are you using? What version of Java?

I'm seeing this on Kubuntu 12.04 and 12.10, both x86-64, KDE 4.9.5, with dual monitors attached. I'm using Oracle Java:

java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)

I've attached a screenshot to illustrate the problem.

andysem · 02-08-2013

A new pack of bugs:

1. _mm_cvtss_f32 is described as equivalent to the cvtss2si instruction. I suppose the intrinsic should not generate any instructions if the compiler uses SSE for floating-point math; otherwise it should simply store the value to memory or a general-purpose register. But it should not convert the float value to an integer.

2. The description of _mm_cvtsi32_si128 says it extends the upper bits of the operand, but it should zero-extend it, i.e. fill the upper bits with zeros.
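To make both points concrete, here is a sketch assuming an x86-64 compiler (where SSE and SSE2 are always available) and the standard <emmintrin.h> header; the wrapper names are hypothetical. _mm_cvtss_si32 is the intrinsic that actually maps to cvtss2si, which is probably where the mix-up came from:

```c
#include <emmintrin.h> /* SSE2; also includes the SSE header <xmmintrin.h> */

/* _mm_cvtss_f32: returns the lowest float element unchanged (no integer conversion). */
static float low_float(__m128 v)
{
    return _mm_cvtss_f32(v);
}

/* _mm_cvtss_si32: the intrinsic behind cvtss2si; converts the lowest float
   to int using the current MXCSR rounding mode. */
static int low_float_to_int(__m128 v)
{
    return _mm_cvtss_si32(v);
}

/* _mm_cvtsi32_si128: puts x in the low 32 bits and zeroes the upper 96 bits.
   Returns 1 if bytes 4..15 of the result are all zero. */
static int upper_lanes_are_zero(int x)
{
    __m128i v = _mm_cvtsi32_si128(x);
    __m128i upper = _mm_srli_si128(v, 4);  /* shift out the low 4 bytes */
    __m128i eq = _mm_cmpeq_epi8(upper, _mm_setzero_si128());
    return _mm_movemask_epi8(eq) == 0xFFFF;
}
```

With the default rounding mode (round to nearest, ties to even), low_float_to_int(_mm_set_ss(2.75f)) gives 3, while low_float(_mm_set_ss(2.75f)) returns 2.75f unchanged. Passing -1 to upper_lanes_are_zero is a useful check, because sign extension would have filled the upper bytes with 0xFF.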

Patrick_K_Intel · 02-11-2013

Version 2.8 has been released:
http://software.intel.com/en-us/articles/intel-intrinsics-guide

Note that this release does include additional latency and throughput data.

Regarding the two new issues:
1. You're correct, cvtss2si is the wrong instruction. movss is the official instruction, although you'll often see different instructions based on context. This will be resolved in the next release.
2. This issue was already resolved in v2.8.

Are you still seeing the issue with the search box expanding on Linux with v2.8?

andysem · 02-11-2013

Thanks for the updated release.

Yes, the problem with the search box is still present. I should mention that it wasn't present before 2.7 (I think that version introduced some interface changes; besides the search field, the fonts also seem to have changed).

SergeyKostrov · 02-11-2013

>>Version 2.8 has been released:
>>software.intel.com/en-us/articles/intel-intrinsics-guide
>>
>>Note that this release does include additional latency and throughput data.

Thank you, Patrick.

Christian_M_2 · 02-12-2013

This release is great!

Now there are latency and throughput data for Ivy Bridge, too!

I had been waiting for this for quite some time. Until now, one always had to dig through the really big manuals to find that sort of information.

andysem · 04-22-2013

One more bug: the _mm_max_epu32 signature has three arguments: __m128i _mm_max_epu32 (__m128i a, __m128i b, __m128i b). I believe the last one should be removed.

SergeyKostrov · 04-26-2013

Yes, that is correct. Here is the declaration from the smmintrin.h header file (Intel version):

...
extern __m128i __ICL_INTRINCC _mm_max_epu32( __m128i, __m128i );
...
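For completeness, a scalar model of what the two-operand form computes. This is a plain C sketch with a hypothetical function name, not the intrinsic itself; the point is that the 32-bit lanes are compared as unsigned values:

```c
#include <stdint.h>

/* Hypothetical scalar model of the two-operand _mm_max_epu32(a, b):
   each of the four 32-bit lanes is compared as an UNSIGNED integer. */
static void max_epu32_model(const uint32_t a[4], const uint32_t b[4], uint32_t dst[4])
{
    for (int i = 0; i < 4; ++i)
        dst[i] = (a[i] > b[i]) ? a[i] : b[i];
}
```

For example, a lane holding 0xFFFFFFFF wins against 1 here, whereas the signed variant _mm_max_epi32 would treat 0xFFFFFFFF as -1 and pick 1.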

andysem · 04-27-2013

__int _mm256_movemask_epi8 (__m256i a)

Please remove the leading underscores from the return type.

Patrick_K_Intel · 04-29-2013

Thanks, this issue will be fixed in the next release.

SergeyKostrov · 04-29-2013

>>...__int _mm256_movemask_epi8 (__m256i a)

Here is the declaration from the immintrin.h header file (Intel version):

...
/*
 * Returns a 32-bit mask made up of the most significant bit of each byte
 * of the 256-bit vector source operand.
 */
extern int __ICL_INTRINCC _mm256_movemask_epi8(__m256i);
...
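The header comment translates into a short scalar model. A sketch in plain C with a hypothetical name, treating the 256-bit operand as 32 bytes; it also shows why a plain int is the natural return type, since exactly 32 mask bits are produced:

```c
#include <stdint.h>

/* Hypothetical scalar model of int _mm256_movemask_epi8(__m256i a):
   bit i of the result is the most significant bit of byte i of a. */
static int movemask_epi8_model(const uint8_t a[32])
{
    uint32_t mask = 0;
    for (int i = 0; i < 32; ++i)
        mask |= (uint32_t)(a[i] >> 7) << i;  /* MSB of byte i -> bit i */
    return (int)mask;  /* the intrinsic returns a plain int */
}
```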

andysem · 05-09-2013

The description of the _mm256_shuffle_epi8 intrinsic reads as if it operates across lanes. Its formal algorithm doesn't clarify this either: the index value is bounded to [0..15] and is not adjusted for the second lane, which would mean bytes from lane 0 of a are distributed to both lanes of the result.

andysem · 05-19-2013

Just noted that 2.8.1 has been released. Thanks for the update.

The _mm256_shuffle_epi8 description is still confusing, and the original issue with the search bar is not fixed either. I forgot to mention that the problem shows up not only with a maximized window but also with a normal window taller than a certain height. I suppose the field size is fine when the window height is less than or equal to the total height of all widgets; when it exceeds that, the search field is stretched instead of unused space being added at the bottom. Is there any estimate for a fix?

SergeyKostrov · 05-22-2013

>>Just noted that 2.8.1 has been released...

Here is a link to download the recently released Intel Intrinsics Guide for Windows, version 2.8.1:
software.intel.com/sites/default/files/Intel_Intrinsics_Guide-windows-v2.8.1.zip

Patrick_K_Intel · 05-22-2013

You're correct about _mm256_shuffle_epi8: it is not a cross-lane operation. I will fix the description and operation in the next release. Regarding the search bar issue, I have not been able to reproduce this on Ubuntu.
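Since the operation works per 128-bit lane, a scalar model makes the lane boundary explicit. This is a sketch with a hypothetical function name, treating a and b as 32-byte arrays (a is the data, b holds the control bytes, following the intrinsic's argument order):

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm256_shuffle_epi8(a, b): each 128-bit lane
   is shuffled independently, so a control byte can only select a byte
   from the same lane of a. */
static void shuffle_epi8_model(const uint8_t a[32], const uint8_t b[32], uint8_t dst[32])
{
    for (int i = 0; i < 32; ++i) {
        int lane_base = (i < 16) ? 0 : 16;  /* first byte of the lane holding byte i */
        if (b[i] & 0x80)
            dst[i] = 0;                     /* high bit set: zero this byte */
        else
            dst[i] = a[lane_base + (b[i] & 0x0F)];  /* index 0..15 within the lane */
    }
}
```

With all control bytes zero, the low half of the result repeats a[0] and the high half repeats a[16]; an unadjusted [0..15] index would instead copy a[0] into both halves.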

andysem · 05-22-2013

> Regarding the search bar issue, I have not been able to reproduce this on Ubuntu.

Hmm, I can reproduce it on all 3 of my systems, with Nvidia and AMD graphics and different drivers, on Kubuntu from 12.04 to 13.04. I'm using Oracle Java 1.7.

I do have quite large displays, though: 2560x1440 on two of my machines and 1920x1200 on a laptop. I'm not sure a 1920x1080 display is tall enough for the problem to show up, since that height will be filled with widgets. If you don't have access to a bigger display, you can attach a second display, arrange it below your main one, and stretch the window vertically. Or you can do the same with a single display: move the window toward the bottom of the screen (so that it goes partially below the edge), then resize it vertically by dragging its top edge upward.

andysem · 07-24-2013

I can see version 3.0.1 has been released. It seems the problem with the search field has been resolved, thanks!

Some AVX-512 intrinsics include latency/throughput information for CPUs that do not support the corresponding instructions. For example, _mm512_add_epi32 and similar intrinsics list this data for 06_3C CPUs, which I believe are Haswell. The same data is listed for 06_45/46, but I don't know which CPUs these are. _mm512_maskz_cvtepi32_ps has latency/throughput for 06_2A (Sandy Bridge) and later CPUs. Other intrinsics have this problem as well.
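Labels such as 06_3C and 06_2A are DisplayFamily_DisplayModel values derived from CPUID leaf 1. A sketch of the decoding, assuming the EAX value has already been obtained (for example via __get_cpuid from GCC's <cpuid.h>); the struct and function names are hypothetical:

```c
#include <stdint.h>

/* Hypothetical result type: DisplayFamily and DisplayModel as used in "06_3C". */
typedef struct {
    unsigned family;
    unsigned model;
} cpu_signature;

/* Decode a CPUID leaf 1 EAX value into DisplayFamily/DisplayModel. */
static cpu_signature decode_signature(uint32_t eax)
{
    unsigned family     = (eax >> 8)  & 0x0F;  /* bits 11:8  */
    unsigned model      = (eax >> 4)  & 0x0F;  /* bits 7:4   */
    unsigned ext_family = (eax >> 20) & 0xFF;  /* bits 27:20 */
    unsigned ext_model  = (eax >> 16) & 0x0F;  /* bits 19:16 */

    cpu_signature s;
    /* The extended family only counts when the base family is 0x0F. */
    s.family = (family == 0x0F) ? family + ext_family : family;
    /* The extended model is prepended for families 0x06 and 0x0F. */
    s.model = (family == 0x06 || family == 0x0F) ? (ext_model << 4) + model : model;
    return s;
}
```

Printing the result with printf("%02X_%02X", s.family, s.model) gives the notation the guide uses, e.g. 06_3C for a signature of 0x000306C3.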