I've found a few bugs in the Intel Intrinsics Guide 2.7 (I'm using the Linux version):
1. When the window is maximized, the search field is stretched vertically even though it is still a one-line edit box. It should probably be sized accordingly.
2. The signature __m256 _mm256_undefined_si256 () has the wrong return type: it should return __m256i.
3. In some instruction descriptions, such as _mm_adds_epi8, the operation is described in terms of SignedSaturate, while others, e.g. _mm256_adds_epi16, use SaturateToSignedWord. The same applies to other operations with unsigned saturation. The vector elements are also described differently. A more consistent description would be nice.
4. _mm_alignr_epi8 has two descriptions.
5. I'm not sure the _mm_ceil_pd signature and description are correct. The description says the intrinsic returns a vector of single-precision floats. Shouldn't it be double-precision?
I didn't read through all the instructions, so there may be more issues. I'll post if I find anything else.
PS: This is not a bug per se, but some instructions are missing latency and throughput information. This mostly affects newer instructions, but the info is still useful and I hope it will be added.
Thanks for the feedback, most of this will be addressed in the next release.
1. I'm not able to replicate this issue with maximizing the window on Linux. What distro are you using? What version of Java?
2. This will be resolved in the next release.
3. All the descriptions and operations have been updated for the next release, so they should now be much more consistent.
4. This will be resolved in the next release.
5. This will be resolved in the next release.
I have not added any additional latency and throughput data yet, but I may get to this soon.
> It would be nice to duplicate these errors online on the HTML documentation pages where you found the issues or problems. As far as I know, there is a special button for providing feedback.
I don't quite understand which pages you mean. Could you provide a link?
> 1. I'm not able to replicate this issue with maximizing the window on Linux. What distro are you using? What version of Java?
I'm seeing this on Kubuntu 12.04 and 12.10, both x86-64, KDE 4.9.5, with dual monitors attached. I'm using Oracle Java:
java version "1.7.0_11"
Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
I've attached a screenshot to illustrate the problem.
A new batch of bugs:
1. _mm_cvtss_f32 is described as equivalent to the cvtss2si instruction. I suppose the intrinsic should generate no instruction at all if the compiler uses SSE for floating-point math, or it should simply store the value to memory or to a general-purpose register. But it should not convert the float value to an integer.
2. The description of _mm_cvtsi32_si128 says the upper bits of the operand are extended, but they should be filled with zeros.
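Below is a rough plain-C model of what the two intrinsics should do, as I understand them. The struct types and function names are hypothetical stand-ins for __m128 and __m128i. Using them lets the semantics be shown without SSE hardware.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C stand-ins for the SSE register types. */
typedef struct { float   f[4]; } m128_model;
typedef struct { int32_t i[4]; } m128i_model;

/* _mm_cvtss_f32: returns the low float lane unchanged. There is no
   float-to-integer conversion; the compiler may emit nothing if the value
   already lives in an XMM register, or a movss to move it elsewhere. */
static float cvtss_f32_model(m128_model a)
{
    return a.f[0];
}

/* _mm_cvtsi32_si128: puts x in the low 32 bits and zeroes the upper
   96 bits, which is what movd does. */
static m128i_model cvtsi32_si128_model(int32_t x)
{
    m128i_model r;
    memset(&r, 0, sizeof r);  /* upper lanes are zero, not extended */
    r.i[0] = x;
    return r;
}
```

With these models, a low lane of 1.75f comes back as 1.75f. Under cvtss2si semantics it would instead become the integer 2 (with the default round-to-nearest mode).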
Version 2.8 has been released:
Note that this release does include additional latency and throughput data.
Regarding the two new issues:
1. You're correct, cvtss2si is the wrong instruction. movss is the official instruction, although you'll often see different instructions based on context. This will be resolved in the next release.
2. This issue was already resolved in v2.8.
Are you still seeing the issue with the search box expanding on Linux with v2.8?
Thanks for the updated release.
Yes, the problem with the search box is still present. I must say it wasn't present before 2.7 (I think that version introduced some interface changes; besides the search field, the fonts also seem to have changed).
This release is great!
Now there are latency and throughput data for Ivy Bridge, too!
I've been waiting for this for quite some time. One always had to look in the really big manuals to find that sort of information.
One additional bug: the _mm_max_epu32 signature contains three arguments: __m128i _mm_max_epu32 (__m128i a, __m128i b, __m128i b). I believe the last one should be removed.
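For reference, here is a scalar sketch of the two-operand form (the function name is hypothetical). The comparison is unsigned, so 0xFFFFFFFF beats 1, unlike with the signed _mm_max_epi32.

```c
#include <stdint.h>

/* Scalar model of __m128i _mm_max_epu32 (__m128i a, __m128i b):
   four unsigned 32-bit lanes; each result lane is the larger of a[i] and
   b[i]. Only two operands are involved. */
static void max_epu32_model(const uint32_t a[4], const uint32_t b[4], uint32_t dst[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (a[i] > b[i]) ? a[i] : b[i];  /* unsigned comparison */
}
```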
The description of the _mm256_shuffle_epi8 intrinsic makes it look like a cross-lane operation. Its formal operation doesn't clarify this either, because the index value is bounded to [0..15] and is not adjusted for the second lane (taken literally, this would distribute lane 0 of a to both lanes of the result).
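The following is a scalar model of vpshufb on 256-bit vectors. It illustrates the in-lane behaviour the description should state, with a as the data and b as the control bytes. It is a sketch with a hypothetical name, not the guide's pseudocode. The key point is that the 4-bit index is taken relative to the start of the current 128-bit lane.

```c
#include <stdint.h>

/* Scalar model of _mm256_shuffle_epi8(a, b). Each 128-bit lane is
   shuffled independently: the low four bits of a control byte select a
   byte within the SAME lane of a, and a set bit 7 zeroes the output byte.
   No byte ever moves across lanes. */
static void shuffle_epi8_256_model(const uint8_t a[32], const uint8_t b[32], uint8_t dst[32])
{
    for (int lane = 0; lane < 2; lane++) {
        int base = lane * 16;  /* byte offset of this 128-bit lane */
        for (int j = 0; j < 16; j++) {
            uint8_t ctrl = b[base + j];
            if (ctrl & 0x80)
                dst[base + j] = 0;
            else
                dst[base + j] = a[base + (ctrl & 0x0F)];  /* lane-relative index */
        }
    }
}
```

With an all-zero control vector, the low half of the result is filled with a[0] and the high half with a[16]. It is not filled with a[0] everywhere.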
I just noticed that 2.8.1 has been released. Thanks for the update.
The _mm256_shuffle_epi8 description is still confusing, and the original issue with the search bar is not fixed either. I somehow forgot to mention that the problem appears not only with a maximized window but also with a normal window taller than a certain height. I suppose the field size is fine as long as the window height is less than or equal to the total height of all widgets. Once the window is taller than that, the search field is stretched instead of unused space being added at the bottom. Is there any estimate for the fix?
You're correct about _mm256_shuffle_epi8, it is not a cross lane operation, I will fix the description and operation in the next release. Regarding the search bar issue, I have not been able to reproduce this on Ubuntu.
> Regarding the search bar issue, I have not been able to reproduce this on Ubuntu.
Hmm, I can reproduce it on all 3 of my systems, with Nvidia and AMD graphics and different drivers, on Kubuntu from 12.04 to 13.04. I'm using Oracle Java 1.7.
I have quite large displays, though: 2560x1440 on two of my machines and 1920x1200 on a laptop. I'm not sure a 1920x1080 display is tall enough for the problem to manifest, as that height will be filled with widgets. If you don't have access to a bigger display, you can attach a second display, arrange it below your main display, and stretch the window vertically. Or you can do the same with a single display: move the window toward the bottom of the screen (so that it goes partially below the edge), then resize it vertically by dragging its top edge upwards.
I can see version 3.0.1 has been released. It seems the problem with the search field has been resolved, thanks!
Some AVX-512 intrinsics include latency/throughput information for CPUs that do not support the corresponding instructions. For example, _mm512_add_epi32 and similar intrinsics have this data for 06_3C CPUs, which I believe are Haswell. The data also covers 06_45/46, but I don't know which CPUs these are. _mm512_maskz_cvtepi32_ps has latency/throughput data for 06_2A (Sandy Bridge) and later CPUs. Other intrinsics have this problem as well.
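This might help with mapping these labels to CPUs. As far as I know, they use the DisplayFamily_DisplayModel notation from the Intel SDM, which is derived from the EAX value that CPUID leaf 1 returns. Below is a minimal sketch of that decoding; the helper name is hypothetical. If you feed it the EAX value reported on a given machine, it prints the label for that machine.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: convert CPUID leaf 1 EAX into the "FF_MM"
   DisplayFamily_DisplayModel label (e.g. "06_3C"). SDM rule: the extended
   model is prepended for families 06h and 0Fh; the extended family is added
   only for family 0Fh. out must hold at least 8 bytes. */
static void display_family_model(uint32_t eax, char out[8])
{
    uint32_t model      = (eax >> 4)  & 0x0F;
    uint32_t family     = (eax >> 8)  & 0x0F;
    uint32_t ext_model  = (eax >> 16) & 0x0F;
    uint32_t ext_family = (eax >> 20) & 0xFF;

    uint32_t disp_family = (family == 0x0F) ? family + ext_family : family;
    uint32_t disp_model  = (family == 0x06 || family == 0x0F)
                               ? (ext_model << 4) + model
                               : model;
    snprintf(out, 8, "%02X_%02X", (unsigned)disp_family, (unsigned)disp_model);
}
```

For instance, an EAX of 0x000306C3 decodes to 06_3C, and 0x000206A7 decodes to 06_2A.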