AVX2/512: Surprising return types and parameter types

Markus_Dreseler · 02-24-2017

Hi,

While working with AVX2 and AVX-512, we noticed the following discrepancies:

1) Why does _mm256_i64gather_epi64 return an __m128i according to the documentation? We would expect an __m256i. Dash agrees.

2) Why is the AVX-512 stream load interface different from AVX2?

extern __m256i _mm256_stream_load_si256(__m256i const *);
extern __m512i _mm512_stream_load_si512(void * mem_addr);

The missing const qualifier in particular is a problem (albeit a minor one), because it forces a const_cast that should be unnecessary.
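For illustration, the workaround could be confined to one place with a small wrapper. This is only a sketch: `Block512` and `load_block` are hypothetical names, not part of any Intel header, and the `memcpy` branch is there only so the snippet also builds when AVX-512 is not enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

// Hypothetical 64-byte value type, used only for this illustration.
struct Block512 {
    alignas(64) std::uint8_t bytes[64];
};

// Hypothetical wrapper: takes a const pointer so callers stay const-correct.
inline Block512 load_block(const void* src) {
    // The streaming load requires a 64-byte aligned address.
    assert(reinterpret_cast<std::uintptr_t>(src) % 64 == 0);
    Block512 out;
#if defined(__AVX512F__)
    // The const_cast is only needed with headers that declare the parameter
    // as void*; it is harmless with headers that already take void const*.
    __m512i v = _mm512_stream_load_si512(const_cast<void*>(src));
    _mm512_store_si512(out.bytes, v);
#else
    // Portable fallback so the sketch compiles without AVX-512 enabled.
    std::memcpy(out.bytes, src, sizeof(out.bytes));
#endif
    return out;
}
```

Callers can then pass a pointer to const data directly, and the cast can be dropped in one spot once every header in use declares the parameter as const.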

Thanks
Markus

TimP · 02-24-2017

1) immintrin.h for the current Windows 64-bit Intel C++ compiler shows an __m256i return type.

2) zmmintrin.h shows _mm512_stream_load_si512(void const*), so the parameter is const there.

SergeyKostrov · 02-24-2017

You need to be careful with the zmmintrin.h header file, because Intel merged two ISAs in it: AVX2 and AVX-512. I have two versions of the header: one from Intel C++ compiler version 13 for Windows, and one from Intel C++ compiler versions 16 and 17 for Linux. Let me know if you need these headers for review. If you have a recent version of the Intel C++ compiler that supports AVX-512, then its header is the most up-to-date version. Why Intel merged the two ISAs is not clear (even if z is the last letter of the English alphabet, there are other unused letters available for naming intrinsic header files).

SergeyKostrov · 02-24-2017

One more note: the immintrin.h header file is for the AVX ISA (not AVX2).

TimP · 02-25-2017

I didn't mean to address the touchy question of which method is meant to be used to make the macro definitions available in your code for your preferred ISA. Hint: it's not (AFAIK) by including or not including zmmintrin.h directly. The arch flag, or the equivalent compiler flag, must be set to the target ISA. With Intel compilers, that involves automatic promotion of SSE2 to a newer ISA according to the setting. It may even avoid the use of AVX2 macros if those are recognized as slower than AVX. If you want your code to work with several compilers (e.g. MSVC, Intel, GNU, Clang), I think you have to test each of them.
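As a rough way to check what a given compiler and flag combination actually enables, one can look at the predefined feature macros rather than at which header was included. The sketch below assumes the commonly used `__AVX__`, `__AVX2__` and `__AVX512F__` macros; `compiled_isa` is a hypothetical helper name, and the result says nothing about what the CPU supports at run time.

```cpp
#include <string>

// Hypothetical helper: reports the widest AVX level the compiler was told
// to target (for example via -mavx2, -march=..., or /arch:AVX2).
inline std::string compiled_isa() {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#else
    return "no AVX";
#endif
}
```

Building the same file with each compiler and flag set of interest and printing `compiled_isa()` is one way to do the per-compiler testing mentioned above.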

I supposed zmmintrin.h was named for its use of the z (512-bit) registers.