I am developing some vectorized code with OpenMP v4 and I am really happy with the performance I get with Intel compilers. Unfortunately, both gcc 7.2.0 and clang 5.0.0 make a mess of those loops (#pragma omp simd) and give very bad performance. This makes OpenMP v4 non-portable across compilers in terms of performance. It also makes writing software for AVX, AVX2 and AVX512 quite difficult, as intrinsics seem to be the only solution if you want good performance across compilers.
Do you know if Intel people are helping GCC/LLVM people to improve their vectorizer?
Do you think performance with OpenMP 4 will come with GCC/LLVM?
states GCC 6.1 (and presumably later) fully supports OpenMP 4.5, provided -fopenmp is supplied as an option switch. In other words, it should provide support for
#pragma omp simd
when -fopenmp is used.
LLVM clang states that v3.9 supports all non-offload features of OpenMP 4.5, which includes #pragma omp simd.
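To check what each compiler actually does with the pragma, a small test case along these lines may help. This is only a minimal sketch: the function names `saxpy` and `sum_floats` are hypothetical and not taken from the original code, and the loops are deliberately simple, so they may vectorize well even where the real kernels do not.

```c
#include <stddef.h>

/* Hypothetical kernel: y[i] = a * x[i] + y[i].
 * 'restrict' tells the compiler that x and y do not overlap,
 * which removes one common obstacle to vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical reduction: the reduction clause allows the compiler
 * to reorder the floating-point additions across SIMD lanes. */
float sum_floats(size_t n, const float *x)
{
    float s = 0.0f;
    #pragma omp simd reduction(+:s)
    for (size_t i = 0; i < n; ++i) {
        s += x[i];
    }
    return s;
}
```

Compiling with something like `gcc -O3 -march=native -fopenmp -fopt-info-vec -c test.c` or `clang -O3 -march=native -fopenmp -Rpass=loop-vectorize -c test.c` (the file name is an assumption) should make each compiler report which loops it vectorized, which is a reasonable starting point for comparing against the Intel compiler's output.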