Intel® C++ Compiler
Support and discussions for creating C++ code that runs on platforms based on Intel® processors.

Can simd and omp parallel for work together on a loop?


I have a loop. Can ivdep, simd, and omp parallel for work together on it, like this:

#pragma ivdep
#pragma simd
#pragma omp parallel for
for(int i = 0; i < n; ++i)
    ... // code without data dependencies between iterations

Or will the compiler just choose either simd or omp parallel?


Hi Jayden,
As you may be aware, #pragma ivdep lets you give the compiler data-dependence assertion hints, but whether to vectorize is still left to the compiler's discretion. #pragma simd complements ivdep by letting you enforce vectorization, and it overrides a previously declared ivdep. As for OpenMP, version 4.0 adds SIMD support through the "#pragma omp simd" directive, which you can use together with omp parallel to enforce vectorization of the loop just as you did before without OpenMP (example below):
#pragma omp simd safelen(4)
for (...) { }

Make sure to use the appropriate OpenMP clauses (such as private, reduction, or safelen) with "#pragma omp simd" to handle any data dependencies properly and avoid data corruption.
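For example, a loop that accumulates into one scalar carries a dependency from each iteration to the next; the reduction clause resolves it by giving each SIMD lane its own partial sum and combining them at the end. A minimal sketch, assuming a simple dot product (the function name dot is illustrative, not from this thread):

```cpp
#include <cassert>
#include <cstddef>

// Dot product vectorized with "#pragma omp simd".
// reduction(+ : sum) tells the compiler that sum is accumulated across
// iterations, so lanes keep private partial sums that are added together
// after the loop instead of all updating a single variable.
double dot(const double* x, const double* y, std::size_t n) {
    assert(n == 0 || (x != nullptr && y != nullptr));
    double sum = 0.0;
#pragma omp simd reduction(+ : sum)
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Note that the reduction lets the compiler reorder the additions, so for general floating-point data the result can differ in the last bits from a strictly sequential sum.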



Sorry to hijack the thread, but I think this question could benefit others reading it.

I have seen vectorized OpenMP parallel loops with "#pragma ivdep" cause segmentation faults on Xeon Phi. The issue could be fixed by specifying something like "schedule(static,16)", so that the parallel loop splits the iteration space at points where the data structures are aligned.

Does anybody know whether specifying a "good" chunk size like that is good (i.e., future-portable) practice for combining vectorization and multithreading in one loop? Or are implementations of OpenMP 4.0 and later supposed to take care of it?
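The chunking workaround described above can be sketched as follows, assuming 512-bit vectors (16 floats = 64 bytes, one vector on Xeon Phi) and arrays whose base is 64-byte aligned. This swaps the "#pragma ivdep" from the post for the standard "parallel for simd" construct, since the original loop is not shown; the function names all_chunks_aligned and scale are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// With schedule(static, chunk), every chunk a thread receives starts at an
// iteration index that is a multiple of chunk. This helper checks whether
// all of those start addresses fall on an align-byte boundary.
bool all_chunks_aligned(const float* base, std::size_t n, std::size_t chunk,
                        std::size_t align) {
    assert(chunk > 0 && align > 0);
    for (std::size_t start = 0; start < n; start += chunk) {
        auto addr = reinterpret_cast<std::uintptr_t>(base + start);
        if (addr % align != 0) return false;
    }
    return true;
}

// Chunks of 16 floats (64 bytes): if a and b are 64-byte aligned, each
// thread's chunk also begins on a 64-byte boundary.
void scale(float* a, const float* b, float s, int n) {
    assert(n >= 0);
#pragma omp parallel for simd schedule(static, 16)
    for (int i = 0; i < n; ++i) {
        a[i] = s * b[i];
    }
}
```

With an aligned base, a chunk of 16 keeps every chunk start aligned, while a chunk of 5 would start the second chunk 20 bytes into a cache line.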


Hi Andrey,
That's a good question. As I understand it, OpenMP 4.0 should take care of this, but let me check with the OpenMP team and update you. Thanks.



#pragma omp parallel for simd covers the intent of those Intel-specific directives. Intel compilers often do a good job without the simd clause, and other compilers ignore it in this context.
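As a sketch, the three directives from the original question collapse into the one standard construct like this. The loop body and the name add_arrays are hypothetical stand-ins for the question's "code without data dependencies between iterations":

```cpp
#include <cassert>

// ivdep + simd + omp parallel for, expressed as one OpenMP 4.0 combined
// construct: the iterations are divided among threads, and each thread's
// portion is executed with SIMD instructions. As with ivdep, it is the
// programmer's responsibility that the iterations are independent.
void add_arrays(float* c, const float* a, const float* b, int n) {
    assert(n >= 0);
#pragma omp parallel for simd
    for (int i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];  // no dependencies between iterations
    }
}
```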

safelen(4) would limit SIMD execution to 4 wide, including unrolling. Compilers sometimes ignore this clause.
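Where safelen matters is a loop with a known dependence distance. A minimal sketch, assuming a hypothetical loop in which each element depends on the one 4 positions earlier: any 4 consecutive iterations are independent, so running up to 4 of them concurrently is safe, but a wider vector would read values not yet written.

```cpp
#include <cassert>

// a[i] depends on a[i - 4]. Iterations fewer than 4 apart never touch each
// other's results, so safelen(4) permits up to 4 of them per SIMD step while
// forbidding the compiler from using a wider vector length.
void shift_accumulate(float* a, int n) {
    assert(n >= 0);
#pragma omp simd safelen(4)
    for (int i = 4; i < n; ++i) {
        a[i] = a[i - 4] + 1.0f;
    }
}
```

Starting from a = {0, 10, 20, 30, ...}, each later element is its counterpart 4 places back plus one, e.g. a[4] becomes 1 and a[11] becomes 32.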

With the MIC option -opt-assume-safe-padding, a segfault is possible, particularly with gather/scatter.

We have discussed optimizing chunk sizes in the past. Arrays must be aligned and padded to a multiple of the chunk size times the number of threads. The chunk size should ideally fit the SIMD width; accommodating AVX-512 plus unrolling seems sufficient for the foreseeable future. In that case, I suppose assume-safe-padding isn't needed.
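The padding rule above can be sketched as a small allocation helper. Assumptions: 64-byte alignment (one AVX-512 vector) and a chunk size whose byte size is a multiple of 64 (e.g. 16 floats), so the total size satisfies std::aligned_alloc's requirement; padded_length and alloc_padded are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Round n up to a multiple of chunk * nthreads, so every thread receives
// whole chunks and no chunk runs past the end of the allocation.
std::size_t padded_length(std::size_t n, std::size_t chunk, std::size_t nthreads) {
    assert(chunk > 0 && nthreads > 0);
    const std::size_t block = chunk * nthreads;
    return (n + block - 1) / block * block;
}

// Allocate a 64-byte-aligned float array of the padded length.
// Assumes chunk * sizeof(float) is a multiple of 64, so the byte size is a
// multiple of the alignment as std::aligned_alloc requires. Release with std::free.
float* alloc_padded(std::size_t n, std::size_t chunk, std::size_t nthreads) {
    const std::size_t len = padded_length(n, chunk, nthreads);
    float* p = static_cast<float*>(std::aligned_alloc(64, len * sizeof(float)));
    assert(p == nullptr || reinterpret_cast<std::uintptr_t>(p) % 64 == 0);
    return p;
}
```

For example, 1000 elements with a chunk of 16 and 4 threads pad up to 1024, the next multiple of 64.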


Thanks, Tim, for elaborating further.

@Andrey: I checked with our OpenMP team and confirmed that we are working on the design of the new simd modifier for the schedule clause, introduced in OpenMP 4.5, for the next release, version 17.0 beta update 1, this coming year. That said, #pragma ivdep should not cause any segmentation violation, because we do runtime peeling for OpenMP parallel loops during vectorization. If you want to force aligned accesses, make sure you use #pragma vector aligned, __assume_aligned, or similar. Hope this helps.
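For reference, a sketch of how the OpenMP 4.5 simd schedule modifier might replace a hand-picked chunk size. It assumes a compiler with OpenMP 4.5 support and arrays that really are 64-byte aligned; the aligned clause is the standard OpenMP counterpart to an alignment assertion such as __assume_aligned, and scale_aligned is a hypothetical name.

```cpp
#include <cassert>

// schedule(simd : static): on a loop associated with a SIMD construct, the
// runtime rounds chunk sizes up to a multiple of the SIMD width, so chunk
// boundaries line up with vector boundaries without choosing 16 by hand.
// aligned(a, b : 64) promises 64-byte alignment; the caller must guarantee it.
void scale_aligned(float* a, const float* b, float s, int n) {
    assert(n >= 0);
#pragma omp parallel for simd schedule(simd : static) aligned(a, b : 64)
    for (int i = 0; i < n; ++i) {
        a[i] = s * b[i];
    }
}
```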