Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

ICL 2019 support of simd pragma

Black Belt

I have a case where I have been using #pragma omp simd lastprivate reduction(), which is probably not a legitimate combination.  It now fails with 2019 (wrong results, no warning), but without the pragma it runs as well as it did with the pragma in the past.

I haven't found a way to eliminate the use of #pragma simd firstprivate (which the compiler flags as non-standard) without losing performance.


Hi Tim,

Would you be able to give us a test case to investigate?



Black Belt

I'm not certain whether the apparent regression of the case I mentioned first is a bug.  

// following omp simd usage is likely to fail
#if  _OPENMP >= 201307
#pragma omp simd lastprivate(index) reduction(max: x)
#endif
      for (int i = 2; i <= i__2; ++i)
          if (a[i] > x) {
              x = a[i];
              index = i;
          }

The Fortran driver (for > 100 such tests) is too big to attach here, although some of you already have it (e.g. from GitHub), and I could submit a formal support issue.  There is no longer any advantage in activating the omp pragma: ICL optimizes fully without it, while gcc seems to ignore the pragma and does not optimize.  Of course, the code is wrong for the case where index is never set in the loop, but that is (or ought to be) avoided in the test case.
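For readers who want to avoid combining lastprivate with reduction altogether, one possible workaround is to split the search into two passes: a plain max reduction, followed by a scalar lookup of the first position holding that maximum. This is only a sketch and not code from the test suite. The function name max_index_two_pass, the 0-based indexing, the initialization from a[0], and the assumption that the data contains no NaNs are all mine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the original tests.
   Returns the 0-based index of the first maximum of a[0..n-1] and
   stores the maximum in *max_out. Assumes n >= 1 and no NaN values. */
size_t max_index_two_pass(const float *a, size_t n, float *max_out)
{
    assert(n >= 1);
    float x = a[0];

    /* Pass 1: a plain max reduction, which omp simd supports directly.
       Without OpenMP enabled the pragma is skipped and the loop is scalar. */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max: x)
#endif
    for (size_t i = 1; i < n; ++i)
        if (a[i] > x)
            x = a[i];

    /* Pass 2: scalar search for the first element equal to the maximum.
       This matches the strict '>' test of the single-pass loop, which
       also keeps the earliest index among equal maxima. */
    size_t index = 0;
    for (size_t i = 0; i < n; ++i)
        if (a[i] == x) {
            index = i;
            break;
        }

    *max_out = x;
    return index;
}
```

The second pass costs an extra sweep over part of the array, so whether this is competitive with what ICL generates for the single loop would have to be measured.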

The simplest case where #pragma simd firstprivate is needed to optimize (a cyclic boundary condition problem):

      int i__2 = *n;
      x = b[*n];
      y = b[*n - 1];
#pragma simd firstprivate(x,y)
      for (int i = 1; i <= i__2; ++i) {
          a[i] = (b[i] + x + y) * .333f;
          y = x;
          x = b[i];
      }

// explicitly peeled code permits gcc to auto-vectorize; not as efficient as the icc code above

If simd firstprivate is replaced by omp private, ICL compiles cleanly (message about pragma simd is gone) but the result is broken (as it should be, as x and y are not initialized).  Without the pragma, it doesn't optimize any better than gcc.
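To illustrate what explicit peeling means for the cyclic boundary loop, here is a minimal sketch. It is not the peeled code referred to above. The function names smooth_carried and smooth_peeled are hypothetical, the indexing is 0-based rather than the 1-based f2c style, and a and b are assumed not to overlap. The first function carries x and y across iterations as in the loop above; the second writes the two wrap-around iterations out by hand so the remaining loop has no loop-carried scalars.

```c
#include <assert.h>
#include <stddef.h>

/* Carried-scalar form (0-based): x and y hold the two previous elements
   of b, wrapping around cyclically at the start of the array. */
void smooth_carried(float *a, const float *b, size_t n)
{
    assert(n >= 2);
    float x = b[n - 1];
    float y = b[n - 2];
    for (size_t i = 0; i < n; ++i) {
        a[i] = (b[i] + x + y) * .333f;
        y = x;
        x = b[i];
    }
}

/* Peeled form: the two iterations that need wrapped-around elements are
   written out, so the main loop only reads b[i], b[i-1] and b[i-2]. */
void smooth_peeled(float *a, const float *b, size_t n)
{
    assert(n >= 2);
    a[0] = (b[0] + b[n - 1] + b[n - 2]) * .333f;
    a[1] = (b[1] + b[0] + b[n - 1]) * .333f;
    for (size_t i = 2; i < n; ++i)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * .333f;
}
```

Both functions add the operands in the same order, so they should produce the same values. The peeled loop rereads neighbouring elements of b instead of keeping them in registers, which may be one reason it is less efficient than the firstprivate version.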

