ICL 2019 support of simd pragma

I have a case where I have been using #pragma omp simd lastprivate reduction(), which is probably not a legitimate combination.  With 2019 it now fails (wrong results, no warning), but without the pragma it runs as well as it did with the pragma in the past.

I haven't found a way to eliminate #pragma simd firstprivate (which the compiler flags as non-standard) without losing performance.


Hi Tim,

Would you be able to give us a test case to investigate?

Thanks,

Viet 


I'm not certain whether the apparent regression of the case I mentioned first is a bug.  

// following omp simd usage is likely to fail
#if _OPENMP >= 201307
#pragma omp simd lastprivate(index) reduction(max: x)
#endif
      for (int i = 2; i <= i__2; ++i)
          if (a[i] > x) {
              x = a[i];
              index = i;
            }

The Fortran driver (for > 100 such tests) is too big to attach here, although some of you have it already (e.g. from github), and I could submit a formal support issue.  There is no longer any advantage in enabling the omp pragma: ICL optimizes the loop fully without it, while gcc seems to ignore the pragma and does not optimize.  Of course, the code is wrong for the case where index is never set in the loop, but that is (or ought to be) avoided in the test case.
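For comparison, here is a minimal sketch of one way to get the maximum and its location without combining lastprivate and reduction on the same loop. It is not taken from the test suite: the helper name max_with_index, the 0-based indexing, and the two-pass structure are all assumptions made for illustration. Only the plain max reduction carries the simd pragma; the index is recovered afterwards in a scalar pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the original test suite).
 * Returns the 0-based index of the first maximum of a[0..n-1]
 * and stores the maximum itself in *max_out.
 * Assumes n > 0 and that the data contain no NaNs. */
static size_t max_with_index(const float *a, size_t n, float *max_out)
{
    assert(n > 0);
    float x = a[0];

    /* Pass 1: plain max reduction, a form omp simd supports directly. */
#if defined(_OPENMP) && _OPENMP >= 201307
#pragma omp simd reduction(max : x)
#endif
    for (size_t i = 1; i < n; ++i)
        if (a[i] > x)
            x = a[i];

    /* Pass 2: scalar search for the first element equal to the maximum.
     * The bound check only guards against NaN input. */
    size_t index = 0;
    while (index + 1 < n && a[index] != x)
        ++index;

    *max_out = x;
    return index;
}
```

The second pass costs an extra partial sweep over the data, so whether this beats the single loop that ICL already optimizes without the pragma would have to be measured.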

The simplest case where #pragma simd firstprivate is needed to optimize (a cyclic boundary condition problem):

      int i__2 = *n;
#ifdef __INTEL_COMPILER
      x = b[*n];
      y = b[*n - 1];
#pragma simd firstprivate(x,y)
      for (int i = 1; i <= i__2; ++i) {
          a[i] = (b[i] + x + y) * .333f;
          y = x;
          x = b[i];
        }
#else

// explicitly peeled code permits gcc to auto-vectorize; not as efficient as the icc code above

If simd firstprivate is replaced by omp private, ICL compiles cleanly (the message about pragma simd is gone) but the result is broken (as it should be, since x and y are then uninitialized inside the loop).  Without the pragma, it doesn't optimize any better than gcc.
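The explicitly peeled variant is not shown in this post, so the following is only a sketch of how such peeling might look, written 0-based and with hypothetical function names (smooth_cyclic_carried, smooth_cyclic_peeled). The first function mirrors the carried-scalar loop above; the second handles the two wrap-around iterations separately so the remaining loop reads only b and has no loop-carried scalars.

```c
#include <assert.h>
#include <stddef.h>

/* Carried-scalar form, as in the post but 0-based (hypothetical name).
 * Assumes n >= 2 and that a and b do not overlap. */
static void smooth_cyclic_carried(float *a, const float *b, size_t n)
{
    assert(n >= 2);
    float x = b[n - 1];   /* cyclic predecessor of b[0] */
    float y = b[n - 2];   /* element two positions before b[0] */
    for (size_t i = 0; i < n; ++i) {
        a[i] = (b[i] + x + y) * .333f;
        y = x;
        x = b[i];
    }
}

/* Peeled form (assumed, not the author's code): the first two
 * iterations use the wrapped-around elements explicitly, and the
 * main loop depends only on b[i], b[i-1], b[i-2]. */
static void smooth_cyclic_peeled(float *a, const float *b, size_t n)
{
    assert(n >= 2);
    a[0] = (b[0] + b[n - 1] + b[n - 2]) * .333f;
    a[1] = (b[1] + b[0] + b[n - 1]) * .333f;
    for (size_t i = 2; i < n; ++i)
        a[i] = (b[i] + b[i - 1] + b[i - 2]) * .333f;
}
```

Both forms add the operands in the same order, so they should give identical results; the peeled loop trades the register-carried x and y for two extra loads per iteration, which is consistent with it being somewhat less efficient than the firstprivate version.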

 
