Parallelizing whole array operations with OpenMP workshare

Sourish_B_ · 08-27-2015

Hi all,

I have some Fortran code with a bunch of operations like this:

conv_adv = pu(0:imr-1,1:jmr,1:lmr) - pu(1:imr,1:jmr,1:lmr) + pv(1:imr,1:jmr,1:lmr) - pv(1:imr,2:jmr+1,1:lmr)
am(0:imr,1:jmr,1:lmr) = dtu * pu(0:imr,1:jmr,1:lmr) ! dtu is a scalar

which I'm thinking of speeding up with OpenMP workshare, now that ifort 15.0+ parallelizes some WORKSHARE constructs instead of replacing them with SINGLE. However, according to https://software.intel.com/en-us/articles/openmp-workshare-constructs-now-parallelize-with-intel-fortran-compiler-150, neither of the two operations above would parallelize, because:

Of the arrays on the right hand side of 'conv_adv', one of them has a lower bound of 0
In the second example, the lower bound on the left hand side is not 1 but 0

Is this a correct interpretation of the statement in that article, "If the lower bound of the left hand side or the array slice lower bound or the array slice stride on the right hand side is not 1, then the statement does not parallelize.", or is my understanding wrong?
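
For concreteness, here is a minimal sketch of the WORKSHARE form in question. The array sizes, the declared bounds and the random fill values are assumptions made up for illustration, not taken from the real code. It only shows how the construct is written; whether ifort actually parallelizes each statement or falls back to SINGLE depends on the rules in the article quoted above.

```fortran
program workshare_sketch
  implicit none
  ! Assumed sizes and bounds, chosen only for illustration
  integer, parameter :: imr = 8, jmr = 6, lmr = 4
  real :: pu(0:imr, 1:jmr, 1:lmr)
  real :: pv(1:imr, 1:jmr+1, 1:lmr)
  real :: conv_adv(1:imr, 1:jmr, 1:lmr)
  real :: am(0:imr, 1:jmr, 1:lmr)
  real :: dtu

  dtu = 0.5
  call random_number(pu)
  call random_number(pv)

  ! The two statements from the question, wrapped in a WORKSHARE construct
  !$omp parallel workshare
  conv_adv = pu(0:imr-1,1:jmr,1:lmr) - pu(1:imr,1:jmr,1:lmr) &
           + pv(1:imr,1:jmr,1:lmr) - pv(1:imr,2:jmr+1,1:lmr)
  am(0:imr,1:jmr,1:lmr) = dtu * pu(0:imr,1:jmr,1:lmr) ! dtu is a scalar
  !$omp end parallel workshare

  print *, 'sum(conv_adv) =', sum(conv_adv), ' sum(am) =', sum(am)
end program workshare_sketch
```

This should build with OpenMP enabled, e.g. `ifort -qopenmp` or `gfortran -fopenmp`.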

Thanks,

Sourish

TimP · 08-27-2015

That reference does indicate that omp workshare will not parallelize when the lower bounds don't all match at 1.

A further problem is that you're asking the compiler to figure out over which subscript to parallelize. OpenMP tends to need unambiguous specification. You could use an outer omp do over the 2nd subscript and inner array assignments.
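
A rough sketch of that suggestion, assuming small made-up array sizes and bounds (none of them come from the original code): the loop over the 2nd subscript is made explicit and parallelized, the remaining dimensions stay as array sections, and the result is compared against the plain whole-array version with a small tolerance.

```fortran
program outer_loop_sketch
  implicit none
  ! Assumed sizes and bounds, chosen only for illustration
  integer, parameter :: imr = 8, jmr = 6, lmr = 4
  real, parameter :: tol = 1.0e-5
  real :: pu(0:imr, 1:jmr, 1:lmr)
  real :: pv(1:imr, 1:jmr+1, 1:lmr)
  real :: conv_adv(1:imr, 1:jmr, 1:lmr), conv_ref(1:imr, 1:jmr, 1:lmr)
  real :: am(0:imr, 1:jmr, 1:lmr), am_ref(0:imr, 1:jmr, 1:lmr)
  real :: dtu
  integer :: j

  dtu = 0.5
  call random_number(pu)
  call random_number(pv)

  ! Serial whole-array reference
  conv_ref = pu(0:imr-1,:,:) - pu(1:imr,:,:) &
           + pv(1:imr,1:jmr,:) - pv(1:imr,2:jmr+1,:)
  am_ref = dtu * pu

  ! Explicit parallel loop over the 2nd subscript; each iteration
  ! writes only its own j-slab, so there are no write conflicts
  !$omp parallel do
  do j = 1, jmr
    conv_adv(:,j,:) = pu(0:imr-1,j,:) - pu(1:imr,j,:) &
                    + pv(1:imr,j,:) - pv(1:imr,j+1,:)
    am(:,j,:) = dtu * pu(:,j,:)
  end do
  !$omp end parallel do

  if (maxval(abs(conv_adv - conv_ref)) > tol .or. &
      maxval(abs(am - am_ref)) > tol) error stop 'mismatch'
  print *, 'loop version matches whole-array version'
end program outer_loop_sketch
```

The loop index j is private by default in a parallel do, so no extra clauses are needed here.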

jimdempseyatthecove · 08-27-2015

Something like:

!$omp parallel do
do l=1,lmr
  conv_adv(0:imr-1,1:jmr,l) = pu(0:imr-1,1:jmr,l) - pu(1:imr,1:jmr,l) + pv(1:imr,1:jmr,l) - pv(1:imr,2:jmr+1,l)
  am(0:imr,1:jmr,l) = dtu * pu(0:imr,1:jmr,l) ! dtu is a scalar
end do

.or.

!$omp parallel do collapse(2)
do j=1,jmr
  do l=1,lmr
    conv_adv(0:imr-1,j,l) = pu(0:imr-1,j,l) - pu(1:imr,j,l) + pv(1:imr,j,l) - pv(1:imr,j+1,l)
    am(0:imr,j,l) = dtu * pu(0:imr,j,l) ! dtu is a scalar
  end do
end do

Jim Dempsey