DO CONCURRENT: iteration-local variables and autoparallelization

zp3 · 02-17-2016

Hi, I'm wondering about two things concerning the DO CONCURRENT construct:

1. As described in https://software.intel.com/en-us/node/526121, the compiler may automatically distinguish between iteration-local and global variables based on whether a variable becomes defined in every iteration (see the example in the link, variable Q). Alternatively, one can declare the variable in a BLOCK inside the DO CONCURRENT construct:

DO CONCURRENT (I=1:N)
    BLOCK
        REAL :: Q
        Q=B(I)+C(I)
        D(I)=Q+SIN(Q)+2
    END BLOCK
END DO

Now, I don't know which is better practice, or which runs faster or more reliably. IMHO the inner BLOCK variant is clearer, since the programmer explicitly declares the iteration-private variables.

2. I initially thought that the primary purpose of the DO CONCURRENT construct was to tell the compiler that the loop is safe to parallelize, but apparently that isn't true. For example:

DO CONCURRENT (I=1:N)
    D(LBA(I):UBA(I))=I
END DO

isn't parallelized, because the compiler reports a possible dependency here (it can't know whether the ranges of different iterations overlap). If I can't tell the compiler that a loop is safe to parallelize by writing DO CONCURRENT, what is DO CONCURRENT good for?

Another failure case is the following:

program t
    implicit none
    real :: D(1000)
    integer :: i,k
    do concurrent(i=1:1000)
        block
           real :: Q(10)
           do k=1,10
               Q(k)=real((i+k)**2)
           end do
           D(i)=Q(mod(i,k))+sin(Q(7))+2
        end block
    end do
    write(*,*) D
end program t

The compiler (ifort 16.0.1) reports dependencies between the uses of Q on line 9 and line 11, which seem to prevent parallelization. But why?

Thanks for any help or advice!

jimdempseyatthecove · 02-17-2016

The documentation states:

The following are additional rules for DO CONCURRENT constructs:

• A variable that is referenced in an iteration must be previously defined during that iteration, or it must not be defined or become undefined during any other iteration.

The example in the documentation illustrates a case where the compiler can know that the same variable will not be multiply defined (by different iterations).

In your second example, the compiler cannot determine whether this requirement would be violated.

Jim Dempsey
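
A minimal sketch of the rule quoted above, assuming small illustrative arrays B, C and D (the program name, sizes and values are made up for the example). The first loop defines Q in every iteration before referencing it, which the rule allows. The second loop is left commented out because each iteration would reference a value defined by another iteration.

```fortran
program rule_demo
    implicit none
    integer, parameter :: n = 8      ! illustrative size, not from the thread
    real :: b(n), c(n), d(n), q
    integer :: i

    b = 1.0
    c = 2.0

    ! Conforming: q is defined in every iteration before it is referenced,
    ! so no iteration relies on a value of q set by another iteration.
    do concurrent (i = 1:n)
        q = b(i) + c(i)
        d(i) = q + sin(q) + 2
    end do

    ! Not conforming (kept as a comment): iteration i would reference
    ! d(i-1), which is defined by a different iteration.
    ! do concurrent (i = 2:n)
    !     d(i) = d(i-1) + b(i)
    ! end do

    write (*, *) d
end program rule_demo
```

Note that Q is not referenced after the first loop: a variable defined in more than one iteration has no usable value once the construct finishes.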

jimdempseyatthecove · 02-17-2016

I agree that your third example (local Q, referenced twice) should not have inhibited parallelization of the DO CONCURRENT.

The following is a workaround:

program t
    implicit none
    real :: D(1000)
    integer :: i,k
    do concurrent(i=1:1000)
      D(i)=calc_D(i)
    end do
    write(*,*) D
contains
    pure function calc_D(i) result(D)
      implicit none
      integer, intent(in) :: i 
      real :: D
      integer :: k
      real :: Q(10)
      do k=1,10
        Q(k)=real((i+k)**2)
      end do
      D=Q(mod(i,k))+sin(Q(7))+2
    end function
end program t

Jim Dempsey

zp3 · 02-23-2016

OK, thanks for your answer, but I was already aware that the compiler can't determine whether the second example is safe to parallelize. That's why I asked: AFAIK the first example would be parallelized even as a normal DO loop, so there is no need for DO CONCURRENT there. And if I can't explicitly tell the compiler that the second loop is safe by writing DO CONCURRENT, why would one ever need the DO CONCURRENT construct at all?

jimdempseyatthecove · 02-23-2016

There is a small but significant difference between the rules for (parallel) DO CONCURRENT and those for an OpenMP PARALLEL DO.

DO CONCURRENT requires that each iteration (of the equivalent DO loop) not interfere with anything referenced by another iteration.

OpenMP PARALLEL DO makes no such requirement. Instead, it specifies that it is the programmer's responsibility to ensure there is no adverse (unintended) interaction, for example at thread iteration slice boundaries where you reference Array(I-1) and/or Array(I+1).

The consequences of the compiler making a false negative (for concurrency) are less severe than the consequences of it making a false positive (for concurrency).

It is generally better to use explicit parallel programming (OpenMP) than to use implicit parallel programming (auto-parallelism and DO CONCURRENT). There may be a few instances where DO CONCURRENT may be a better choice (emphasis on few).

Jim Dempsey
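
For comparison, here is a hedged sketch of how the third example from the original question might look with an explicit OpenMP PARALLEL DO, where the programmer, not the compiler, takes responsibility for privatizing Q and k. Two things here are assumptions rather than part of the thread: the program name t_omp, and the index expression mod(i,10)+1, which replaces mod(i,k) so that the subscript stays within 1..10 (after the inner loop k is 11, so mod(i,k) can evaluate to 0).

```fortran
program t_omp
    implicit none
    real :: d(1000)
    real :: q(10)
    integer :: i, k

    ! q and k are scratch variables, so each thread gets its own copy.
    ! The loop index i is private automatically.
    !$omp parallel do private(q, k)
    do i = 1, 1000
        do k = 1, 10
            q(k) = real((i + k)**2)
        end do
        ! Assumed index: mod(i,10)+1 always lies in 1..10
        d(i) = q(mod(i, 10) + 1) + sin(q(7)) + 2
    end do
    !$omp end parallel do

    write (*, *) d(1:5)
end program t_omp
```

The directive only takes effect when OpenMP is enabled at compile time (for example -qopenmp with ifort or -fopenmp with gfortran). Without that option the directive lines are treated as comments and the loop runs serially.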

Steven_L_Intel1 · 02-23-2016

I don't agree with Jim's characterization of DO CONCURRENT. Like with OMP PARALLEL DO, DO CONCURRENT is a promise by the programmer that the iterations can be executed in any order and to any degree of parallelism. This implies no loop-carried dependencies or side-effects. The compiler can use this information to help with vectorization and parallelization and indeed it does, though we've recently discovered that we weren't being as "helpful" to the optimizer as we could be - that should get fixed in a future version.

I'll also comment that Fortran 2015 adds something like PRIVATE and FIRSTPRIVATE to DO CONCURRENT, though the wording of the standard in this area continues to be contentious. (The F2015 keywords here are LOCAL and LOCAL_INIT.)
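
A small sketch of the LOCAL and LOCAL_INIT specifiers mentioned above, as they appear in the standard that was eventually published as Fortran 2018. The arrays, the variable offset and the program name are illustrative assumptions, and the code needs a compiler that implements these locality specifiers, which ifort 16 did not.

```fortran
program locality_demo
    implicit none
    integer, parameter :: n = 8      ! illustrative size
    real :: b(n), c(n), d(n), q, offset
    integer :: i

    b = 1.0
    c = 2.0
    offset = 2.0

    ! local(q):           each iteration gets its own, initially undefined q
    !                     (comparable to OpenMP PRIVATE)
    ! local_init(offset): each iteration gets its own copy of offset,
    !                     initialized from the value outside the loop
    !                     (comparable to OpenMP FIRSTPRIVATE)
    ! shared(b, c, d):    these arrays are the same objects in all iterations
    do concurrent (i = 1:n) local(q) local_init(offset) shared(b, c, d)
        q = b(i) + c(i)
        d(i) = q + sin(q) + offset
    end do

    write (*, *) d
end program locality_demo
```

This expresses the same intent as declaring Q inside a BLOCK, but in the loop header, so the locality of every variable is visible in one place.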

jimdempseyatthecove · 02-23-2016

Steve,

Correct me if I am wrong...

The compiler will reject parallelization of DO CONCURRENT if it suspects loop carried dependencies or side-effects. IOW enforcement of the rules.

The compiler will parallelize OpenMP DO loop code even if it can prove loop carried dependencies or side-effects. IOW no enforcement of the rules (recommendations).

Jim Dempsey

Steven_L_Intel1 · 02-23-2016

My understanding is that the compiler essentially pretends that a DO CONCURRENT is an !DIR$ PARALLEL region with some additional directive indicating lack of dependencies. I'd have to go ask the developers for details. Of course, it only parallelizes if you say -parallel, but DO CONCURRENT also helps vectorization. So the rules aren't quite the same as OpenMP. I'll admit this is not my strongest area.

zp3 · 02-24-2016

Without knowing the details, I'd intuitively share Steve's understanding of DO CONCURRENT. From my point of view, DO CONCURRENT should work as follows:

1.) Parallelize all loops, even if the compiler can't prove that the code is deterministic

2.) Raise an error if the compiler is able to prove that the code is non-deterministic

3.) Leave the code unparallelized when that is the better optimization choice (cost-benefit analysis, etc.)

If DO CONCURRENT worked that way, it would have huge advantages over OpenMP: it would react more intelligently, could have a wider understanding of parallelism, and is part of the language (so it would supersede OpenMP to some extent).

TimP · 02-24-2016

I've found advantages in vectorization with do concurrent only for certain cases involving a mask, and that only with Intel Fortran.

As parallelization with do concurrent requires setting -parallel, and an application large enough to benefit from parallel but able to accomplish all important operations under do concurrent seems unlikely, I don't see do concurrent as sufficient to eliminate the advantages of OpenMP.
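
For readers unfamiliar with the masked form mentioned above, a minimal sketch follows. The array contents and the program name are illustrative assumptions, and the sketch makes no claim about whether any particular compiler vectorizes it.

```fortran
program mask_demo
    implicit none
    integer, parameter :: n = 8      ! illustrative size
    real :: a(n), r(n)
    integer :: i

    a = [4.0, -1.0, 9.0, 0.0, 16.0, -25.0, 1.0, 36.0]
    r = 0.0

    ! The second item in the header is a scalar mask expression:
    ! only iterations for which a(i) > 0.0 is true are executed.
    do concurrent (i = 1:n, a(i) > 0.0)
        r(i) = sqrt(a(i))
    end do

    ! Elements of r whose a(i) is not positive keep their initial 0.0
    write (*, *) r
end program mask_demo
```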