Intel® Fortran Compiler

Cycling an unnamed do concurrent loop

zp3
Beginner

Hi,

Why isn't the compiler associating the following cycle statement with the enclosing do concurrent loop?

program test
    implicit none
    integer :: i,j,k

    do concurrent (i=1:10000)
        k=-10
        do j=1,i
            k=k+k**j
        end do
        if (mod(i,3).eq.0) then
            k=int(sqrt(real(k)))
            write(*,*)'For i = ',i,', k set to ',k,', cycling'
            cycle
        end if
        write(*,*)'Block ended with i = ',i
    end do
end program

I get the error:

cycletest.f90(13): error #6602: A CYCLE or EXIT statement must not be used with a non-DO block.
            cycle
------------^
compilation aborted for cycletest.f90 (code 1)

It works when I name the loop and cycle it by its name (tested with ifort 17.0):
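
For reference, a minimal sketch of that workaround, i.e. the same program with the construct named (the name "outer" is arbitrary):

program test_named
    implicit none
    integer :: i,j,k

    outer: do concurrent (i=1:10000)
        k=-10
        do j=1,i
            k=k+k**j
        end do
        if (mod(i,3).eq.0) then
            k=int(sqrt(real(k)))
            write(*,*)'For i = ',i,', k set to ',k,', cycling'
            cycle outer    ! accepted once the construct is named
        end if
        write(*,*)'Block ended with i = ',i
    end do outer
end program test_named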

Thanks for your help!

Steve_Lionel
Honored Contributor III

I think this is a compiler bug. It is probably not looking past the IF to see the enclosing DO construct.

Kevin_D_Intel
Employee

This was found internally and is fixed in the next major release due out later this year.

zp3
Beginner

Ok, thanks for the info.

I'm just a bit surprised that such an error wasn't found earlier; that would imply that only a few people are actually using the do concurrent language feature. I've also heard some people complain that do concurrent is something of a misfeature, at least as far as the implicit private/shared rules are concerned.

I personally would like to rely heavily on the do concurrent feature in the future, and would prefer it over OpenMP because (see the sketch after this list):
- it's a native language feature
- it's probably the most minimal way to tell the compiler that there aren't any loop-carried dependencies
- the programmer does not have to distinguish between vectorization and parallelization (afaik)
- the compiler has every freedom to decide whether it's worthwhile to vectorize/parallelize (so the programmer does not have to know, and the code becomes more hardware independent)
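
For example, here is the same dependence-free loop written both ways (a sketch with made-up arrays x, y, z; the OpenMP clauses are just one reasonable choice):

program demo
    implicit none
    integer, parameter :: n = 1000
    integer :: i
    real :: x(n), y(n), z(n)

    x = 1.0

    ! do concurrent: the absence of loop-carried dependencies is part
    ! of the construct itself; the compiler may vectorize, parallelize,
    ! both, or neither.
    do concurrent (i=1:n)
        y(i) = 2.0*x(i) + 1.0
    end do

    ! OpenMP: the same assertion expressed as a directive, with the
    ! sharing rules spelled out explicitly.
    !$omp parallel do default(shared) private(i)
    do i = 1, n
        z(i) = 2.0*x(i) + 1.0
    end do
    !$omp end parallel do

    write(*,*) y(n), z(n)
end program demo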

So my node-level programming paradigms for now would be:
- use array (:) syntax whenever possible
- where that's not possible, use short do concurrent loops (replacing obsolescent forall constructs, and trying to use array syntax inside them); see the sketch after this list
- use do concurrent also for more complex loops whenever I know there aren't any loop-carried dependencies, relying on the implicit shared/private rules (also for arrays, types, objects, etc.)
- try not to use BLOCK constructs, to keep the difference from serially executing code as small as possible
- use OpenMP only when I know it's explicitly worthwhile and/or I need features like reductions, special scheduling, etc.
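
A minimal sketch of the first two points (arrays a, b and the loop body are made up):

program paradigms
    implicit none
    integer, parameter :: n = 100
    integer :: i
    real :: a(n), b(n)

    b = [(real(i), i=1,n)]

    ! Use array syntax whenever possible:
    a = 2.0*b

    ! Otherwise a short do concurrent, replacing the equivalent forall:
    !     forall (i=2:n) a(i) = b(i-1) + b(i)
    do concurrent (i=2:n)
        a(i) = b(i-1) + b(i)
    end do

    write(*,*) a(1), a(n)
end program paradigms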

As one can see, my paradigms are only as good as the compiler's do concurrent implementation. Hence my (open) questions:
- Do you think my paradigms are basically well thought out?
- Do you think the do concurrent feature has good prospects, and that companies such as Intel will put (or have already put) great effort into optimizing its execution as much as possible? (also with respect to future architectures like MIC)

Thanks for any advice!

jimdempseyatthecove
Honored Contributor III

If you intend for DO CONCURRENT to actually be concurrent (parallel), then you (effectively) limit the parallelization of the application to just the DO CONCURRENT sections. While you can mix auto-parallelization (DO CONCURRENT) with OpenMP, it is typically counter-productive (performance-wise) to do so. It used to be that the auto-parallelization thread pool and the OpenMP thread pool were separate pools; this may no longer be the case. In any event, the choice of separate pools versus a common pool is an implementation detail (which you seem to be averse to). OpenMP is a well-established standard, supported by all major compiler vendors.

In your case, auto-parallelization may be sufficient. If you need, or want, more utilization of CPU resources, then go with directive-based parallelization.

Jim Dempsey 
