Intel® Fortran Compiler

DO CONCURRENT bug

mfinnis
New Contributor II
1,629 Views

An old thread on Fortran Discourse was revived recently. Reading the earlier posts in it, I found an example of DO CONCURRENT with local_init not behaving correctly when compiled with ifort 2021.5 (https://fortran-lang.discourse.group/t/clarification-on-do-concurrent/1647/9). Having only just installed w_fortran-compiler_p_2023.1.0.46351, I wondered whether the bug had been fixed. It doesn't seem to have been. I don't know whether the bug wasn't reported or whether it's on the to-do list. Either way, for info:

program Main
    implicit none
    integer :: a, i
    a = 5
    do concurrent(i = 1:5) local_init(a)
        print '(i2,i4)', i, a 
        a = a*2
        print '(2x,i4)', a
    end do
end program

when running sequentially results in

 1   5
    10
 2  10
    20
 3  20
    40
 4  40
    80
 5  80
   160

and with the loop parallelized results in something like

 1   5
 3   5
 2   5
 4   5
    10
    10
    10
    10
 5   5
    10

 

Ron_Green
Moderator
1,619 Views

The output looks correct to me.  And from the discussion on Fortran Discourse I believe it was explained.

When you use DO CONCURRENT you are asserting to the compiler that the iterations of the loop are independent and can be run in any order, and thus can be parallelized and run independently in parallel.  Key concept here: independently.  No loop-carried dependencies or state assumed.

In the -qopenmp case where you ask the compiler to parallelize this loop it does exactly that.  In this case, the 5 iterations are run in parallel.  And since 'a' is local_init, each iteration first sets the initial value of its copy to the value of 'a' just outside the loop, which is 5.  Each copy prints its initial value of a.
Prints are run in parallel without synchronization, thus the output can be in any order.  Run it a few times and you may get another order.  Given what you show, I bet you have a 4-processor CPU.  Just a guess because iteration 5's initial print was lagging, waiting for a free processor no doubt.

After the initial print, each independent iteration process doubles the value of 'a' to 10.  Each independent process then prints its local copy of a, which is 10.

 

No error.  Working as intended.

Ron_Green
Moderator
1,619 Views

Looked at another way - if you EXPECT 'a' to have values doubling on each iteration, isn't that depending on the state from a previous iteration?  And hence, you have violated your assertion that the iterations are completely independent, completely not relying on previous iterations, or the state of variables from previous iterations?  The "local" in local_init is also asserting that the state of 'a' is local to the iteration and does not rely on or need state from any other iteration.

mfinnis
New Contributor II
1,570 Views

See IanH's comment. The issue is that when run without parallelization the output is wrong. As I understand it, there is no requirement in the standard for DO CONCURRENT to be run parallelized and in any case ifort and ifx are quite happy to produce both sequential and parallel code that produces differing output with no warning.

mfinnis
New Contributor II
1,531 Views

Perhaps I should have been more clear. ifort and ifx produce code that runs sequentially/in a single thread/without parallelization that doesn't initialize the local variable a. So the bug is in the sequential code not the parallelized code.

IanH
Honored Contributor II
1,600 Views

Is the complaint about the incorrect results when running "sequentially" without /Qopenmp?  The warning about ignored locality identifiers is a bit of a cop-out.

jimdempseyatthecove
Honored Contributor III
1,544 Views

Think of it this way:

program Main
    implicit none
    integer :: a, i
    a = 5
!$omp parallel do firstprivate(a) private(i)
    do i = 1,5
        print '(i2,i4)', i, a 
        a = a*2
        print '(2x,i4)', a
    end do
!$omp end parallel do
end program

The resulting values will depend on the number of threads, and the order of the output will depend on the order of execution among threads.

The output is indeterminate with respect to single-thread output.

Jim Dempsey

IanH
Honored Contributor II
1,502 Views

That would be a broken implementation in the general case (number of threads not matching number of iterations).  DO CONCURRENT is specified in terms of iterations, not threads.  Local "locality" entities are local to the iteration, not to a thread.

DO CONCURRENT really means "do unordered".  You (should) get the same set of lines regardless of how things are implemented, but not necessarily in the same order.
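
One way to compare builds without depending on print order is to store each iteration's values in arrays indexed by i and print them after the loop. The following is only a minimal sketch, assuming a compiler that applies LOCAL_INIT at the start of each iteration; the program name and the arrays before and after are illustrative and do not come from this thread.

```fortran
program collect_per_iteration
    implicit none
    integer :: a, i
    integer :: before(5), after(5)   ! illustrative arrays, one slot per iteration

    a = 5
    do concurrent (i = 1:5) local_init(a)
        before(i) = a     ! value of the construct-local a at iteration start
        a = a*2           ! changes only the construct-local a
        after(i) = a      ! value of the construct-local a at iteration end
    end do

    ! Printing happens outside the construct, so the line order is fixed.
    do i = 1, 5
        print '(i2,2i4)', i, before(i), after(i)
    end do
end program collect_per_iteration
```

Under the standard's wording every row should show 5 and then 10, whatever order the iterations ran in. A build that ignores the locality specifier would be expected to show the doubling sequence instead.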

jimdempseyatthecove
Honored Contributor III
1,505 Views

From my understanding, the local init occurs at the entry into the concurrent region... but not within (at top of) the loop. Therefore both the serial and the parallel outputs are correct.

 

Jim Dempsey

IanH
Honored Contributor II
1,411 Views

The Fortran standard has no concept of "threads" or "concurrent region".  It says (emphasis added):

At the beginning of each iteration...

- a variable with LOCAL_INIT locality has the pointer association status and definition status of the outside variable with that name; the outside variable shall not be an undefined pointer or a nonallocatable nonpointer variable that is undefined.

Ron_Green
Moderator
1,450 Views

I am also at a loss to understand what, @mfinnis , you mean when you say this:

"The issue is that when run without parallelization the output is wrong"

Please show the complete "serial" compilation line without -qopenmp and your output, and tell us what you perceive is "wrong". 

mfinnis
New Contributor II
1,431 Views

@Ron_Green ,

See @IanH's post above. That is it in a nutshell. DO CONCURRENT doesn't imply that the loop is necessarily parallelized and I was staggered that @jimdempseyatthecove seems to be happy that the code can produce completely different output depending on whether the compiler decides to produce parallel code for the loop.

Parallel code should produce something like:

 

 1   5
 3   5
 2   5
 4   5
    10
    10
    10
    10
 5   5
    10

 

Serial code should produce:

 

 1   5
    10
 2   5
    10
 3   5
    10
 4   5
    10
 5   5
    10

 

The variable a in the DO CONCURRENT construct is not the same variable as that outside the construct, as it is declared with local_init(a), and it should be initialized to the value of the variable outside the construct at the start of each iteration of the loop. Neither ifort nor ifx initializes the local variable at the start of each iteration for serial code. (Actually, I've just checked the value of a after the loop exits, and for serial code it is 160 - it's as if local_init(a) is ignored - and for parallel code it is 5, which is correct.)
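
The check on the value of a after the loop can be written down as a small test. This is a hedged sketch rather than an official reproducer: the program name and the messages are made up for illustration, and it assumes only the Fortran 2018 rule that the variable inside the construct is separate from the outer one.

```fortran
program check_outer_a
    implicit none
    integer :: a, i

    a = 5
    do concurrent (i = 1:5) local_init(a)
        a = a*2           ! should modify only the construct-local a
    end do

    ! The outer a is a different variable, so it should still be 5 here.
    print '(a,i4)', 'outer a after the loop:', a
    if (a == 5) then
        print '(a)', 'outer a unchanged, as the standard describes'
    else
        print '(a)', 'outer a was modified: local_init(a) appears to be ignored'
    end if
end program check_outer_a
```

With the serial build described above this would be expected to report 160, and 5 with the parallel build.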

 

Doh! I've just rebuilt the program without /Qparallel and seen the warning

warning #5423: Locality information is ignored without one of these command line qualifiers '/Qopenmp or Qparallel'

(I had wondered at the comment about the warning in @IanH 's first post above.) So, in a way, the above is just stating the bleeedin' obvious. However, I had compiled with /Qparallel so there wasn't a warning but the compiler didn't produce parallel code for the loop resulting in the 'incorrect' output shown in the original post.

Is the warning produced by the compiler a stop-gap until DO CONCURRENT is implemented according to the Standard (and Intel's documentation), or is it only going to be implemented for parallelized code? If the latter, then I suggest that the warning needs to be clearer (and made an error) and/or a DO CONCURRENT construct should be parallelized regardless of other considerations if the /Qparallel qualifier is present.

@Ron_Green, sorry, the post has sort of morphed into something else. It looks as if circumstances contrived to confuse the issue - or to confuse me at least. As far as the compiler options used, I just created a console project in Visual Studio and used the Release x64 configuration with the Parallelization option set to Yes (/Qparallel). This results in the command line:

/nologo /O2 /Qparallel /module:"x64\Release\\" /object:"x64\Release\\" /Fd"x64\Release\vc160.pdb" /libs:dll /threads /c

This gives no warning and results in code that produces the 'incorrect' output.

 

 

JohnNichols
Valued Contributor III
1,427 Views

I was staggered. 

 

Interesting choice of words, I tend not to use words that have the buttery feeling on the Intel Site,  after all aside from the Intel people no-one gets paid to answer the questions.  

 

Barbara_P_Intel
Employee
1,418 Views

After some more research and discussion with the Fortran compiler team, that warning message is not correct. And the LOCAL_INIT should be defined at the beginning of each iteration. I filed CMPLRLLVM-48373.


FortranFan
Honored Contributor III
1,344 Views

@Barbara_P_Intel wrote:

After some more research and discussion with the Fortran compiler team, that warning message is not correct. And the LOCAL_INIT should be defined at the beginning of each iteration. I filed CMPLRLLVM-48373.



@Barbara_P_Intel , @Ron_Green ,

Perhaps your Intel Support team may review this with the writer(s) of DGR and consider how the documentation can be improved re: Fortran standard DO CONCURRENT: some points to consider -

  1. Better to not use any direct analogy with OpenMP syntax and semantics when it comes to DO CONCURRENT in the Fortran standard.  DO CONCURRENT is a full-fledged QUIRK with no parallel!
  2. With some of the semantics with DO CONCURRENT, an illustration using the BLOCK construct in the language might be worth it - see below with LOCAL_INIT,
  3. The semantics of DO CONCURRENT only suggests to the processor that execution in any order is viable; this may NOT mean parallel execution.
  4. The more examples the Intel team can put together, the better.

Consider this variant of the case given in the original post:

   integer :: i, a
   character(len=*), parameter :: fmtg = "(g0,t15,g0,t30,g0)"
   print fmtg, "State", "i", "a"
   a = 5
   do concurrent(i = 1:5)
      block
         integer :: la
         la = a
         print fmtg, "Begin loop", i, la 
         la = la*2
         print fmtg, "End loop", i, la 
      end block
    end do
end
  • Execution of the program built using /Qopenmp option: note 'a' is 5 and 10 at the beginning and end of each loop respectively.
C:\temp>ifort /free /standard-semantics /Qopenmp p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_000000
Copyright (C) 1985-2023 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.34.31937.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
p.obj

C:\temp>p.exe
State         i              a
Begin loop    4              5
Begin loop    5              5
End loop      4              10
End loop      5              10
Begin loop    3              5
Begin loop    1              5
End loop      3              10
Begin loop    2              5
End loop      1              10
End loop      2              10
  • Execution of the program built without /Qopenmp: note a) there are NO compiler warnings and b) the value of 'a' in the output is 5 and 10 at the beginning and the end of the loop respectively.
C:\temp>ifort /free /standard-semantics p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_000000
Copyright (C) 1985-2023 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.34.31937.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
State         i              a
Begin loop    1              5
End loop      1              10
Begin loop    2              5
End loop      2              10
Begin loop    3              5
End loop      3              10
Begin loop    4              5
End loop      4              10
Begin loop    5              5
End loop      5              10

Now consider this variant with LOCAL_INIT:

   integer :: i, a
   character(len=*), parameter :: fmtg = "(g0,t15,g0,t30,g0)"
   print fmtg, "State", "i", "a"
   a = 5
   do concurrent(i = 1:5) local_init(a)
      print fmtg, "Begin loop", i, a 
      a = a*2
      print fmtg, "End loop", i, a 
    end do
end
  • Execution of the program built using /Qopenmp: note there are NO warnings and the program output is similar to the case above i.e., the value of 'a' is 5 and 10 at the beginning and end of the loop respectively.  This is as expected per the standard semantics with LOCAL_INIT and DO CONCURRENT.
C:\temp>ifort /free /standard-semantics /Qopenmp p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_000000
Copyright (C) 1985-2023 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.34.31937.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
-defaultlib:libiomp5md.lib
-nodefaultlib:vcomp.lib
-nodefaultlib:vcompd.lib
p.obj

C:\temp>p.exe
State         i              a
Begin loop    1              5
Begin loop    5              5
End loop      1              10
Begin loop    4              5
Begin loop    3              5
End loop      4              10
End loop      5              10
Begin loop    2              5
End loop      3              10
End loop      2              10
  • Execution of the program without using /Qopenmp: note the erroneous warning, followed by the wrong output that results from the standard semantics of LOCAL_INIT being ignored in this case.  Note that the LOCAL_INIT semantics should be as though the processor had introduced a BLOCK construct with a local object, as shown above.
C:\temp>ifort /free /standard-semantics p.f
Intel(R) Fortran Intel(R) 64 Compiler Classic for applications running on Intel(R) 64, Version 2021.9.0 Build 20230302_000000
Copyright (C) 1985-2023 Intel Corporation.  All rights reserved.

p.f(5): warning #5423: Locality information is ignored without one of these command line qualifiers '/Qopenmp or Qparallel'
   do concurrent(i = 1:5) local_init(a)
---^
Microsoft (R) Incremental Linker Version 14.34.31937.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\temp>p.exe
State         i              a
Begin loop    1              5
End loop      1              10
Begin loop    2              10
End loop      2              20
Begin loop    3              20
End loop      3              40
Begin loop    4              40
End loop      4              80
Begin loop    5              80
End loop      5              160

 

Ron_Green
Moderator
1,403 Views

My mistake for not reading carefully.  I missed the key statement "at the start of each iteration of the loop".  Got it.  Barbara wrote up the bug.  A case where too many years of OpenMP programming has clouded my judgement. 

JohnNichols
Valued Contributor III
1,370 Views
!  Console2.f90 
!
!  FUNCTIONS:
!  Console2 - Entry point of console application.
!

!****************************************************************************
!
!  PROGRAM: Console2
!
!  PURPOSE:  Entry point for the console application.
!
!****************************************************************************

    program Console2
    
    implicit none
   
    Integer M,N
    Integer ID(12,20), ID1(12),I,J
    
    do 1 M = 1, 1000
    
    call Check(M)
    
1 end do    

    end program Console2

    subroutine Check(I)
    implicit none
    
    integer I 
    
    do 1 I = 1,10
    
     write(*,*)I
     
1 end do 

    return 
    end subroutine

It never ends. This little monster caught me yesterday; of course, intent would solve the problem.
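
A minimal sketch of the intent fix, assuming the routine is moved into a module so that its interface is explicit. The module name check_mod and the local index K are illustrative, and the outer loop bound is reduced to 3 only to keep the output short. With intent(in) on the dummy argument, using I as the DO variable inside Check should be rejected at compile time, so the loop gets its own local index.

```fortran
module check_mod
    implicit none
contains
    subroutine Check(I)
        integer, intent(in) :: I   ! caller's loop variable can no longer be changed here
        integer :: K               ! local loop index instead of reusing I

        do K = 1, 10
            write(*,*) I, K
        end do
    end subroutine Check
end module check_mod

program Console2
    use check_mod
    implicit none
    integer :: M

    do M = 1, 3                    ! reduced from 1000 for a short run
        call Check(M)
    end do
end program Console2
```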

jimdempseyatthecove
Honored Contributor III
1,314 Views

FF - Thanks for your complete description of the quirk (misbehavior) of the compiler with respect to DO CONCURRENT when used without the openmp/parallel option.

At least the compiler emits a warning. Unfortunately, the consequences affect the evaluations inside the concurrent loop, as opposed to just the exit value of the local_init variable.

Intel, note, while do concurrent is "intended" to be used with parallelization, it is not unusual to debug code without parallelization (parallel code with one thread is not quite the same as without parallelization).

Jim Dempsey
