Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
28381 Discussions

do concurrent broken with openmp

Diehl__Martin
Novice
2,957 Views

The following code seems to be valid (despite the complex rules for do concurrent, see https://github.com/j3-fortran/fortran_proposals/issues/62). It gives invalid results (all zeros for combined forward-backward conversion if compiled with

ifort -qopenmp

I'm using ifort version 19.1.2.254

 

program test_do_concurrent
  use, intrinsic :: IEEE_arithmetic

  implicit none
  integer, parameter :: pReal = IEEE_selected_real_kind(15,307)
  real(pReal), parameter :: PREAL_EPSILON = epsilon(0.0_pReal)
  
  integer, dimension (2,6), parameter :: &
    MAPNYE = reshape([&
      1,1, &
      2,2, &
      3,3, &
      1,2, &
      2,3, &
      1,3  &
      ],shape(MAPNYE))                                                                              !< arrangement in Nye notation.
  
  real(pReal), dimension(*), parameter :: &
    NRMMANDEL = [1.0_pReal, 1.0_pReal,1.0_pReal, sqrt(2.0_pReal), sqrt(2.0_pReal), sqrt(2.0_pReal)] !< forward weighting for Mandel notation
  real(pReal), dimension(*), parameter :: &
    INVNRMMANDEL = 1.0_pReal/NRMMANDEL                                                              !< backward weighting for Mandel notation

  real(pReal), dimension(6,6) :: t66
  real(pReal), dimension(3,3,3,3) :: m3333 
  call random_number(t66)
  
  if(any(dNeq(math_sym3333to66(math_66toSym3333(t66)),t66))) then
    print*, 'original', t66
    m3333 = math_66toSym3333(t66)
    print*, 'forward', m3333
    print*, 'backward', math_sym3333to66(m3333)
    print*, 'forward-backward', math_sym3333to66(math_66toSym3333(t66))
    error stop 'math_sym3333to66/math_66toSym3333'
  endif



contains
!--------------------------------------------------------------------------------------------------
!> @brief convert symmetric 3x3x3x3 matrix into 6x6 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_sym3333to66(m3333,weighted)

  real(pReal), dimension(6,6)                 :: math_sym3333to66
  real(pReal), dimension(3,3,3,3), intent(in) :: m3333                                              !< symmetric 3x3x3x3 matrix (no internal check)
  logical,     optional,           intent(in) :: weighted                                           !< weight according to Mandel (.true. by default)

  real(pReal), dimension(6) :: w
  integer :: i,j


  if(present(weighted)) then
    w = merge(NRMMANDEL,1.0_pReal,weighted)
  else
    w = NRMMANDEL
  endif

  do concurrent(i=1:6, j=1:6)
    math_sym3333to66(i,j) = w(i)*w(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
  enddo

end function math_sym3333to66

!--------------------------------------------------------------------------------------------------
!> @brief convert 66 matrix into symmetric 3x3x3x3 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_66toSym3333(m66,weighted)

  real(pReal), dimension(3,3,3,3)            :: math_66toSym3333
  real(pReal), dimension(6,6),    intent(in) :: m66                                                 !< 6x6 matrix
  logical,     optional,          intent(in) :: weighted                                            !< weight according to Mandel (.true. by default)

  real(pReal), dimension(6) :: w
  integer :: i,j


  if(present(weighted)) then
    w = merge(INVNRMMANDEL,1.0_pReal,weighted)
  else
    w = INVNRMMANDEL
  endif

  do i=1,6; do j=1,6
    math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = w(i)*w(j)*m66(i,j)
    math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(1,j),MAPNYE(2,j)) = w(i)*w(j)*m66(i,j)
    math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(2,j),MAPNYE(1,j)) = w(i)*w(j)*m66(i,j)
    math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(2,j),MAPNYE(1,j)) = w(i)*w(j)*m66(i,j)
  enddo; enddo

end function math_66toSym3333


!--------------------------------------------------------------------------------------------------
!> @brief Test floating point numbers with double precision for equality.
! replaces "==" but for certain (relative) tolerance. Counterpart to dNeq
! https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
! AlmostEqualRelative
!--------------------------------------------------------------------------------------------------
logical elemental pure function dEq(a,b,tol)

  real(pReal), intent(in)           :: a,b
  real(pReal), intent(in), optional :: tol


  if (present(tol)) then
    dEq = abs(a-b) <= tol
  else
    dEq = abs(a-b) <= PREAL_EPSILON * maxval(abs([a,b]))
  endif
end function dEq


!--------------------------------------------------------------------------------------------------
!> @brief Test floating point numbers with double precision for inequality.
! replaces "!=" but for certain (relative) tolerance. Counterpart to dEq
! https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
! AlmostEqualRelative NOT
!--------------------------------------------------------------------------------------------------
logical elemental pure function dNeq(a,b,tol)

  real(pReal), intent(in)           :: a,b
  real(pReal), intent(in), optional :: tol


  dNeq = .not. dEq(a,b,tol)

end function dNeq


end program test_do_concurrent

 

There are similar bugs reported in https://community.intel.com/t5/Intel-Fortran-Compiler/Bug-with-do-concurrent-and-openmp/m-p/1173473, but since there is no answer from Intel and this bug seems to be a little bit different, I've posted it here again.

1 Solution
Barbara_P_Intel
Moderator
1,779 Views

In reviewing some cases, I found that your reproducer runs as expected with ifx, 2022.0.0. It's part of the oneAPI HPC Toolkit 2022.1.

Have you tried compiling with ifx?



View solution in original post

0 Kudos
19 Replies
Barbara_P_Intel
Moderator
2,911 Views

Can you please take a few minutes to simply this reproducer even further?  Please provide the output you get when you run it, too.  

Thank you!

 

P.S.  It seems to work with ifx (beta) 2021.3.0 using -qopenmp.

As a workaround with ifort 2021.3.0, compile with -O0 or -O1 and -qopenmp.

0 Kudos
Diehl__Martin
Novice
2,885 Views

Dear Barbara,

 

may thanks for your help. Here is a smaller reproducer with comments that indicate the correct behavior.

 

program test_do_concurrent
  use, intrinsic :: IEEE_arithmetic

  implicit none
  integer, parameter :: pReal = IEEE_selected_real_kind(15,307)

  integer, dimension (2,6), parameter :: &
    MAPNYE = reshape([&
      1,1, &
      2,2, &
      3,3, &
      1,2, &
      2,3, &
      1,3  &
      ],shape(MAPNYE))                                                                              !< arrangement in Nye notation.

  real(pReal), dimension(*), parameter :: &
    NRMMANDEL = [1.0_pReal, 1.0_pReal,1.0_pReal, sqrt(2.0_pReal), sqrt(2.0_pReal), sqrt(2.0_pReal)] !< forward weighting for Mandel notation
  real(pReal), dimension(*), parameter :: &
    INVNRMMANDEL = 1.0_pReal/NRMMANDEL                                                              !< backward weighting for Mandel notation

  real(pReal), dimension(6,6) :: t66
  real(pReal), dimension(3,3,3,3) :: m3333
  call random_number(t66)

  print*, 'original', t66
  m3333 = math_66toSym3333(t66)
  print*, 'forward', m3333
  print*, 'backward', math_sym3333to66(m3333)
  print*, 'delta backward', math_sym3333to66(m3333) - t66
  print*, 'forward-backward', math_sym3333to66(math_66toSym3333(t66)) ! should be approx t66, is 0.0
  print*, 'delta forward-backward', math_sym3333to66(math_66toSym3333(t66)) - t66 ! should be approx 0


contains
!--------------------------------------------------------------------------------------------------
!> @brief convert symmetric 3x3x3x3 matrix into 6x6 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_sym3333to66(m3333)

  real(pReal), dimension(6,6)                 :: math_sym3333to66
  real(pReal), dimension(3,3,3,3), intent(in) :: m3333                                              !< symmetric 3x3x3x3 matrix (no internal check)

  integer :: i,j


  do concurrent(i=1:6, j=1:6)
    math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
  enddo

end function math_sym3333to66

!--------------------------------------------------------------------------------------------------
!> @brief convert 66 matrix into symmetric 3x3x3x3 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel.
!--------------------------------------------------------------------------------------------------
pure function math_66toSym3333(m66)

  real(pReal), dimension(3,3,3,3)            :: math_66toSym3333
  real(pReal), dimension(6,6),    intent(in) :: m66                                                 !< 6x6 matrix

  integer :: i,j


  do i=1,6; do j=1,6
    math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
    math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(1,j),MAPNYE(2,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
    math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(2,j),MAPNYE(1,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
    math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(2,j),MAPNYE(1,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
  enddo; enddo

end function math_66toSym3333


end program test_do_concurrent

 

 

0 Kudos
Barbara_P_Intel
Moderator
2,853 Views

Thanks for the cutdown version. That's simplifies things. I'll investigate some more.



0 Kudos
Barbara_P_Intel
Moderator
2,822 Views

I checked the ifort optimization report. The compiler did not parallelize that DO CONCURRENT loop. The ifx optimization reporting is not complete yet.

The  Fortran Developer Guide and Reference  states:

The following are rules for variables with unspecified locality in DO CONCURRENT constructs:

  • A variable that is referenced in an iteration must be previously defined during that iteration, or it must not be defined or become undefined during any other iteration.

I changed the DO CONCURRENT loop to

do concurrent(i=1:6, j=1:6) shared(math_sym3333to66, NRMMANDEL, m3333, MAPNYE)
! do concurrent(i=1:6, j=1:6)
math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
enddo

and it prints the correct answers.

AND the optimization report says the loop was parallelized.

I used ifort 2021.3.0.

There certainly are a lot of restraints around using DO CONCURRENT!

0 Kudos
Diehl__Martin
Novice
2,808 Views

thanks. But according to my understanding of the standard, the shared locality is not needed:

  • math_sym3333to66(i,j) is defined only once (the variable is math_sym3333to66(i,j), not math_sym3333to66)
  • NRMMANDEL, MAPNYE, and m3333 are not defined at all (they even have the parameter/intent(in) attribute.

So I don't think that the stated rule is violated without the shared locality clause.

0 Kudos
Mark_Lewy
Valued Contributor I
2,790 Views

I agree that you shouldn't need to add a locality clause here, but it's good practice to do so, now that Fortran 2018 provides the facility.

 

We had a similar issue with do concurrent and optimisation (support request 04643307) which I requested was closed because of this workaround.

0 Kudos
Diehl__Martin
Novice
2,775 Views

I don't think it is good practice to introduce extra code to work around broken compilers. I will try do concurrent again once ifx is production ready. ifort has long lasting issues with it, I don't feel comfortable to use do concurrent in production codes.

0 Kudos
Barbara_P_Intel
Moderator
2,760 Views

My bad on declaring the PARAMETERS as SHARED. You are right about that.

I checked with one of the Intel Fortran developers who also is a member of the Fortran Standards committee. He replied:

I believe that m3333 needs to be declared shared.  I say this because 11.1.7.5 p1 Additional semantics for DO CONCURRENT constructs (page 184 lines 28-31) states:

The locality of a variable that appears in a DO CONCURRENT construct is LOCAL, LOCAL_INIT, SHARED, or unspecified. A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality.

Paragraph 4 (line 12+ page 185) states:

If a variable has unspecified locality,

  • If it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;

I declared m3333 SHARED, reran, and the answers are as expected.

 

0 Kudos
Diehl__Martin
Novice
2,737 Views

Many thanks. So apparently m3333 causes the problem here.

If I understand the meaning of SHARED correctly, it is used to indicate that the value of a variable is shared among different iterations of the loop. I could use it for example as in

function math_sym3333to66(m3333)                                                               

  real(pReal), dimension(6,6)                 :: math_sym3333to66                                   
  real(pReal), dimension(3,3,3,3), intent(in) :: m3333                                              !< symmetric 3x3x3x3 matrix (no internal check)
  integer, dimension(3,3,3,3)                 :: c3333                                              

  integer :: i,j,c

  c3333 = 0
  c = 0
                 
  do concurrent(i=1:6, j=1:6) shared(c3333,c)                                                         
    math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
    c3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = c3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) &
                                                           + 1
    c = c + 1              
  enddo
  print*, sum(c3333) == c                                                                                          
                                                                                                    
end function math_sym3333to66   

Here, c3333 would count how often one particular element of m3333/c3333 is accessed and c counts the total number of read accesses. c certainly needs to have SHARED locality to avoid race conditions when incrementing its value.

 

Based on this understanding, I am still convinced that my original code with unspecified locality is standard conforming: m3333 is not defined or becomes undefined in any iteration. There is simply no read access and therefore It has the same value in any iteration. This means that the loop can be executed in any order.

There are a few things to note:

  • m3333 has the intent(in) attribute, which ensures that it will not be defined or become undefined during any iteration independently of the actual code in the loop.
  • The access pattern (defined by MAPNYE) even guarantees that every element from m3333 is accessed at most once (36 out of 81 elements are selected). Due to MAPNYE being a parameter this information is also know at compile time. But I doubt that it matters, any access pattern for read access should be fine.
  • I'm not sure whether c3333 needs to have a SHARED locality clause. It seems to be a corner case for which even write access is acceptable with undefined locality clause since MAPNYE guarantees that each element is accessed only once. This information is known at compile time.
0 Kudos
Barbara_P_Intel
Moderator
2,696 Views

Some comments:

  • m3333 has the intent(in) attribute, which ensures that it will not be defined or become undefined during any iteration independently of the actual code in the loop.

m3333 has intent(in) for the function, but not for the DO CONCURRENT. Since m3333 is not SHARED, the values are undetermined in the DO CONCURRENT.


  • I'm not sure whether c3333 needs to have a SHARED locality clause. It seems to be a corner case for which even write access is acceptable with undefined locality clause since MAPNYE guarantees that each element is accessed only once. This information is known at compile time.

c3333 should be SHARED because you defined it to be zero outside the DO CONCURRENT. Similar concept to what's happening with m3333.


Now the variable c ... c is just equal to i*j in this case. Why increment it within the loop? C has a cross iteration dependency, it is really a reduction. OpenMP has the reduction concept. I don't see that in DO CONCURRENT.


Also, with ifort DO CONCURRENT is implemented using OpenMP. Wit ifx, it looks like the loops are just unrolled in this case, no OpenMP. That may change in a future release.



0 Kudos
Diehl__Martin
Novice
2,670 Views

"Since m3333 is not SHARED, the values are undetermined in the DO CONCURRENT."

How do you come to that conclusion, i.e. which part of the standard says that this happens?

 

I'll try to explain my reasoning:

 

I think we agree that m3333 has unspecified locality (11.1.7.5 p1).

That means:

  • If it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;

Which gives two options when referencing a variable:

  1. "it shall either be previously defined during that iteration". This is similar to a LOCAL variable as noted in (11.1.7.5 p1) and LOCAL_INIT and LOCAL would behave the same
    real :: a
    real, dimension(2,2) :: b
    integer :: i,j
    
    do concurrent(i=1:2, j=1:2)
      a = real(i*j)
      b(i,j) = a
    enddo

     the obvious consequence is that it becomes undefined when the loop terminates: if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates.

  2. "[it] shall not be defined or become undefined during any other iteration".

    a) Here, LOCAL and LOCAL_INIT would behave differently. ifort's behavior for unspecified is that of LOCAL.

    real :: a
    real, dimension(2,2) :: b
    integer :: i,j
    
    a = 5.
    
    do concurrent(i=1:2, j=1:2)
      b(i,j) = a
    enddo

    This is my case.

    b)  The same condition can be fulfilled with an if statement. Then, there is no difference between LOCAL and LOCAL_INIT.

    real :: a
    real, dimension(2,2) :: b
    integer :: i,j
    
    do concurrent(i=1:2, j=1:2)
      if (i==1 .and. j==2) then
        a = 3.
        b(i,j) = a
      else
        b(i,j) = 5.
      endif
    enddo​

    in the last two cases, a becomes defined not at all or once, so it does not become undefined after the loop terminates.

 

So summarize: It seems that ifort handles "unspecified" like LOCAL (11.1.7.5, p2-1: "if a variable with LOCAL locality is a pointer it has undefined pointer association status, and otherwise it is undefined except for any subobjects that are default-initialized"), but there are differences between unspecified and LOCAL. It is only stated that "it is a different entity in each iteration, similar to LOCAL locality." Handling unspecified like LOCAL_INIT seems to be the correct behavior.

 

 

Unrelated to the actual discussion, but many thanks for the help and the hint to openMP reduction:

"Now the variable c ... c is just equal to i*j in this case. Why increment it within the loop? C has a cross iteration dependency, it is really a reduction. OpenMP has the reduction concept. I don't see that in DO CONCURRENT."

I agree, my code is invalid. 11.1.7.5 p3 states that  "If a variable has SHARED locality ... [and] if it is defined or becomes undefined during any iteration, it shall not be referenced, defined, or become undefined during any other iteration"

 

0 Kudos
Barbara_P_Intel
Moderator
2,655 Views

m3333 is on the right-hand side. It is not defined during any DO CURRENT iteration.



0 Kudos
Diehl__Martin
Novice
2,637 Views

whether a variable is on the left- or right-hand side is not mentioned in the standard at all so I don't see why it should matter.

 

Let's look at the Note 4 from the standard:

 

INTEGER :: A(N),IND(N)
...
DO CONCURRENT (I=1:M)
  A(IND(I)) = I
END DO

IND is never defined in any iteration, still it is a valid DO CONCURRENT loop.

 

If your argument about left- or right-hand side is valid, we could simply write

INTEGER :: A(N),IND(N), T
...
DO CONCURRENT (I=1:M) LOCAL(T)
  T = IND(I)
  A(T) = I
END DO

There is no reason to assume that IND is undefined because it does not have LOCAL locality.

 

Note that here "the user knows that there are no repeated values in the index array IND", otherwise some components of A would become undefined because they would be defined in more than one iteration.

 

It is nowhere stated that a variable with unspecified locality to be initialized in an iteration. That holds only for variables with LOCAL locality. C1128 explicitly states that a variable with INTENT(IN) attribute (that is the case in my original code) cannot have LOCAL or LOCAL_INIT locality.

 

0 Kudos
Diehl__Martin
Novice
2,614 Views

Many thanks for the help so far. Since this topic is of broader scope than ifort alone, I've summarized the findings on https://fortran-lang.discourse.group/t/clarification-on-do-concurrent/1647

0 Kudos
Steve_Lionel
Honored Contributor III
2,577 Views

I added my thoughts at https://fortran-lang.discourse.group/t/clarification-on-do-concurrent/1647/3 - I think ifort is behaving incorrectly here.

0 Kudos
Barbara_P_Intel
Moderator
2,515 Views

Reading and interpreting a compiler standard takes a special kind of talent! I filed a bug report CMPLRIL0-34101 on this DO CONCURRENT issue. I'll let you know how this progresses to a fix.



Diehl__Martin
Novice
2,500 Views

@Barbara_P_Intel: Many thanks for your support and the constructive discussion. Let's hope that the behavior can be easily fixed.

0 Kudos
Barbara_P_Intel
Moderator
1,780 Views

In reviewing some cases, I found that your reproducer runs as expected with ifx, 2022.0.0. It's part of the oneAPI HPC Toolkit 2022.1.

Have you tried compiling with ifx?



0 Kudos
Diehl__Martin
Novice
1,756 Views

yes, works here. Many thanks

0 Kudos
Reply