- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

The following code seems to be valid (despite the complex rules for do concurrent, see https://github.com/j3-fortran/fortran_proposals/issues/62). It gives invalid results (all zeros for combined forward-backward conversion if compiled with

`ifort -qopenmp`

I'm using ifort version 19.1.2.254

```
program test_do_concurrent
use, intrinsic :: IEEE_arithmetic
implicit none
integer, parameter :: pReal = IEEE_selected_real_kind(15,307)
real(pReal), parameter :: PREAL_EPSILON = epsilon(0.0_pReal)
integer, dimension (2,6), parameter :: &
MAPNYE = reshape([&
1,1, &
2,2, &
3,3, &
1,2, &
2,3, &
1,3 &
],shape(MAPNYE)) !< arrangement in Nye notation.
real(pReal), dimension(*), parameter :: &
NRMMANDEL = [1.0_pReal, 1.0_pReal,1.0_pReal, sqrt(2.0_pReal), sqrt(2.0_pReal), sqrt(2.0_pReal)] !< forward weighting for Mandel notation
real(pReal), dimension(*), parameter :: &
INVNRMMANDEL = 1.0_pReal/NRMMANDEL !< backward weighting for Mandel notation
real(pReal), dimension(6,6) :: t66
real(pReal), dimension(3,3,3,3) :: m3333
call random_number(t66)
if(any(dNeq(math_sym3333to66(math_66toSym3333(t66)),t66))) then
print*, 'original', t66
m3333 = math_66toSym3333(t66)
print*, 'forward', m3333
print*, 'backward', math_sym3333to66(m3333)
print*, 'forward-backward', math_sym3333to66(math_66toSym3333(t66))
error stop 'math_sym3333to66/math_66toSym3333'
endif
contains
!--------------------------------------------------------------------------------------------------
!> @brief convert symmetric 3x3x3x3 matrix into 6x6 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_sym3333to66(m3333,weighted)
real(pReal), dimension(6,6) :: math_sym3333to66
real(pReal), dimension(3,3,3,3), intent(in) :: m3333 !< symmetric 3x3x3x3 matrix (no internal check)
logical, optional, intent(in) :: weighted !< weight according to Mandel (.true. by default)
real(pReal), dimension(6) :: w
integer :: i,j
if(present(weighted)) then
w = merge(NRMMANDEL,1.0_pReal,weighted)
else
w = NRMMANDEL
endif
do concurrent(i=1:6, j=1:6)
math_sym3333to66(i,j) = w(i)*w(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
enddo
end function math_sym3333to66
!--------------------------------------------------------------------------------------------------
!> @brief convert 66 matrix into symmetric 3x3x3x3 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_66toSym3333(m66,weighted)
real(pReal), dimension(3,3,3,3) :: math_66toSym3333
real(pReal), dimension(6,6), intent(in) :: m66 !< 6x6 matrix
logical, optional, intent(in) :: weighted !< weight according to Mandel (.true. by default)
real(pReal), dimension(6) :: w
integer :: i,j
if(present(weighted)) then
w = merge(INVNRMMANDEL,1.0_pReal,weighted)
else
w = INVNRMMANDEL
endif
do i=1,6; do j=1,6
math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = w(i)*w(j)*m66(i,j)
math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(1,j),MAPNYE(2,j)) = w(i)*w(j)*m66(i,j)
math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(2,j),MAPNYE(1,j)) = w(i)*w(j)*m66(i,j)
math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(2,j),MAPNYE(1,j)) = w(i)*w(j)*m66(i,j)
enddo; enddo
end function math_66toSym3333
!--------------------------------------------------------------------------------------------------
!> @brief Test floating point numbers with double precision for equality.
! replaces "==" but for certain (relative) tolerance. Counterpart to dNeq
! https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
! AlmostEqualRelative
!--------------------------------------------------------------------------------------------------
logical elemental pure function dEq(a,b,tol)
real(pReal), intent(in) :: a,b
real(pReal), intent(in), optional :: tol
if (present(tol)) then
dEq = abs(a-b) <= tol
else
dEq = abs(a-b) <= PREAL_EPSILON * maxval(abs([a,b]))
endif
end function dEq
!--------------------------------------------------------------------------------------------------
!> @brief Test floating point numbers with double precision for inequality.
! replaces "!=" but for certain (relative) tolerance. Counterpart to dEq
! https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
! AlmostEqualRelative NOT
!--------------------------------------------------------------------------------------------------
logical elemental pure function dNeq(a,b,tol)
real(pReal), intent(in) :: a,b
real(pReal), intent(in), optional :: tol
dNeq = .not. dEq(a,b,tol)
end function dNeq
end program test_do_concurrent
```

There are similar bugs reported in https://community.intel.com/t5/Intel-Fortran-Compiler/Bug-with-do-concurrent-and-openmp/m-p/1173473, but since there is no answer from Intel and this bug seems to be a little bit different, I've posted it here again.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

In reviewing some cases, I found that your reproducer runs as expected with ifx, 2022.0.0. It's part of the oneAPI HPC Toolkit 2022.1.

Have you tried compiling with ifx?

Link Copied

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Can you please take a few minutes to simply this reproducer even further? Please provide the output you get when you run it, too.

Thank you!

P.S. It seems to work with ifx (beta) 2021.3.0 using -qopenmp.

As a workaround with ifort 2021.3.0, compile with -O0 or -O1 and -qopenmp.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Dear Barbara,

may thanks for your help. Here is a smaller reproducer with comments that indicate the correct behavior.

```
program test_do_concurrent
use, intrinsic :: IEEE_arithmetic
implicit none
integer, parameter :: pReal = IEEE_selected_real_kind(15,307)
integer, dimension (2,6), parameter :: &
MAPNYE = reshape([&
1,1, &
2,2, &
3,3, &
1,2, &
2,3, &
1,3 &
],shape(MAPNYE)) !< arrangement in Nye notation.
real(pReal), dimension(*), parameter :: &
NRMMANDEL = [1.0_pReal, 1.0_pReal,1.0_pReal, sqrt(2.0_pReal), sqrt(2.0_pReal), sqrt(2.0_pReal)] !< forward weighting for Mandel notation
real(pReal), dimension(*), parameter :: &
INVNRMMANDEL = 1.0_pReal/NRMMANDEL !< backward weighting for Mandel notation
real(pReal), dimension(6,6) :: t66
real(pReal), dimension(3,3,3,3) :: m3333
call random_number(t66)
print*, 'original', t66
m3333 = math_66toSym3333(t66)
print*, 'forward', m3333
print*, 'backward', math_sym3333to66(m3333)
print*, 'delta backward', math_sym3333to66(m3333) - t66
print*, 'forward-backward', math_sym3333to66(math_66toSym3333(t66)) ! should be approx t66, is 0.0
print*, 'delta forward-backward', math_sym3333to66(math_66toSym3333(t66)) - t66 ! should be approx 0
contains
!--------------------------------------------------------------------------------------------------
!> @brief convert symmetric 3x3x3x3 matrix into 6x6 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel. Advisable for matrix operations.
! Unweighted conversion only rearranges order according to Nye
!--------------------------------------------------------------------------------------------------
pure function math_sym3333to66(m3333)
real(pReal), dimension(6,6) :: math_sym3333to66
real(pReal), dimension(3,3,3,3), intent(in) :: m3333 !< symmetric 3x3x3x3 matrix (no internal check)
integer :: i,j
do concurrent(i=1:6, j=1:6)
math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
enddo
end function math_sym3333to66
!--------------------------------------------------------------------------------------------------
!> @brief convert 66 matrix into symmetric 3x3x3x3 matrix
!> @details Weighted conversion (default) rearranges according to Nye and weights shear
! components according to Mandel.
!--------------------------------------------------------------------------------------------------
pure function math_66toSym3333(m66)
real(pReal), dimension(3,3,3,3) :: math_66toSym3333
real(pReal), dimension(6,6), intent(in) :: m66 !< 6x6 matrix
integer :: i,j
do i=1,6; do j=1,6
math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(1,j),MAPNYE(2,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
math_66toSym3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(2,j),MAPNYE(1,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
math_66toSym3333(MAPNYE(2,i),MAPNYE(1,i),MAPNYE(2,j),MAPNYE(1,j)) = INVNRMMANDEL(i)*INVNRMMANDEL(j)*m66(i,j)
enddo; enddo
end function math_66toSym3333
end program test_do_concurrent
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Thanks for the cutdown version. That's simplifies things. I'll investigate some more.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I checked the ifort optimization report. The compiler did not parallelize that DO CONCURRENT loop. The ifx optimization reporting is not complete yet.

The Fortran Developer Guide and Reference states:

*The following are rules for variables with unspecified locality in DO CONCURRENT constructs:*

*A variable that is referenced in an iteration must be previously defined during that iteration, or it must not be defined or become undefined during any other iteration.*

I changed the DO CONCURRENT loop to

do concurrent(i=1:6, j=1:6) shared(math_sym3333to66, NRMMANDEL, m3333, MAPNYE)

! do concurrent(i=1:6, j=1:6)

math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))

enddo

and it prints the correct answers.

AND the optimization report says the loop was parallelized.

I used ifort 2021.3.0.

There certainly are a lot of restraints around using DO CONCURRENT!

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

thanks. But according to my understanding of the standard, the *shared *locality is not needed:

*math_sym3333to66(i,j)*is defined only once (the variable is*math_sym3333to66(i,j)*, not*math_sym3333to66*)*NRMMANDEL*,*MAPNYE*, and*m3333*are not defined at all (they even have the parameter/intent(in) attribute.

So I don't think that the stated rule is violated without the *shared* locality clause.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I agree that you shouldn't need to add a locality clause here, but it's good practice to do so, now that Fortran 2018 provides the facility.

We had a similar issue with do concurrent and optimisation (support request 04643307) which I requested was closed because of this workaround.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I don't think it is good practice to introduce extra code to work around broken compilers. I will try *do concurrent* again once ifx is production ready. ifort has long lasting issues with it, I don't feel comfortable to use *do concurrent* in production codes.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

My bad on declaring the PARAMETERS as SHARED. You are right about that.

I checked with one of the Intel Fortran developers who also is a member of the Fortran Standards committee. He replied:

*I believe that m3333 needs to be declared shared. **I say this because 11.1.7.5 p1 Additional semantics for DO CONCURRENT constructs (page 184 lines 28-31) states:*

*The locality of a variable that appears in a DO CONCURRENT construct is LOCAL, LOCAL_INIT, SHARED, or unspecified. A construct or statement entity of a construct or statement within the DO CONCURRENT construct has SHARED locality if it has the SAVE attribute. If it does not have the SAVE attribute, it is a different entity in each iteration, similar to LOCAL locality.*

*Paragraph 4 (line 12+ page 185) states:*

*If a variable has unspecified locality,*

*If it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;*

I declared m3333 SHARED, reran, and the answers are as expected.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Many thanks. So apparently *m3333 *causes the problem here.

If I understand the meaning of SHARED correctly, it is used to indicate that the value of a variable is shared among different iterations of the loop. I could use it for example as in

```
function math_sym3333to66(m3333)
real(pReal), dimension(6,6) :: math_sym3333to66
real(pReal), dimension(3,3,3,3), intent(in) :: m3333 !< symmetric 3x3x3x3 matrix (no internal check)
integer, dimension(3,3,3,3) :: c3333
integer :: i,j,c
c3333 = 0
c = 0
do concurrent(i=1:6, j=1:6) shared(c3333,c)
math_sym3333to66(i,j) = NRMMANDEL(i)*NRMMANDEL(j)*m3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j))
c3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) = c3333(MAPNYE(1,i),MAPNYE(2,i),MAPNYE(1,j),MAPNYE(2,j)) &
+ 1
c = c + 1
enddo
print*, sum(c3333) == c
end function math_sym3333to66
```

Here, c3333 would count how often one particular element of m3333/c3333 is accessed and c counts the total number of read accesses. c certainly needs to have SHARED locality to avoid race conditions when incrementing its value.

Based on this understanding, I am still convinced that my original code with unspecified locality is standard conforming: *m3333* is not defined or becomes undefined in any iteration. There is simply no read access and therefore It has the same value in any iteration. This means that the loop can be executed in any order.

There are a few things to note:

*m3333*has the intent(in) attribute, which ensures that it will not be defined or become undefined during any iteration independently of the actual code in the loop.- The access pattern (defined by
*MAPNYE*) even guarantees that every element from*m3333*is accessed at most once (36 out of 81 elements are selected). Due to*MAPNYE*being a parameter this information is also know at compile time. But I doubt that it matters, any access pattern for**read**access should be fine. - I'm not sure whether c3333 needs to have a SHARED locality clause. It seems to be a corner case for which even
**write**access is acceptable with undefined locality clause since*MAPNYE*guarantees that each element is accessed only once. This information is known at compile time.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Some comments:

*m3333 has the intent(in) attribute, which ensures that it will not be defined or become undefined during any iteration independently of the actual code in the loop.*

m3333 has intent(in) for the function, but not for the DO CONCURRENT. Since m3333 is not SHARED, the values are undetermined in the DO CONCURRENT.

*I'm not sure whether c3333 needs to have a SHARED locality clause. It seems to be a corner case for which even**write**access is acceptable with undefined locality clause since MAPNYE guarantees that each element is accessed only once. This information is known at compile time.*

c3333 should be SHARED because you defined it to be zero outside the DO CONCURRENT. Similar concept to what's happening with m3333.

Now the variable c ... c is just equal to i*j in this case. Why increment it within the loop? C has a cross iteration dependency, it is really a reduction. OpenMP has the reduction concept. I don't see that in DO CONCURRENT.

Also, with ifort DO CONCURRENT is implemented using OpenMP. Wit ifx, it looks like the loops are just unrolled in this case, no OpenMP. That may change in a future release.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

"Since m3333 is not SHARED, the values are undetermined in the DO CONCURRENT."

How do you come to that conclusion, i.e. which part of the standard says that this happens?

I'll try to explain my reasoning:

I think we agree that m3333 has unspecified locality (*11.1.7.5 p1*).

That means:

*If it is referenced in an iteration it shall either be previously defined during that iteration, or shall not be defined or become undefined during any other iteration; if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates;*

Which gives two options when referencing a variable:

- "
*it shall either be previously defined during that iteration*". This is similar to a LOCAL variable as noted in (and LOCAL_INIT and LOCAL would behave the same*11.1.7.5 p1)*`real :: a real, dimension(2,2) :: b integer :: i,j do concurrent(i=1:2, j=1:2) a = real(i*j) b(i,j) = a enddo`

the obvious consequence is that it becomes undefined when the loop terminates:

*if it is defined or becomes undefined by more than one iteration it becomes undefined when the loop terminates.* -
"[it]

*shall not be defined or become undefined during any other iteration*".a) Here, LOCAL and LOCAL_INIT would behave differently. ifort's behavior for unspecified is that of LOCAL.

`real :: a real, dimension(2,2) :: b integer :: i,j a = 5. do concurrent(i=1:2, j=1:2) b(i,j) = a enddo`

This is my case.

b) The same condition can be fulfilled with an if statement. Then, there is no difference between LOCAL and LOCAL_INIT.

`real :: a real, dimension(2,2) :: b integer :: i,j do concurrent(i=1:2, j=1:2) if (i==1 .and. j==2) then a = 3. b(i,j) = a else b(i,j) = 5. endif enddo`

in the last two cases, a becomes defined not at all or once, so it does not become undefined after the loop terminates.

So summarize: It seems that ifort handles "unspecified" like LOCAL (11.1.7.5, p2-1: "if a variable with LOCAL locality is a pointer it has undefined pointer association status, and otherwise it is undefined except for any subobjects that are default-initialized"), but there are differences between unspecified and LOCAL. It is only stated that "it is a different entity in each iteration, similar to LOCAL locality." Handling unspecified like LOCAL_INIT seems to be the correct behavior.

Unrelated to the actual discussion, but many thanks for the help and the hint to openMP reduction:

"Now the variable c ... c is just equal to i*j in this case. Why increment it within the loop? C has a cross iteration dependency, it is really a reduction. OpenMP has the reduction concept. I don't see that in DO CONCURRENT."

I agree, my code is invalid. 11.1.7.5 p3 states that "If a variable has SHARED locality ... [and] if it is defined or becomes undefined during any iteration, it shall not be referenced, defined, or become undefined during any other iteration"

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

m3333 is on the right-hand side. It is not defined during any DO CURRENT iteration.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

whether a variable is on the left- or right-hand side is not mentioned in the standard at all so I don't see why it should matter.

Let's look at the Note 4 from the standard:

```
INTEGER :: A(N),IND(N)
...
DO CONCURRENT (I=1:M)
A(IND(I)) = I
END DO
```

IND is never defined in any iteration, still it is a valid DO CONCURRENT loop.

If your argument about left- or right-hand side is valid, we could simply write

```
INTEGER :: A(N),IND(N), T
...
DO CONCURRENT (I=1:M) LOCAL(T)
T = IND(I)
A(T) = I
END DO
```

There is no reason to assume that IND is undefined because it does not have LOCAL locality.

Note that here "the user knows that there are no repeated values in the index array IND", otherwise some components of A would become undefined because they would be defined in more than one iteration.

It is nowhere stated that a variable with unspecified locality to be initialized in an iteration. That holds only for variables with LOCAL locality. C1128 explicitly states that a variable with INTENT(IN) attribute (that is the case in my original code) cannot have LOCAL or LOCAL_INIT locality.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Many thanks for the help so far. Since this topic is of broader scope than ifort alone, I've summarized the findings on https://fortran-lang.discourse.group/t/clarification-on-do-concurrent/1647

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

I added my thoughts at https://fortran-lang.discourse.group/t/clarification-on-do-concurrent/1647/3 - I think ifort is behaving incorrectly here.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

Reading and interpreting a compiler standard takes a special kind of talent! I filed a bug report CMPLRIL0-34101 on this DO CONCURRENT issue. I'll let you know how this progresses to a fix.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

@Barbara_P_Intel: Many thanks for your support and the constructive discussion. Let's hope that the behavior can be easily fixed.

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

In reviewing some cases, I found that your reproducer runs as expected with ifx, 2022.0.0. It's part of the oneAPI HPC Toolkit 2022.1.

Have you tried compiling with ifx?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content

yes, works here. Many thanks

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page