Intel® Fortran Compiler

OpenMP in subroutines with automatic arrays

Ulrich_M_
New Contributor I
202 Views

The following code does not yield the correct result (the sum should be 100) with OpenMP enabled (version 18.0.1.156). Is this a bug?

program test
    implicit none
    real	:: val
    
    call sumarray(10,val)
    print *,val

    contains 
        
    subroutine sumarray(n,val)
        real	:: y(n),val
        integer	:: n,i
            
        y=0.0
!$omp parallel do reduction(+:y)		
        do i=1,100
            y(mod(i,n)+1)=y(mod(i,n)+1)+1
        enddo
        val=sum(y)
    end subroutine
end program

6 Replies
TimP
Black Belt

Since n==10, each element of y is incremented 10 times, once every 10th iteration. Unless the loop is restricted to one thread, there is a race condition, because more than one thread could be updating the same element of y. If you ran one of the parallel checking tools and it did not raise a warning, that would be a bug.

Array reduction means that each element of the array has its own reduction, with the final assignment performed by one thread. Even with the expected restrictions, it tends to be inefficient beyond a small number of threads. Perhaps the restrictions aren't evident in the standard, and I sympathize with not wanting to read the source code of an OpenMP implementation to see whether such a thing is supported.

You could perhaps simplify the loop to i=1,100,10 (without mod) and have it work as expected, but it would then no longer depend on the reduction clause.

Ulrich_M_
New Contributor I

Thanks for your answer. I don't really understand it, though. With the reduction clause, there shouldn't be any race condition involving updates of y. That is its very raison d'être, no? I would have thought that if an array is in the reduction clause, each thread has a private array y, collecting partial sums, and these private y's are then combined into the overall sum on exit from the parallel region. What am I missing?
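To make that reading of the reduction clause concrete, here is a minimal sketch that writes the reduction out by hand: each thread gets a private copy initialized to zero, and the copies are added into the shared array one thread at a time. The names manual_reduction, sumarray_manual and ypriv are made up for illustration, and this is not necessarily how any particular compiler implements the clause.

```fortran
program manual_reduction
    implicit none
    real :: val

    call sumarray_manual(10, val)
    print *, val

contains

    ! Hypothetical helper (not from the original post): the same loop as
    ! sumarray, but with the array reduction spelled out explicitly.
    subroutine sumarray_manual(n, val)
        integer, intent(in)  :: n
        real,    intent(out) :: val
        real    :: y(n)        ! shared result array
        real    :: ypriv(n)    ! each thread gets its own copy (see private below)
        integer :: i

        y = 0.0
!$omp parallel private(ypriv)
        ypriv = 0.0            ! identity value for "+"
!$omp do
        do i = 1, 100
            ypriv(mod(i, n) + 1) = ypriv(mod(i, n) + 1) + 1
        end do
!$omp end do
!$omp critical
        y = y + ypriv          ! combine partial sums, one thread at a time
!$omp end critical
!$omp end parallel
        val = sum(y)
    end subroutine sumarray_manual
end program manual_reduction
```

Since every one of the 100 iterations adds 1 to exactly one element, the printed sum should be 100 regardless of the number of threads, and also when the directives are ignored.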

(The code is obviously useless and inefficient; the point here is only to document unexpected behavior.)

Ulrich_M_
New Contributor I

I should add that everything works as expected if the array y in the subroutine has a fixed size of 10, that is, with

    subroutine sumarray(n,val)
        real	:: y(10),val
        integer	:: n,i
            
        y=0.0
!$omp parallel do reduction(+:y)		
        do i=1,100
            y(mod(i,n)+1)=y(mod(i,n)+1)+1
        enddo
        val=sum(y)
    end subroutine

which is further evidence that this is a compiler bug...

Ulrich_M_
New Contributor I

In fact, even the following fails to yield the correct result at all optimization levels.

program test
    implicit none
    real	:: val
    
    call sumarray(10,val)
    print *,val

    contains 
        
    subroutine sumarray(n,val)
        real	:: y(n),val
        integer	:: n,i
            
        y=0.0
!$omp parallel do reduction(+:y)		
        do i=1,100
             y=y+1
        enddo
        val=sum(y)
    end subroutine
end program

I suspect that if an array is automatic with a size that is an argument to a subroutine, the private versions in a reduction clause are not correctly initialized.
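One way to probe this suspicion is a small self-checking program that runs the same reduction once with an automatic array and once with a fixed-size array, and compares both against the known answer. This is only a sketch: the names check_reduction, sum_auto and sum_fixed are made up for illustration, and which variant (if any) reports FAIL will depend on the compiler version.

```fortran
program check_reduction
    implicit none
    ! 10 elements, each incremented 100 times
    real, parameter :: expected = 1000.0
    real :: val_auto, val_fixed

    call sum_auto(10, val_auto)
    call sum_fixed(val_fixed)
    print *, 'automatic y(n):  ', val_auto,  merge('ok  ', 'FAIL', val_auto  == expected)
    print *, 'fixed-size y(10):', val_fixed, merge('ok  ', 'FAIL', val_fixed == expected)

contains

    ! Reduction over an automatic array whose size is a dummy argument.
    subroutine sum_auto(n, val)
        integer, intent(in)  :: n
        real,    intent(out) :: val
        real    :: y(n)
        integer :: i

        y = 0.0
!$omp parallel do reduction(+:y)
        do i = 1, 100
            y = y + 1
        end do
        val = sum(y)
    end subroutine sum_auto

    ! Same loop, but the array size is a compile-time constant.
    subroutine sum_fixed(val)
        real, intent(out) :: val
        real    :: y(10)
        integer :: i

        y = 0.0
!$omp parallel do reduction(+:y)
        do i = 1, 100
            y = y + 1
        end do
        val = sum(y)
    end subroutine sum_fixed
end program check_reduction
```

The comparison with == is safe here because all intermediate values are small whole numbers that are represented exactly in single precision.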

I would appreciate it if somebody from Intel could weigh in. Thank you.

Martyn_C_Intel
Employee

I agree that this issue is a bug. It occurs in compiler versions 18.0.0 and 18.0.1 (and even when running on a single thread).

The good news is that both of your examples run correctly with the recently released update 18.0.2 (and also with the earlier version 17 compiler). Could you please give the latest compiler a try and see if you agree?
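When comparing results across compiler updates like this, it can help to have the executable report which compiler built it. A minimal sketch, assuming the compiler provides the Fortran 2008 intrinsic compiler_version; the program name show_versions is made up, and the lines starting with !$ are compiled only when OpenMP is enabled.

```fortran
program show_versions
    use, intrinsic :: iso_fortran_env, only: compiler_version
!$  use omp_lib, only: openmp_version
    implicit none

    ! Vendor-defined string identifying the compiler and its version.
    print '(a)', compiler_version()
    ! Printed only when the program is compiled with OpenMP enabled.
!$  print '(a,i0)', 'OpenMP version (yyyymm): ', openmp_version
end program show_versions
```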

Ulrich_M_
New Contributor I

Yes, it appears to be fixed in 18.0.2. Thank you for your response!
