Intel® Fortran Compiler

Strange OpenMP Data Race Issue

Seyhan

I have the following simple code, which works perfectly with gfortran. When it is compiled with ifx or ifort, however, I get a data race on control%value: all the threads end up with the value 3 in control%value. Strangely, the integer control%n is fine. When I remove the copyin, things work correctly, but I need it in parts of the real, more complicated code to broadcast the main thread's value to all the threads. I compile with the -openmp option. Any help is greatly appreciated, as I have been at this for more than a day now.

 

program main
  use derivedtypes
  use omp_lib
  implicit none

  integer(kind=4) :: id

  call omp_set_num_threads(8)

  call input

  !$OMP PARALLEL copyin(control)
  !$OMP END PARALLEL

  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()

  print *, id, control%value, control%n

  if (id == 3) then
    control%value = 3d0
    control%n = 2
  end if
  !$OMP END PARALLEL

  print *, '========================'

  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()

  print *, id, control%value, control%n

  !$OMP END PARALLEL

end program main




subroutine input
  use derivedtypes
  implicit none

  !$OMP PARALLEL
  control%n = 1
  if (allocated(control%value)) deallocate(control%value)
  allocate(control%value(control%n))
  control%value(1) = 5d0

  !$OMP END PARALLEL

end subroutine input




module derivedtypes
  implicit none

  type :: controlderived
    integer(kind=4) :: n
    real(kind=8), allocatable, dimension(:) :: value
  end type controlderived

  type(controlderived), target :: control

  !$OMP THREADPRIVATE(control)

end module derivedtypes

 

jimdempseyatthecove

Maybe your sketch code is not a correct representation of your application.

In your sketch, the UDT control is threadprivate. Therefore, the copyin(control) could be problematic.

It is ambiguous what copyin(control) does in this case. Does it make a new (stack-local) copy of the threadprivate UDT variable control (of the master thread in this case), along with a copy of its allocated data? IOW, in the context of that parallel region, is control implicitly stack-local to each of the threads? Then, at the end of the parallel region with copyin(control), are you expecting that local copy to be returned to the threadprivate control (of the master thread in this case) outside the parallel region?

 

Jim Dempsey

Seyhan

Copyin is just something I need in the real program. I don't actually need it in this simple program, but I still expect everything to work properly. Maybe I am misunderstanding what copyin actually does. Do you mean that copyin is problematic with threadprivate global variables?

jimdempseyatthecove

Not seeing your code, it may have been presumptuous of me to comment on your sketch.

The copyin clause says in effect:

From the context of the running thread encountering the parallel region ...

... copy its listed threadprivate variables to all other threads' threadprivate variables of the same name(s).

IOW initialize the listed items to that of the encountering thread (the master thread which may or may not be at a nest level).

This may make sense in the following sketch:

program sketch
  call init(control) ! sequential process initializing the (threadprivate) control of the master thread
  ! broadcast the master thread's control to the other threads on the following region
  !$omp parallel copyin(control)
  ... ! each thread may modify its control
  !$omp end parallel
  ...
  !$omp parallel
  ... ! using the updated copy of its control from the prior parallel region
  !$omp end parallel
  ...
end program sketch
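The copyin behavior described in this reply can be tried in isolation with a plain scalar, which avoids the allocatable component at issue in this thread. The following is only a minimal sketch: the module name counter_mod, the variable counter, and the thread count of 4 are made up for illustration and do not come from the posted code.

```fortran
module counter_mod
  implicit none
  integer :: counter = 0              ! hypothetical threadprivate scalar
  !$omp threadprivate(counter)
end module counter_mod

program copyin_demo
  use counter_mod
  use omp_lib
  implicit none

  ! Threadprivate data is only guaranteed to persist between regions when
  ! dynamic thread adjustment is off and the thread count does not change.
  call omp_set_dynamic(.false.)

  counter = 42                        ! sets only the master thread's copy

  ! copyin broadcasts the master thread's value to every thread's copy
  !$omp parallel copyin(counter) num_threads(4)
  counter = counter + omp_get_thread_num()   ! each thread changes its own copy
  !$omp end parallel

  ! no copyin here: each thread sees the value it left behind above
  !$omp parallel num_threads(4)
  !$omp critical
  print *, omp_get_thread_num(), counter
  !$omp end critical
  !$omp end parallel
end program copyin_demo
```

If the runtime grants all four threads, each thread should print 42 plus its own thread number, in arbitrary order.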

 

 

 

 

 

 

 

Seyhan

Thank you for your reply. Do you mean that something like this should work?

 

program main
  use derivedtypes
  use omp_lib
  implicit none

  integer(kind=4) :: id

  call omp_set_num_threads(4)

  call input

  ! Copy threadprivate control derived type of the master thread to all other threads:
  !$OMP PARALLEL copyin(control)
  !$OMP END PARALLEL

  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()

  print *, id, control%value, control%temp, control%n

  if (id == 1) then
    control%value = 3.5d0
    control%temp = 3.5d0
    control%n = 2
  end if
  !$OMP END PARALLEL

  print *, '========================'

  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()

  print *, id, control%value, control%temp, control%n

  !$OMP END PARALLEL

end program main

subroutine input
  use derivedtypes
  implicit none

  control%n = 1
  if (allocated(control%value)) deallocate(control%value)
  allocate(control%value(control%n))
  control%value(1) = 5d0
  control%temp = 5d0

end subroutine input

Unfortunately, I still have the same issue with control%value, while control%temp and control%n behave correctly.

Seyhan

Edited original post to avoid confusion regarding copyin.

jimdempseyatthecove

I used Intel Fortran 2025.0 (ifx), Debug build, and modified your original controlderived type to include real(kind=8) :: temp.

The output appears to be incorrect:

           0   5.00000000000000        5.00000000000000                1
           1   5.00000000000000        5.00000000000000                1
           2   3.50000000000000        5.00000000000000                1
           3   3.50000000000000        5.00000000000000                1
 ========================
           0   3.50000000000000        5.00000000000000                1
           1   3.50000000000000        3.50000000000000                2
           2   3.50000000000000        5.00000000000000                1
           3   3.50000000000000        5.00000000000000                1

In the output before the ======== separator, every thread should have printed 5.0 in the left column. The 3.5 values may occur if there was no implicit barrier between the first parallel region and the second parallel region.

Changing the 1st parallel region to:

!$OMP PARALLEL copyin(control)
print *, omp_get_thread_num(), control%value, control%temp, control%n
!$OMP END PARALLEL

 

           3   5.00000000000000        5.00000000000000                1
           0   5.00000000000000        5.00000000000000                1
           2   5.00000000000000        5.00000000000000                1
           1   5.00000000000000        5.00000000000000                1
           0   5.00000000000000        5.00000000000000                1
           3   5.00000000000000        5.00000000000000                1
           2   5.00000000000000        5.00000000000000                1
           1   5.00000000000000        5.00000000000000                1
 ========================
           0   3.50000000000000        5.00000000000000                1
           3   3.50000000000000        5.00000000000000                1
           1   3.50000000000000        3.50000000000000                2
           2   3.50000000000000        5.00000000000000                1

This makes the 1st and 2nd parallel regions display the expected results (whether by accident or by design).

Removing the above edit, and editing the 3rd parallel region to have:

print *, id, control%value, control%temp, control%n, loc(control%value)

This shows the problem:

           0   5.00000000000000        5.00000000000000                1
           1   5.00000000000000        5.00000000000000                1
           2   3.50000000000000        5.00000000000000                1
           3   3.50000000000000        5.00000000000000                1
 ========================
           3   3.50000000000000        5.00000000000000                1
         2236806605840
           2   3.50000000000000        5.00000000000000                1
         2236806605840
           1   3.50000000000000        3.50000000000000                2
         2236806605840
           0   3.50000000000000        5.00000000000000                1
         2236806605840

Because control%value is an allocatable, the copyin(control) should have performed a new allocation for each of the copies (and copied the original value from the master thread). Instead, every thread reports the same address for control%value.

This is a bug.

Using ifort shows the same problem.


The behavior is as if the array descriptor of the master thread's control%value is copied as-is.

IOW, the data of each thread's threadprivate array value are not copies; they reference the same memory locations.
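The shared-address symptom can also be checked programmatically, without relying on the non-standard loc extension. The sketch below is an illustration under assumptions: the names state_mod, state_t, and state are hypothetical stand-ins for the poster's derivedtypes module, and the address is obtained with c_loc from iso_c_binding. Each thread records the address of its allocatable component after the copyin, and the serial code then looks for duplicates.

```fortran
module state_mod
  implicit none
  type :: state_t                     ! hypothetical stand-in for controlderived
    integer :: n = 0
    real(kind=8), allocatable :: value(:)
  end type state_t
  type(state_t), target :: state
  !$omp threadprivate(state)
end module state_mod

program alias_check
  use state_mod
  use omp_lib
  use, intrinsic :: iso_c_binding, only: c_loc, c_intptr_t
  implicit none
  integer, parameter :: nthreads = 4
  integer(c_intptr_t) :: addr(0:nthreads-1)   ! one slot per thread, 0 = unallocated
  integer :: i, j, me
  logical :: aliased

  call omp_set_dynamic(.false.)
  addr = 0
  state%n = 1
  allocate(state%value(1))
  state%value(1) = 5d0

  !$omp parallel copyin(state) private(me) shared(addr) num_threads(nthreads)
  me = omp_get_thread_num()
  ! record where this thread's copy of the component lives
  if (allocated(state%value)) addr(me) = transfer(c_loc(state%value), addr(me))
  !$omp end parallel

  aliased = .false.
  do i = 0, nthreads - 2
    do j = i + 1, nthreads - 1
      if (addr(i) /= 0 .and. addr(i) == addr(j)) then
        aliased = .true.
        print *, 'threads', i, 'and', j, 'share storage at', addr(i)
      end if
    end do
  end do
  if (.not. aliased) print *, 'no shared storage detected'
end program alias_check
```

With per-thread deep copies, no two recorded addresses should match. Matching addresses would correspond to the behavior reported in this thread.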

Editing the code

...

call input  ! called outside a parallel region, so only the master thread allocates its control%value

!$OMP PARALLEL
print *, omp_get_thread_num(), allocated(control%value)
!$OMP END PARALLEL
...

We see correctly

0 T
2 F
1 F
3 F
0 5.00000000000000 5.00000000000000 1
3 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
2 3.50000000000000 5.00000000000000 1
========================
0 3.50000000000000 5.00000000000000 1 2385607530144
3 3.50000000000000 5.00000000000000 1 2385607530144
2 3.50000000000000 5.00000000000000 1 2385607530144
1 3.50000000000000 3.50000000000000 2 2385607530144

The first four lines show that the other threads' control%value arrays have not been allocated, yet the copyin(control) is not behaving like an intrinsic assignment:

myThreadprivate::control%value = masterThread::control%value
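One possible workaround, which is not proposed or confirmed anywhere in this thread, is to skip copyin for the derived type and perform that intrinsic assignment explicitly from a shared snapshot of the master thread's data. The sketch below assumes hypothetical names (state_mod, state_t, state, master_copy) in place of the poster's module and relies on standard Fortran deep-copy semantics for derived types with allocatable components.

```fortran
module state_mod
  implicit none
  type :: state_t                     ! hypothetical stand-in for controlderived
    integer :: n = 0
    real(kind=8), allocatable :: value(:)
  end type state_t
  type(state_t), target :: state
  !$omp threadprivate(state)
end module state_mod

program manual_broadcast
  use state_mod
  use omp_lib
  implicit none
  type(state_t) :: master_copy        ! ordinary variable, shared in parallel regions

  call omp_set_dynamic(.false.)

  state%n = 1
  allocate(state%value(1))
  state%value(1) = 5d0

  master_copy = state                 ! deep copy made in the serial part

  ! Explicit broadcast: intrinsic assignment allocates a separate value array
  ! in each thread's threadprivate state.
  !$omp parallel shared(master_copy) num_threads(4)
  if (omp_get_thread_num() /= 0) state = master_copy
  !$omp end parallel

  ! Only thread 1 changes its own copy
  !$omp parallel num_threads(4)
  if (omp_get_thread_num() == 1) state%value = 3.5d0
  !$omp end parallel

  !$omp parallel num_threads(4)
  !$omp critical
  print *, omp_get_thread_num(), state%value, state%n
  !$omp end critical
  !$omp end parallel
end program manual_broadcast
```

The intent is that only thread 1 reports 3.5 while the other threads keep 5.0. This approach should not be combined with a copyin(state) on an affected compiler: if a thread's component already aliases the master thread's storage, the reallocation performed by the assignment could free memory that another thread still uses.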

 

FWIW, the way it works now is, IMHO, not correct. It is dangerous because, inside a parallel region, any thread could issue

deallocate(control%value)
or
control%value = differentSizeArray

Either case would corrupt the heap.

 

Jim Dempsey

 

 
