I have the following simple code that works perfectly fine with gfortran. For some reason, when it is compiled with ifx or ifort, I get a data race on control%value: all the threads end up with the value 3 assigned to control%value. Strangely, the integer control%n is fine. When I remove the copyin, things work correctly, but I do need it in some parts of the real, more complicated code to broadcast the main thread's value to all the threads. I compile with the -openmp option. Any help is greatly appreciated, as I have been at it for more than a day now.
program main
  use derivedtypes
  use omp_lib
  implicit none
  integer(kind=4) :: id
  call omp_set_num_threads(8)
  call input
  !$OMP PARALLEL copyin(control)
  !$OMP END PARALLEL
  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()
  print *, id, control%value, control%n
  if (id == 3) then
    control%value = 3d0
    control%n = 2
  end if
  !$OMP END PARALLEL
  print *, '========================'
  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()
  print *, id, control%value, control%n
  !$OMP END PARALLEL
end program main
subroutine input
  use derivedtypes
  implicit none
  !$OMP PARALLEL
  control%n = 1
  if (allocated(control%value)) deallocate(control%value)
  allocate(control%value(control%n))
  control%value(1) = 5d0
  !$OMP END PARALLEL
end subroutine input
module derivedtypes
  implicit none
  type :: controlderived
    integer(kind=4) :: n
    real(kind=8), allocatable, dimension(:) :: value
  end type controlderived
  type(controlderived), target :: control
  !$OMP THREADPRIVATE(control)
end module derivedtypes
Maybe your sketch code is not a correct representation of your application.
In your sketch, the UDT control is threadprivate. Therefore, the copyin(control) could be problematic.
It is ambiguous what copyin(control) does in this case. Does it make a new (stack-local) copy of the threadprivate UDT variable control (the master thread's, in this case), along with a copy of its allocated data? IOW, in the context of that parallel region, is control implicitly stack-local to each of the threads? And at the end of the parallel region with copyin(control), are you expecting that local copy to be returned to the threadprivate control (of the master thread in this case) outside the parallel region?
Jim Dempsey
Copyin is just something I need in the real program; I don't actually need it in this simple program, but I still expect everything to work properly. Maybe I am misunderstanding what copyin actually does. Do you mean, copyin is problematic with thread private global variables?
Not seeing your code, it may have been presumptuous of me to comment on your sketch.
The copyin clause says, in effect:
From the context of the running thread encountering the parallel region ...
... copy its listed threadprivate variables to all other threads' threadprivate variables of the same name(s).
IOW, initialize the listed items to the values held by the encountering thread (the master thread, which may or may not be at a nest level).
This may make sense in the following sketch:
program
  call init(control) ! sequential process initializing the (threadprivate) control of the master thread
  ! broadcast master thread's control to other threads on the following region
  !$omp parallel copyin(control)
  ... ! each thread may modify its control
  !$omp end parallel
  ...
  !$omp parallel
  ... ! using the updated copy of its control from prior parallel region
  !$omp end parallel
  ...
end program
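For the simple case where the threadprivate item is a plain scalar, the broadcast described above can be sketched as a complete program. This is only an illustration under stated assumptions: the module name counters and the variable seed are hypothetical and do not come from the code in this thread, and dynamic threads are disabled so that threadprivate values persist between the two regions. Each thread should print 42 in the first region, in an unspecified order.

```fortran
module counters
  implicit none
  ! Hypothetical threadprivate scalar, used only for illustration.
  integer, save :: seed = 0
  !$omp threadprivate(seed)
end module counters

program copyin_demo
  use counters
  use omp_lib
  implicit none

  ! Threadprivate data persists across regions only if the team size is
  ! unchanged and dynamic adjustment of threads is off.
  call omp_set_dynamic(.false.)
  call omp_set_num_threads(4)

  seed = 42   ! set in the master (encountering) thread only

  ! copyin broadcasts the master's seed to every thread's private copy.
  !$omp parallel copyin(seed)
  !$omp critical
  print *, omp_get_thread_num(), seed
  !$omp end critical
  seed = seed + omp_get_thread_num()   ! each thread modifies its own copy
  !$omp end parallel

  ! No copyin here: each thread sees the value it left behind above.
  !$omp parallel
  !$omp critical
  print *, omp_get_thread_num(), seed
  !$omp end critical
  !$omp end parallel
end program copyin_demo
```

The critical sections only keep the printed lines from interleaving; they are not needed for the copyin itself.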
Thank you for your reply. Do you mean, something like this should work:
program main
  use derivedtypes
  use omp_lib
  implicit none
  integer(kind=4) :: id
  call omp_set_num_threads(4)
  call input
  ! Copy threadprivate control derived type of the master thread to all other threads:
  !$OMP PARALLEL copyin(control)
  !$OMP END PARALLEL
  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()
  print *, id, control%value, control%temp, control%n
  if (id == 1) then
    control%value = 3.5d0
    control%temp = 3.5d0
    control%n = 2
  end if
  !$OMP END PARALLEL
  print *, '========================'
  !$OMP PARALLEL private(id)
  id = omp_get_thread_num()
  print *, id, control%value, control%temp, control%n
  !$OMP END PARALLEL
end program main
subroutine input
  use derivedtypes
  implicit none
  control%n = 1
  if (allocated(control%value)) deallocate(control%value)
  allocate(control%value(control%n))
  control%value(1) = 5d0
  control%temp = 5d0
end subroutine input
Unfortunately, I still have the same issue with control%value. control%temp and control%n are behaving correctly.
Edited original post to avoid confusion regarding copyin.
Using Intel Fortran 2025.0 (ifx), Debug build, and modifying your original controlderived type to include real(kind=8) :: temp, the output appears to be incorrect:
0 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
2 3.50000000000000 5.00000000000000 1
3 3.50000000000000 5.00000000000000 1
========================
0 3.50000000000000 5.00000000000000 1
1 3.50000000000000 3.50000000000000 2
2 3.50000000000000 5.00000000000000 1
3 3.50000000000000 5.00000000000000 1
In the output above the ===='s, the left column should have been all 5.0's. This may occur if there was no implicit barrier between the first parallel region and the second parallel region.
Changing the 1st parallel region to:
!$OMP PARALLEL copyin(control)
print *, omp_get_thread_num(), control%value, control%temp, control%n
!$OMP END PARALLEL
3 5.00000000000000 5.00000000000000 1
0 5.00000000000000 5.00000000000000 1
2 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
0 5.00000000000000 5.00000000000000 1
3 5.00000000000000 5.00000000000000 1
2 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
========================
0 3.50000000000000 5.00000000000000 1
3 3.50000000000000 5.00000000000000 1
1 3.50000000000000 3.50000000000000 2
2 3.50000000000000 5.00000000000000 1
This makes the 1st and 2nd parallel region display the expected results (whether by accident or design).
Removing the above edit, and editing the 3rd parallel region to have:
print *, id, control%value, control%temp, control%n, loc(control%value)
This shows the problem:
0 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
2 3.50000000000000 5.00000000000000 1
3 3.50000000000000 5.00000000000000 1
========================
3 3.50000000000000 5.00000000000000 1
2236806605840
2 3.50000000000000 5.00000000000000 1
2236806605840
1 3.50000000000000 3.50000000000000 2
2236806605840
0 3.50000000000000 5.00000000000000 1
2236806605840
control%value is an allocatable, so the copyin(control) should have performed a new allocation for each of the copies (and copied the original value from the master thread).
This is a bug.
Using ifort shows the same problem.
The behavior is as if the array descriptor of the master thread's control%value is copied as-is.
IOW, the data of each thread's threadprivate array value are not copies; they reference the same memory locations.
Editing the code
...
call input ! each thread allocating its control%value
!$OMP PARALLEL
print *, omp_get_thread_num(), allocated(control%value)
!$OMP END PARALLEL
...
We see correctly
0 T
2 F
1 F
3 F
0 5.00000000000000 5.00000000000000 1
3 5.00000000000000 5.00000000000000 1
1 5.00000000000000 5.00000000000000 1
2 3.50000000000000 5.00000000000000 1
========================
0 3.50000000000000 5.00000000000000 1 2385607530144
3 3.50000000000000 5.00000000000000 1 2385607530144
2 3.50000000000000 5.00000000000000 1 2385607530144
1 3.50000000000000 3.50000000000000 2 2385607530144
that the other threads' control%value arrays have not been allocated, yet the copyin(control) is not performing as an intrinsic assignment:
myThreadprivate::control%value = masterThread::control%value
FWIW, the way it is working now is IMHO not correct. It is also dangerous, because inside a parallel region any thread could issue
deallocate(control%value)
or
control%value = differentSizeArray
Either case would corrupt the heap.
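Until copyin handles the allocatable component as a deep copy, one possible workaround is to do the broadcast by hand with intrinsic assignment, which is the behavior described above. The following is a minimal sketch, not a confirmed fix: it has not been checked against ifx or ifort here, the shared variable snapshot is a hypothetical helper that is not part of the original code, and it assumes dynamic threads are off and the team size stays the same between regions.

```fortran
module derivedtypes
  implicit none
  type :: controlderived
    integer(kind=4) :: n
    real(kind=8), allocatable, dimension(:) :: value
  end type controlderived
  type(controlderived), save :: control
  !$omp threadprivate(control)
end module derivedtypes

program broadcast_demo
  use derivedtypes
  use omp_lib
  implicit none
  type(controlderived) :: snapshot   ! hypothetical shared helper, not in the original code

  call omp_set_dynamic(.false.)
  call omp_set_num_threads(4)

  ! The master thread initializes its own threadprivate control.
  control%n = 1
  allocate(control%value(control%n))
  control%value(1) = 5d0

  ! Intrinsic assignment of a derived type copies allocatable
  ! components into newly allocated storage.
  snapshot = control

  ! Manual broadcast instead of copyin(control): every other thread
  ! assigns from the shared snapshot and gets its own allocation.
  !$omp parallel shared(snapshot)
  if (omp_get_thread_num() /= 0) control = snapshot
  !$omp end parallel

  ! Only thread 1 changes its copy.
  !$omp parallel
  if (omp_get_thread_num() == 1) control%value = 3.5d0
  !$omp end parallel

  ! If the copies are independent, loc() should differ per thread and
  ! only thread 1 should report the modified value.
  !$omp parallel
  !$omp critical
  print *, omp_get_thread_num(), control%value, control%n, loc(control%value)
  !$omp end critical
  !$omp end parallel
end program broadcast_demo
```

Because each thread then owns its allocation, a later deallocate(control%value) or a reallocating assignment in one thread should not touch the storage of another. loc() is a compiler extension, used here only because the output above uses it.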
Jim Dempsey