Intel® Fortran Compiler

Derived type with allocatable components in an omp firstprivate clause ... still a bad idea?

may_ka
Beginner
526 Views

Hi,

I was chasing data races in an OpenMP program and found that a derived type with allocatable components was the culprit. According to Google there was once a problem with this, but I wonder whether it is still an issue.

Here is an example code:

Module rrr
  Type :: aaa
    real(kind=8), allocatable :: b(:,:)
  end type aaa
contains
  Subroutine Subrrr(a)
    real(kind=8), intent(inout) :: a(:,:)
    write(*,*) loc(a)
  end Subroutine Subrrr
  Subroutine Subzzz(a)
    type(aaa), intent(inout) :: a(:)
    integer :: i
    do i=1,size(a)
      write(*,*) i,loc(a(i)%b)
    end do
  end Subroutine Subzzz
end Module rrr
Program test
  use rrr, only: subrrr, aaa, subzzz
  real(kind=8), allocatable :: x(:,:)
  integer :: i
  type(aaa), allocatable :: y(:)
  allocate(y(1))
  do i=1,size(y)
    allocate(y(i)%b(20,20),source=0.0D0)
    write(*,*) i,loc(y(i)%b)
  end do
  write(*,*) "@@@@@@@@@@@@@@@@@@@"
  !$omp parallel do firstprivate(y)
  Do i=1,10
    call subzzz(y)
  end Do
  !$omp end parallel do
  write(*,*) "@@@@@@@@@@@@@@@@@@@"
  allocate(x(20,20),source=0.0D0)
  write(*,*) loc(x)
  !$omp parallel do firstprivate(x)
  Do i=1,10
    call subrrr(x)
  end Do
  !$omp end parallel do
end Program test

When compiled with ifort 17.08 or 19.04 using only the -qopenmp flag, the output is:

          1        23209329389632
 @@@@@@@@@@@@@@@@@@@
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
           1        23209329389632
 @@@@@@@@@@@@@@@@@@@
        23209329385600
       140720494867472
        23194232672272
        23202940038928
        23207230320272
        23192018079760
        23200742231824
        23209327480336
        23205037198992
        23196430479248
        23198527639440

When compiled with "gfortran -fopenmp", the output is:

          1       94311007887248
 @@@@@@@@@@@@@@@@@@@
           1       94311007930896
           1       22394831899520
           1       22394764790656
           1       22394764791344
           1       94311007932272
           1       22394496355200
           1       22394630572928
           1       22394362137472
           1       22393422613376
           1       22394295028608
 @@@@@@@@@@@@@@@@@@@
       94311007932272
       22394831900208
       22394764791344
       22394362137472
       22394764791344
       94311007935488
       22394295028608
       22394496355200
       94311007935488
       22393422613376
       22394630572928

So it looks as if, when compiled with ifort, the component y(i)%b is not copied (every iteration prints the address of the original array), whereas with gfortran it is.
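Since LOC is a vendor extension and raw addresses can be hard to interpret, a more portable way to test this observation is to write through the firstprivate copy and then check whether the original changed. The program below is a minimal sketch (the name check_firstprivate is hypothetical). It assumes an OpenMP implementation that initializes firstprivate copies as if by intrinsic assignment, which allocates and copies each allocatable component; on a compiler with the problem, the threads instead race on the original storage and the original ends up overwritten (or the program misbehaves outright).

```fortran
! Hypothetical diagnostic: does firstprivate(y) deep-copy y(1)%b?
! Standard Fortran only (no LOC extension): write through the private
! copy, then check whether the original was modified.
program check_firstprivate
  implicit none
  type :: aaa
    real(kind=8), allocatable :: b(:,:)
  end type aaa
  type(aaa), allocatable :: y(:)
  integer :: i
  logical :: have_openmp = .false.

  ! Only executed when OpenMP is enabled; without it the directives are
  ! ignored and the serial loop would overwrite the original anyway.
  !$ have_openmp = .true.
  if (.not. have_openmp) error stop "compile with -qopenmp or -fopenmp"

  allocate(y(1))
  allocate(y(1)%b(20,20), source=0.0d0)

  !$omp parallel do firstprivate(y)
  do i = 1, 10
    ! Should touch only this thread's private copy of the component.
    y(1)%b = 1.0d0
  end do
  !$omp end parallel do

  ! After a true deep copy, the original is still all zeros.
  if (all(y(1)%b == 0.0d0)) then
    write(*,*) "deep copy: original y(1)%b unchanged"
  else
    write(*,*) "shallow copy: original y(1)%b was overwritten"
  end if
end program check_firstprivate
```

Compile it with the same flags as the test program above (ifort -qopenmp or gfortran -fopenmp) and compare the message printed by each compiler.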

Any idea?

Cheers

jimdempseyatthecove
Honored Contributor III

>>So it looks as if when compiled with ifort, the array y(i)%b is not copied, where as when using gfortran it is

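On line 14 (the write statement in Subzzz), add omp_get_thread_num() to the output list.

IOW, the parallel do loop might not have had a chunk size of 1, so the iterations were not necessarily spread over all threads.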
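A chunk size of 1 can be requested explicitly with schedule(static, 1): chunks are then dealt to the threads round-robin in thread-number order, so iteration i runs on thread mod(i-1, nthreads). A minimal sketch (program name hypothetical), which must be compiled with OpenMP enabled because it uses omp_lib:

```fortran
! Sketch: one iteration per chunk, and print which thread ran each one.
program chunk_check
  use omp_lib, only: omp_get_thread_num, omp_get_num_threads
  implicit none
  integer :: i

  !$omp parallel do schedule(static, 1)
  do i = 1, 10
    ! With schedule(static, 1), iteration i goes to thread mod(i-1, nthreads).
    write(*,*) "iteration", i, "thread", omp_get_thread_num(), &
               "of", omp_get_num_threads()
  end do
  !$omp end parallel do
end program chunk_check
```

The same schedule clause can be added to the original !$omp parallel do firstprivate(y) loop; if different thread numbers then still print the same address, the component is shared rather than merely reused by one thread.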

Jim Dempsey

may_ka
Beginner

Hi,

Thanks. After adding omp_get_thread_num(), the output is:

          1        23453246234752
 @@@@@@@@@@@@@@@@@@@
           2           1        23453246234752
           4           1        23453246234752
           6           1        23453246234752
           5           1        23453246234752
           7           1        23453246234752
           3           1        23453246234752
           1           1        23453246234752
           1           1        23453246234752
           0           1        23453246234752
           0           1        23453246234752
 @@@@@@@@@@@@@@@@@@@
        23453246230720
       140733095199120
        23446830436112
        23451147206288
       140733095199120
        23440169873296
        23449050046096
        23442267033488
        23453244366352
        23453244366352
        23444615843600

So it looks as if all 8 threads (0 to 7) access the same array, at the same address as the original y(1)%b.

may_ka
Beginner

A possible workaround is to make y private and allocate its components inside the parallel region:

Module rrr
  !$ use omp_lib
  Type :: aaa
    real(kind=8), allocatable :: b(:,:)
  end type aaa
contains
  Subroutine Subzzz(a)
    type(aaa), intent(inout) :: a(:)
    integer :: i
    do i=1,size(a)
      write(*,*) omp_get_thread_num(),i,loc(a(i)%b)
    end do
  end Subroutine Subzzz
end Module rrr
Program test
  use rrr, only: aaa, subzzz
  integer :: i
  type(aaa), allocatable :: y(:)
  write(*,*) "@@@@@@@@@@@@@@@@@@@"
  !$omp parallel private(y)
  allocate(y(1))
  Do i=1,size(y)
    allocate(y(i)%b(20,20),source=0.0D0)
  end Do
  !$omp do
  Do i=1,10
    call subzzz(y)
  end Do
  !$omp end do
  !$omp end parallel
end Program test

with the output changing to:

           0           1        23439116902528
           6           1        23439116513344
           5           1        23439116382272
           0           1        23439116902528
           4           1        23439116480576
           3           1        23439116447808
           1           1        23439116415040
           1           1        23439116415040
           2           1        23439116546112
           7           1        23439116316736
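The workaround above rebuilds y inside the region, so every thread starts from freshly initialized data. If the threads instead need the current contents of y, one alternative (a sketch not tested in this thread; the module and routine names are hypothetical) is to keep y shared and read-only and make the copy explicitly by intrinsic assignment inside a called routine. Local variables without SAVE in a routine called from a parallel region are private to each calling thread, and intrinsic assignment of a derived type allocates and copies its allocatable components.

```fortran
! Hypothetical alternative: y stays shared (read-only); each thread makes
! its own deep copy by intrinsic assignment inside a called routine.
module rrr_copy
  implicit none
  type :: aaa
    real(kind=8), allocatable :: b(:,:)
  end type aaa
contains
  subroutine work_on_copy(y, i)
    type(aaa), intent(in) :: y(:)
    integer, intent(in) :: i
    type(aaa), allocatable :: ycopy(:)  ! local, no SAVE: one per call/thread
    allocate(ycopy(size(y)))
    ycopy = y                      ! allocates and copies each ycopy(k)%b
    ycopy(1)%b = real(i, kind=8)   ! modifies the private copy only
  end subroutine work_on_copy      ! ycopy and its components freed here
end module rrr_copy

program test_copy
  use rrr_copy, only: aaa, work_on_copy
  implicit none
  type(aaa), allocatable :: y(:)
  integer :: i

  allocate(y(1))
  allocate(y(1)%b(20,20), source=0.0d0)

  !$omp parallel do
  do i = 1, 10
    call work_on_copy(y, i)
  end do
  !$omp end parallel do

  ! The shared original must be untouched.
  if (any(y(1)%b /= 0.0d0)) error stop "original y was modified"
  write(*,*) "original y(1)%b unchanged"
end program test_copy
```

This avoids listing a derived type with allocatable components in any data-sharing clause, at the cost of one explicit copy per call.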

jimdempseyatthecove
Honored Contributor III

FWIW, there used to be an issue with earlier compilers where private(a) on an unallocated array a would not necessarily present an unallocated array a inside the parallel region. The workaround was to use firstprivate(a) with the unallocated array a, thus copying the unallocated array descriptor. I do not recall which version fixed this issue.
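The older workaround described here can be sketched as follows (program name hypothetical): a is deliberately unallocated on entry, firstprivate(a) gives each thread a copy with that same allocation status, and each thread then allocates its own storage. The reduction counts threads that wrongly see an allocated copy; on a compiler without the old issue the count should be 0.

```fortran
! Sketch of the firstprivate-on-unallocated-array workaround.
program unallocated_firstprivate
  implicit none
  real(kind=8), allocatable :: a(:)
  integer :: nbad

  nbad = 0
  ! a is deliberately left unallocated before the parallel region.
  !$omp parallel firstprivate(a) reduction(+:nbad)
  if (allocated(a)) nbad = nbad + 1   ! count threads seeing a stale status
  allocate(a(100))                     ! each thread allocates its own a
  a = 1.0d0
  deallocate(a)                        ! leave the region unallocated
  !$omp end parallel

  write(*,*) "threads that saw an allocated private a:", nbad
end program unallocated_firstprivate
```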

Jim Dempsey

may_ka
Beginner

Thanks Jim.

cheers
