Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Default behaviour of ifort vs. ifx for a threadprivate allocatable variable in an OpenMP copyin clause

Gaurav-Saxena
Beginner

The program below declares an integer array with the ALLOCATABLE and SAVE attributes and makes it THREADPRIVATE. In a first parallel region, each thread allocates its own 5-element copy; afterwards, the master thread's copy is initialised. The second parallel region uses a COPYIN clause to copy the master thread's array into the corresponding arrays of the other threads. Each thread prints the address of its array in both parallel regions.

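PROGRAM THREADPRIV
      USE OMP_LIB
      IMPLICIT NONE
      INTEGER, ALLOCATABLE, SAVE :: arr(:)
      INTEGER :: i, TID

      !$OMP THREADPRIVATE(arr)

      ! Turn off dynamic threads BEFORE the first parallel region: threadprivate
      ! data of non-master threads is only guaranteed to persist between two
      ! regions if dynamic adjustment is disabled on entry to both of them
      CALL OMP_SET_DYNAMIC(.FALSE.)

      ! TID must be PRIVATE here; if it is left shared, the threads race on it
      ! and may both print the same thread number
      !$OMP PARALLEL PRIVATE(TID)
            TID = OMP_GET_THREAD_NUM()
            ALLOCATE(arr(5))
            PRINT *, "Thread", TID, "LOC=", LOC(arr)
      !$OMP END PARALLEL

      ! Serial part: only the master thread's copy of arr is initialised
      do i=1,5
            arr(i) = i
      end do

!$OMP PARALLEL DEFAULT(NONE) PRIVATE(TID) COPYIN(arr)
      TID = OMP_GET_THREAD_NUM()
      PRINT *, "Thread", TID, "arr", arr, "LOC=", LOC(arr)
!$OMP END PARALLEL

      END PROGRAM THREADPRIV
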
First, I set: export OMP_NUM_THREADS=2

(A) Compilation with ifort from oneapi/2023.2.0

$ ifort -qopenmp min_ifx_threadpriv_copyin.f90 -o mitc.exe

Execution and output:
$ ./mitc.exe  

 Thread           1 LOC=       140012973113312
 Thread           1 LOC=       140012973031392
 Thread           0 arr           1           2           3           4           5
  LOC=       140012973113312
 Thread           1 arr           1           2           3           4           5
   LOC=       140012973031392

This is correct: (1) the two arrays have different base addresses, which match the addresses from the first parallel region, and (2) the values are copied properly.

(B) Compilation with ifx from oneapi/2023.2.0

$ ifx -qopenmp min_ifx_threadpriv_copyin.f90 -o mitc.exe

Execution and output:
$ ./mitc.exe

 Thread           1 LOC=       140375478878176
 Thread           1 LOC=       140375478796256
 Thread           1 arr           1           2           3           4           5
  LOC=       140375478878176
 Thread           0 arr           1           2           3           4           5
  LOC=       140375478878176

This is incorrect: in the second parallel region both threads report the same base address (that of the master thread's array), which indicates that COPYIN made the second thread's arr point to the master thread's data (a shallow copy) instead of copying the values into the second thread's own array.
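
The address comparison can also be done programmatically instead of by eye. The following is a minimal sketch, not code from this thread: the module copyin_alias_check, the function copyin_is_shallow and the variable buf are hypothetical names, and the check relies on LOC, a vendor extension (available in ifort/ifx and gfortran) that the original program also uses. It records the master thread's base address before a COPYIN region and reports .TRUE. if any other thread's copy has that same address.

```fortran
! Hypothetical diagnostic (names are illustrative): returns .TRUE. if COPYIN
! gave a non-master thread the master's storage instead of its own copy.
MODULE copyin_alias_check
   USE ISO_FORTRAN_ENV, ONLY: INT64
   !$ USE OMP_LIB, ONLY: OMP_GET_THREAD_NUM
   IMPLICIT NONE
   INTEGER, ALLOCATABLE, SAVE :: buf(:)
   !$OMP THREADPRIVATE(buf)
CONTAINS
   LOGICAL FUNCTION copyin_is_shallow()
      INTEGER(INT64) :: master_addr
      LOGICAL :: aliased

      ! Serial part: the master thread allocates and fills its copy of buf
      IF (.NOT. ALLOCATED(buf)) ALLOCATE(buf(5))
      buf = [1, 2, 3, 4, 5]
      ! LOC is a vendor extension (ifort/ifx, gfortran), as in the original test
      master_addr = LOC(buf)
      aliased = .FALSE.

      !$OMP PARALLEL DEFAULT(NONE) SHARED(master_addr) COPYIN(buf) REDUCTION(.OR.:aliased)
      ! With a deep copy every non-master thread owns separate storage, so only
      ! thread 0 may legitimately see master_addr here
      !$ IF (OMP_GET_THREAD_NUM() /= 0) aliased = (LOC(buf) == master_addr)
      !$OMP END PARALLEL

      copyin_is_shallow = aliased
   END FUNCTION copyin_is_shallow
END MODULE copyin_alias_check
```

Called from the serial part of a program run with OMP_NUM_THREADS=2, copyin_is_shallow() should return .FALSE. with a compiler that deep-copies, as ifort did in (A); judging from the addresses reported in (B), one would expect .TRUE. with ifx from oneapi/2023.2.0. Compiled without OpenMP, the `!$` lines are plain comments and the function simply returns .FALSE.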

Question: Why does the behaviour differ, or am I making a mistake (conceptually or with respect to the OpenMP standard)? I would be grateful for any advice or solution.

1 Solution
jimdempseyatthecove
Honored Contributor III

This issue was discussed earlier.

It appears that copyin(arr) copies the contents of the issuing thread's array descriptor for the allocatable array, i.e. the allocation status, bounds and base address, as opposed to copying the bounds (allocating each thread's copy accordingly) and then performing copyOfArray = copyInArray.
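
Until this is fixed, the semantics described above (allocate each thread's copy with the master's bounds, then assign the values) can be emulated by hand instead of relying on COPYIN. The following is a minimal sketch under stated assumptions: the module tp_copyin_workaround and the subroutine deep_copyin are hypothetical names, the master's values are passed in through a SHARED allocatable snapshot taken before the parallel region, and an allocatable dummy argument (Fortran 2003) is used so that the snapshot's bounds are preserved.

```fortran
! Hypothetical workaround (names are illustrative): do by hand what COPYIN
! should do for an allocatable, i.e. give each thread its own allocation with
! the master's bounds and then copy the values.
MODULE tp_copyin_workaround
   IMPLICIT NONE
   INTEGER, ALLOCATABLE, SAVE :: arr(:)
   !$OMP THREADPRIVATE(arr)
CONTAINS
   ! Call from inside a parallel region; src is a SHARED snapshot of the
   ! master thread's arr taken before the region. An allocatable dummy keeps
   ! the bounds of the actual argument.
   SUBROUTINE deep_copyin(src)
      INTEGER, ALLOCATABLE, INTENT(IN) :: src(:)

      ! Mirror the master's allocation status
      IF (.NOT. ALLOCATED(src)) THEN
         IF (ALLOCATED(arr)) DEALLOCATE(arr)
         RETURN
      END IF

      ! Reallocate this thread's copy only if its bounds differ
      IF (ALLOCATED(arr)) THEN
         IF (LBOUND(arr, 1) /= LBOUND(src, 1) .OR. &
             UBOUND(arr, 1) /= UBOUND(src, 1)) DEALLOCATE(arr)
      END IF
      IF (.NOT. ALLOCATED(arr)) ALLOCATE(arr(LBOUND(src, 1):UBOUND(src, 1)))

      ! Element-wise copy into this thread's own storage (a deep copy)
      arr(:) = src(:)
   END SUBROUTINE deep_copyin
END MODULE tp_copyin_workaround
```

Applied to the posted program, arr would move into such a module (USE tp_copyin_workaround instead of the local declaration); before the second parallel region one would take a snapshot with ALLOCATE(snap, SOURCE=arr), drop COPYIN(arr) from the directive, add SHARED(snap), and call deep_copyin(snap) as the first statement inside the region.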

Intel, as I mentioned in the earlier thread, this has to be a bug. Consider what happens within the parallel region should some thread perform deallocate(arr) or arr(idx) = value.

In the case of deallocate, the memory referenced by that thread's private array descriptor is returned to the heap, yet the array descriptors of the other threads in the team still show the array as allocated, i.e. heap corruption.

In the case of arr(idx) = value, the new value at that index becomes visible to all threads of the team, because they all share the same storage.

Jim Dempsey
Gaurav-Saxena
Beginner

Dear Jim, 

My apologies for the late reply, and many thanks for the solution. I did search the Internet and user forums for an answer to this question, but none described or solved the problem in a straightforward manner (hence the question). Thank you again for your efforts. We now know for certain that it is a bug.

Best regards,

Gaurav
