Intel® Fortran Compiler

OpenMP Offload Reduction help

jimdempseyatthecove

I'm having issues with a reduction in an offload region: the target version of the loop below prints a sum of 0.

!  GPUtest2.f90 
module mod_GPUtest
real(8), allocatable :: CPUarray(:,:)
end module mod_GPUtest

program GPUtest2
    use omp_lib
    use mod_GPUtest
    implicit none
    real(8) :: T1, T2, sum
    integer :: i, j, nReps, nVecs
    
    nVecs = 1000000
    nReps = 100
    allocate(CPUarray(3,nVecs))

    call random_number(CPUarray)
    nReps = 1000
    
    T1 = omp_get_wtime()
    sum = 0.0_8
    do j=1,nReps
        do i=1,nVecs
            sum = sum + norm2(CPUarray(:,i))
        end do
    end do
    T2 = omp_get_wtime()
    print *,"CPUarray", sum, T2-T1
    
    T1 = omp_get_wtime()
    sum = 0.0_8
    !$omp parallel do private(i) reduction(+:sum) num_threads(8)
    do j=1,nReps
        do i=1,nVecs
            sum = sum + norm2(CPUarray(:,i))
        end do
    end do
    !$omp end parallel do
    T2 = omp_get_wtime()
    print *,"OpenMP CPUarray", sum, T2-T1
    
    T1 = omp_get_wtime()
    sum = 0.0_8
    !$omp target teams distribute parallel do map(toFrom:sum) map(from:nReps,nVecs,CPUarray) reduction(+:sum)
    do j=1,nReps
        do i=1,nVecs
            sum = sum + norm2(CPUarray(:,i))
        end do
    end do
    !$omp end target teams distribute parallel do
    T2 = omp_get_wtime()
    print *,"Target CPUarray", sum, T2-T1
    
end program GPUtest2
---- output ----
 CPUarray   960955364.535284        4.10575359989889
 OpenMP CPUarray   960955364.523428       0.650032500037923
 Target CPUarray  0.000000000000000E+000  0.114313300000504

Jim Dempsey

 

1 Solution
jimdempseyatthecove

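Resolved the issue.

The problem was my understanding of the perspective of the map clause.

I needed to use map(to:....)

The direction in a map clause is stated from the CPU's (host's) perspective: "to" copies from the CPU to the GPU.

I had read it from the GPU's perspective, where data arriving from the CPU would be "from". With map(from:nReps,nVecs,CPUarray), nothing is copied to the GPU on entry, so the target region ran on uninitialized copies of the loop bounds and the array.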
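
For anyone who lands here with the same symptom, here is a minimal sketch of the corrected mapping. It is not the exact program from the first post: the sizes are reduced so it finishes quickly, and the names a, host_sum, dev_sum and dp are made up for this example. The only substantive change to the directive is map(to:...) for the inputs, while the reduction variable stays map(tofrom:...).

```fortran
! Sketch only: reduced sizes, hypothetical names (a, host_sum, dev_sum, dp).
program map_to_sketch
    implicit none
    integer, parameter :: dp = kind(1.0d0)
    real(dp), allocatable :: a(:,:)
    real(dp) :: host_sum, dev_sum
    integer :: i, j, nReps, nVecs

    nVecs = 1000
    nReps = 10
    allocate(a(3,nVecs))
    call random_number(a)

    ! Serial reference sum on the host.
    host_sum = 0.0_dp
    do j = 1, nReps
        do i = 1, nVecs
            host_sum = host_sum + norm2(a(:,i))
        end do
    end do

    ! Inputs are mapped "to" the device (host's point of view).
    ! The reduction variable goes both ways, hence "tofrom".
    dev_sum = 0.0_dp
    !$omp target teams distribute parallel do private(i) &
    !$omp&   map(tofrom:dev_sum) map(to:nReps,nVecs,a) reduction(+:dev_sum)
    do j = 1, nReps
        do i = 1, nVecs
            dev_sum = dev_sum + norm2(a(:,i))
        end do
    end do
    !$omp end target teams distribute parallel do

    print *, "host  ", host_sum
    print *, "target", dev_sum

    ! Summation order differs between the two loops, so compare
    ! with a relative tolerance rather than for exact equality.
    if (abs(dev_sum - host_sum) <= 1.0e-9_dp * abs(host_sum)) then
        print *, "match"
    else
        print *, "MISMATCH"
    end if
end program map_to_sketch
```

With ifx on Linux I would expect the offload build to look something like ifx -fiopenmp -fopenmp-targets=spir64; treat that command line as an assumption and check it against your compiler version. Built without OpenMP, the directive is just a comment and the second loop runs on the host, which is still a useful sanity check of the logic.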

 

Jim Dempsey
