Software Archive
Read-only legacy content

(OpenMP 4.0) Using nowait clause for asynchronous offload

Paulius_V_1
Beginner

Hello. I am trying to test the nowait clause, but I'm having trouble detecting when the offload actually completes. I need to synchronize the host and the card before writing to global memory.

Without the nowait clause everything runs fine. With it, nothing seems to happen: the offload never completes. Any ideas? Thanks

PROGRAM ASYNC_TEST

USE OMP_LIB
USE IFPORT
IMPLICIT NONE
INTEGER :: X,Y,IE
REAL(16), ALLOCATABLE :: x_arr(:), y_arr(:)
REAL(16) t1a,t1b,t1c,t2a,t2b,t2c


t1a = omp_get_wtime()
allocate(x_arr(1000000))
allocate(y_arr(1000000))
t1b = omp_get_wtime()-t1a
WRITE(*,*) 'Allocation time on HOST: ',t1b

!on host
DO IE = 1,1000
x_arr(IE) = RAND()
y_arr(IE) = RAND() 
END DO



!        !$omp target nowait depend(out:Y) map(to:x_arr,y_arr)
        !$omp target nowait
        t1a = omp_get_wtime()
        DO X =1,100
        DO IE = 1,500000 
        x_arr(IE) = x_arr(IE)*x_arr(IE)+y_arr(IE)
        END DO
        END DO
        t1b = omp_get_wtime()-t1a
        WRITE(*,*) 'MIC COMPUTE: ',t1b
        !$omp end target

        t2a = omp_get_wtime()
        DO X = 1,100
        DO IE = 500001,1000000 
        x_arr(IE) = x_arr(IE)*x_arr(IE)+y_arr(IE)
        END DO
        END DO
        t2b = omp_get_wtime()-t2a
        WRITE(*,*) 'HOST_COMPUTE: ',t2b
        WRITE(*,*) 'MIC DONE'

!        !$omp task depend(in:t2b)
!        WRITE(*,*) 'HOST DONE'
!        !$omp end task


END PROGRAM

Also, do I have to enclose the target directive in a task region? I will only have one thread on the host doing the offload. How do tasks work when there's only one thread?

 

Michael_K_Intel2
Employee

Hi,

the pattern is almost correct. If you want to synchronize the host execution with the asynchronous offload, this is what you need to do:

integer :: sync_var

! offloaded code section
!$omp target depend(out:sync_var) nowait
   call offloaded_stuff()
!$omp end target

! this part here executes concurrently with the target device
call stuff()

! now synchronize host and offload
!$omp task depend(in:sync_var) if(.false.)
!$omp end task

The empty task does no real work; it is only there to express the dependency between the offloaded region and the host execution. All code that follows the empty task executes only after the asynchronous offload has finished.

If there's only one thread, the OpenMP runtime does the magic to still have an async offload.
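For reference, here is a minimal, self-contained sketch of this pattern. It assumes a compiler that accepts depend and nowait on target (these clauses were standardized in OpenMP 4.5); without an offload device, the target region simply runs on the host. The program name and the arrays a and b are illustrative, not taken from the thread. Since the Fortran if clause takes a logical expression, the empty task uses if(.false.). No parallel region is opened, so only one host thread is involved.

```fortran
! Sketch: asynchronous target region synchronized with the host through an
! empty, undeferred task that depends on it. Names are illustrative only.
program depend_sync_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: sync_var          ! dependence tag only; its value is never used
  real(8) :: a(n), b(n)
  integer :: i

  a = 1.0d0
  b = 2.0d0

  ! Deferred target task: a is copied to the device, doubled there and
  ! copied back when the region completes.
  !$omp target map(tofrom: a) depend(out: sync_var) nowait
  do i = 1, n
     a(i) = 2.0d0 * a(i)
  end do
  !$omp end target

  ! Host work that may overlap with the target task; it does not touch a.
  do i = 1, n
     b(i) = b(i) + 1.0d0
  end do

  ! Empty undeferred task: it cannot start until the target task (the
  ! depend(out) on sync_var) has finished, and the host waits for it.
  !$omp task depend(in: sync_var) if(.false.)
  !$omp end task

  ! From here on the device results are visible on the host.
  if (any(a /= 2.0d0)) error stop 'target results not visible'
  if (any(b /= 3.0d0)) error stop 'host results wrong'
  print *, 'offload finished: a(1) =', a(1), ', b(1) =', b(1)
end program depend_sync_demo
```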

Hope that helps!

Cheers,

        -michael

Paulius_V_1
Beginner

Replying to Michael Klemm (Intel):

Hi, thanks, that makes sense. I've tried a similar configuration, but the problem persists: the offload never seems to end. The last thing the offload report shows is the target-to-host copy.

 

Paulius_V_1
Beginner

As you can see in the terminal screenshot (offloadstuck_0.png), it never reaches DONE.

Ravi_N_Intel
Employee

I added a taskwait before the MIC DONE message, as shown below:

        !$omp taskwait
        WRITE(*,*) 'MIC DONE'

This is the result I got:

ifort -qopenmp nowait.f90

 ./a.out
 Allocation time on HOST:   9.059906005859375000000000000000000E-0006
 HOST_COMPUTE:    1.38332700729370117187500000000000
 MIC COMPUTE:    10.8623330593109130859375000000000
 MIC DONE
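A self-contained sketch of the test program with the taskwait fix applied follows. Beyond the taskwait, it makes a few changes that are assumptions on my part rather than part of the thread: REAL(8) instead of REAL(16), RANDOM_NUMBER instead of the IFPORT RAND(), both arrays fully initialized (the original loop fills only the first 1000 elements), and values scaled so that x*x+y stays bounded. It also maps only the array sections the device uses. That matters because when the whole of x_arr is mapped tofrom, as it is implicitly in the original, the copy-back at the end of the target region can overwrite the half that the host computed in the meantime; the final comparison against a serial host reference would catch that.

```fortran
! Sketch of the test program with the taskwait fix applied.
! Assumptions (not from the original post): REAL(8), RANDOM_NUMBER,
! bounded input values and explicit array-section maps.
program async_test_fixed
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000, half = n / 2, reps = 100
  real(8), allocatable :: x_arr(:), y_arr(:), x_ref(:)
  real(8) :: t0, t_host
  integer :: x, ie

  allocate(x_arr(n), y_arr(n), x_ref(n))
  call random_number(x_arr)
  call random_number(y_arr)
  ! Keep x*x + y bounded (x < 0.5, y < 0.25) so the repeated update never overflows.
  x_arr = 0.5d0 * x_arr
  y_arr = 0.25d0 * y_arr
  x_ref = x_arr                      ! starting values for the serial reference

  ! Deferred target task on the first half. Only that half is mapped, so
  ! the copy-back cannot overwrite the host's half of x_arr.
  !$omp target nowait map(tofrom: x_arr(1:half)) map(to: y_arr(1:half))
  do x = 1, reps
     do ie = 1, half
        x_arr(ie) = x_arr(ie) * x_arr(ie) + y_arr(ie)
     end do
  end do
  !$omp end target

  ! Second half on the host, concurrently with the target task.
  t0 = omp_get_wtime()
  do x = 1, reps
     do ie = half + 1, n
        x_arr(ie) = x_arr(ie) * x_arr(ie) + y_arr(ie)
     end do
  end do
  t_host = omp_get_wtime() - t0
  write(*,*) 'HOST_COMPUTE: ', t_host

  ! Wait for the deferred target task (a child task of the current task).
  !$omp taskwait
  write(*,*) 'MIC DONE'

  ! Serial reference on the host, used to check both halves.
  do x = 1, reps
     do ie = 1, n
        x_ref(ie) = x_ref(ie) * x_ref(ie) + y_arr(ie)
     end do
  end do
  if (maxval(abs(x_arr - x_ref)) > 1.0d-12) error stop 'result mismatch'
  write(*,*) 'results match'
end program async_test_fixed
```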

 
