
Asynchronous offload takes more time than expected

aketh_t_

Hi all,

Here is a tricky issue I seem to be facing.

Intel's website describes the signal clause for asynchronous offload as covering "both asynchronous data copy and compute".

That means the directive below should copy the data and run the computation asynchronously.

    !dir$ offload begin target(mic:0)out(WORK4,WORK3,WORK,WORKF)signal(2)
    call my_state_advt(TRCR(:,:,:,1),TRCR(:,:,:,2),&
    RHOFULL=WORKF,RHOOUT_WORK4=WORK4,RHOOUT_WORK3=WORK3,RHOOUT_WORK=WORK)
    !dir$ end offload
 

However, when I timed the offload with signal, it took 0.1 seconds.

Hoping that this was a constant cost, I increased the model resolution to get more benefit from the asynchronous strategy.

Interestingly, though, the time taken only increased to 0.23 seconds.

Am I doing my asynchronous data copying right, or is this the standard behaviour?

Andrey_Vladimirov

Is this the first offload in your application? If so, it will carry extra overhead, because the coprocessor-side driver is initialized during the first offload. To eliminate this effect, you have two options:

1) Make a dummy offload call at the beginning of the program to initialize the driver, and after that do the asynchronous offload. The time should be close to 0.

2) Set the environment variable OFFLOAD_INIT=on_start and re-run the application. The first asynchronous offload call should take close to 0 seconds.

P.S.: I am assuming that you are timing outside the offload region, not inside.
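
As a rough illustration of option 1 together with timing outside the offload region, here is a minimal sketch. The program name, the array `work`, the size `n`, and the variables `warm` and `sig` are all hypothetical and not taken from the original code. It assumes the Intel Fortran compiler with offload support and a coprocessor at `mic:0`.

```fortran
program warmup_then_async
  implicit none
  integer, parameter :: n = 1000000   ! hypothetical problem size
  real(8), allocatable :: work(:)
  integer :: warm, sig, i
  integer(8) :: t0, t1, t2, rate

  allocate(work(n))
  work = 0.0d0
  warm = 0
  sig = 1                             ! tag naming the asynchronous offload

  ! Warm-up (dummy) offload: pays the one-time initialization cost here,
  ! so it is not charged to the timed offload below.
  !dir$ offload begin target(mic:0) inout(warm)
  warm = 1
  !dir$ end offload

  call system_clock(t0, rate)

  ! Asynchronous offload: the host should continue without waiting.
  !dir$ offload begin target(mic:0) out(work) signal(sig)
  do i = 1, n
     work(i) = sqrt(real(i, 8))
  end do
  !dir$ end offload

  call system_clock(t1)               ! timer sits outside the offload region

  ! Independent host work could overlap with the offload here.

  ! Block until the offload tagged sig is done and work has been copied back.
  !dir$ offload_wait target(mic:0) wait(sig)

  call system_clock(t2)

  print '(a,f10.6,a)', 'launch only:   ', real(t1 - t0, 8) / real(rate, 8), ' s'
  print '(a,f10.6,a)', 'launch + wait: ', real(t2 - t0, 8) / real(rate, 8), ' s'
  print '(a,f12.4)', 'work(n) = ', work(n)
end program warmup_then_async
```

If the warm-up works as described, the "launch only" time should be small and the "launch + wait" time should reflect the actual compute and copy-back. The `out` arrays are only safe to read after the wait completes. A compiler without offload support treats the `!dir$` lines as comments, so the same source runs entirely on the host, which makes for a quick sanity check but tells you nothing about offload timing.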

 
