Hi all,
Here is a tricky issue I seem to be facing.
The Intel website describes the signal clause for asynchronous offload as providing "both asynchronous data copy and compute".
That means the directive below should perform both the data copy and the computation asynchronously.
!dir$ offload begin target(mic:0) out(WORK4,WORK3,WORK,WORKF) signal(2)
call my_state_advt(TRCR(:,:,:,1),TRCR(:,:,:,2), &
     RHOFULL=WORKF,RHOOUT_WORK4=WORK4,RHOOUT_WORK3=WORK3,RHOOUT_WORK=WORK)
!dir$ end offload
However, when I timed the offload with the signal clause, it took 0.1 seconds.
Hoping that this was a constant cost, I increased the model resolution to exploit the asynchronous strategy gainfully.
Interestingly, though, the time taken only increased to 0.23 seconds.
Am I doing my asynchronous data copying right, or is this the standard behaviour?
Is this the first offload in your application? If so, it will incur extra overhead, because the first offload initializes the coprocessor-side driver. To eliminate this effect, you have two options:
1) Make a dummy offload call at the beginning of the program to initialize the driver, and after that do the asynchronous offload. The launch time should then be close to 0.
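A minimal sketch of option 1, assuming the same target as in your directive (the dummy variable and trivial body are illustrative; their only purpose is to trigger driver initialization):

```fortran
! Dummy offload early in the program: pays the one-time
! coprocessor driver initialization cost up front.
integer :: idummy
idummy = 0
!dir$ offload begin target(mic:0) in(idummy)
idummy = idummy + 1   ! trivial work; the result is not copied back
!dir$ end offload
```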
2) Set the environment variable OFFLOAD_INIT=on_start and re-run the application. The first asynchronous offload call should then take close to 0 seconds.
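Option 2 needs no code change; set the variable in the shell before launching your model binary:

```shell
# Tell the offload runtime to initialize the coprocessor-side driver
# at program start rather than at the first offload.
export OFFLOAD_INIT=on_start
```

Then run the application as usual from the same shell.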
P.S.: I am assuming that you are timing outside the offload region, not inside.
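To make the P.S. concrete, here is a hedged sketch of timing the launch from the host side and later consuming the signal. The directive and variable names are the ones from the question; the timing code and the placement of offload_wait are illustrative:

```fortran
integer(8) :: t1, t2, rate
call system_clock(t1, rate)
! With signal(2), both the data copy and the compute are asynchronous,
! so this region should return almost immediately on the host.
!dir$ offload begin target(mic:0) out(WORK4,WORK3,WORK,WORKF) signal(2)
call my_state_advt(TRCR(:,:,:,1),TRCR(:,:,:,2), &
     RHOFULL=WORKF,RHOOUT_WORK4=WORK4,RHOOUT_WORK3=WORK3,RHOOUT_WORK=WORK)
!dir$ end offload
call system_clock(t2)
print *, 'async launch time (s):', real(t2 - t1) / real(rate)

! ... do useful host work here, overlapped with the coprocessor ...

! Block until the offload tagged 2 (including its out() copies) is done.
!dir$ offload_wait target(mic:0) wait(2)
```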