I have a peculiar requirement for a hybrid application using C# + C++ + Fortran + MIC offloads.
This is an atypical program design whereby the offloads do not contain any !$OMP PARALLEL regions.
Instead, asynchronous offloading is performed by many threads on the host (one host thread for each thread that runs on the MIC). This necessitates heavy oversubscription of threads on the host.
The issue to overcome was avoiding unnecessary computation on the host while waiting for the offload to complete. Please note that there are subtle issues in doing this.
A synchronous offload, where the host thread waits for completion, places the host thread into a "spin wait" state.
The asynchronous wait also places the host thread into a "spin wait" state.
In neither case is it documented how long the host thread spins before it is suspended. I suspect the duration may be governed by KMP_BLOCKTIME, but I have not been able to confirm this. Recall also that this application is not using OpenMP.
The solution is to use OFFLOAD_SIGNALED in a stall loop that yields CPU time:
    mySignal = LOC(mySignal) ! unique non-zero value of size of pointer
    !dir$ offload target (mic:iMIC) signal(mySignal) in(nlays1,nlds_rect1,nlds_circ1,nmp1,iOption1,iSlip1, &
        layerarr,loadarr1,loadarr2, npointsarr,pointsarr,iAccuracy) &
        out(resultsarr,principalarr1,sedarr)
    call crames_input_HostAndMIC(nlays1,nlds_rect1,nlds_circ1,nmp1,layerarr,loadarr1,loadarr2, &
        npointsarr,pointsarr,resultsarr,principalarr,iOption1,iAccuracy1,iSlip1,sedarr)
    !dir$ if defined(NO_MIC)
    ! no spin
    !dir$ else
    do
        call sleepqq(1)
        if(OFFLOAD_SIGNALED(iMIC, mySignal) /= 0) exit
    end do
    !dir$ endif
    !dir$ offload_wait target (mic:iMIC) wait(mySignal)
I haven't yet finished experimenting to find the appropriate sleepqq interval for this application.
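For anyone who wants to experiment with the sleep interval outside the offload runtime, here is a minimal sketch of the same sleep-then-poll loop in portable C++. It is only an illustration: the std::atomic flag stands in for the offload signal, the worker thread stands in for the offloaded routine, and the names sleep_poll and run_demo are made up for this example. None of this is the offload API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Sleep-then-poll wait, mirroring the Fortran loop above: sleep first,
// then test the flag, and repeat until it is set. `done` is a hypothetical
// stand-in for the offload signal. Returns the number of sleeps taken.
inline int sleep_poll(const std::atomic<bool>& done,
                      std::chrono::milliseconds interval) {
    int sleeps = 0;
    do {
        // Give up the CPU instead of spinning while the work runs.
        std::this_thread::sleep_for(interval);
        ++sleeps;
    } while (!done.load(std::memory_order_acquire));
    return sleeps;
}

// Starts a worker that pretends to compute for work_ms milliseconds
// (standing in for the offloaded call), then waits for it with sleep_poll.
inline int run_demo(int work_ms, int poll_ms) {
    std::atomic<bool> done{false};
    std::thread worker([&done, work_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(work_ms));
        done.store(true, std::memory_order_release);  // "signal" completion
    });
    const int sleeps = sleep_poll(done, std::chrono::milliseconds(poll_ms));
    worker.join();
    return sleeps;
}
```

Calling run_demo(20, 1) and run_demo(20, 5), for example, shows the trade-off: a shorter interval notices completion sooner but wakes the waiting thread more often, which matters when the host is heavily oversubscribed.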
This technique does seem to yield the desired results.
I thought I should pass this tip along for others to consider.