<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quote:Michael Klemm (Intel) in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071164#M58232</link>
    <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Michael Klemm (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;The pattern is almost correct. If you want to synchronize host execution with the async offload, this is what you need to do:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;integer :: sync_var

! offloaded code section
!$omp target depend(out:sync_var) nowait
   call offloaded_stuff()
!$omp end target

! this part here executes concurrently with the target device
call stuff()

! now synchronize host and offload
!$omp task depend(in:sync_var) if(.false.)
!$omp end task&lt;/PRE&gt;

&lt;P&gt;The empty task does no work; it is just there to express the dependency between the offloaded region and host execution. All code that follows the empty task executes only after the async offload has finished.&lt;/P&gt;

&lt;P&gt;If there's only one thread, the OpenMP runtime does the magic to still have an async offload.&lt;/P&gt;

&lt;P&gt;Hope that helps!&lt;/P&gt;

&lt;P&gt;Cheers,&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -michael&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi, thanks, that makes sense. I've tried a similar configuration, but the problem persists: the offload never seems to finish. The last thing the offload report shows is the target --&amp;gt; host copy.&lt;/P&gt;

</description>
    <pubDate>Mon, 04 Jan 2016 20:46:08 GMT</pubDate>
    <dc:creator>Paulius_V_1</dc:creator>
    <dc:date>2016-01-04T20:46:08Z</dc:date>
    <item>
      <title>(OpenMP 4.0) Using nowait clause for asynchronous offload</title>
      <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071162#M58230</link>
      <description>&lt;P&gt;Hello. I am trying to test out the nowait clause, but I'm having trouble detecting when the offload actually completes. I need to sync the host and the card before writing to global memory.&lt;/P&gt;

&lt;P&gt;Without the nowait clause everything runs fine. With it, nothing seems to happen: the offload does not complete. Any ideas? Thanks.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;PROGRAM ASYNC_TEST

USE OMP_LIB
USE IFPORT
IMPLICIT NONE
INTEGER :: X,Y,IE
REAL(16), ALLOCATABLE :: x_arr(:), y_arr(:)
REAL(16) t1a,t1b,t1c,t2a,t2b,t2c


t1a = omp_get_wtime()
allocate(x_arr(1000000))
allocate(y_arr(1000000))
t1b = omp_get_wtime()-t1a
WRITE(*,*) 'Allocation time on HOST: ',t1b

!on host
DO IE = 1,1000
x_arr(IE) = RAND()
y_arr(IE) = RAND() 
END DO



!        !$omp target nowait depend(out:Y) map(to:x_arr,y_arr)
        !$omp target nowait
        t1a = omp_get_wtime()
        DO X =1,100
        DO IE = 1,500000 
        x_arr(IE) = x_arr(IE)*x_arr(IE)+y_arr(IE)
        END DO
        END DO
        t1b = omp_get_wtime()-t1a
        WRITE(*,*) 'MIC COMPUTE: ',t1b
        !$omp end target

        t2a = omp_get_wtime()
        DO X = 1,100
        DO IE = 500001,1000000 
        x_arr(IE) = x_arr(IE)*x_arr(IE)+y_arr(IE)
        END DO
        END DO
        t2b = omp_get_wtime()-t2a
        WRITE(*,*) 'HOST_COMPUTE: ',t2b
        WRITE(*,*) 'MIC DONE'

!        !$omp task depend(in:t2b)
!        WRITE(*,*) 'HOST DONE'
!        !$omp end task


END PROGRAM
&lt;/PRE&gt;

&lt;P&gt;Also, do I have to enclose the target directive in a task region? I will only have one thread on the host doing the offload. How do tasks work when there's only one thread?&lt;/P&gt;

</description>
      <pubDate>Mon, 04 Jan 2016 03:42:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071162#M58230</guid>
      <dc:creator>Paulius_V_1</dc:creator>
      <dc:date>2016-01-04T03:42:51Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071163#M58231</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;The pattern is almost correct. If you want to synchronize host execution with the async offload, this is what you need to do:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;integer :: sync_var

! offloaded code section
!$omp target depend(out:sync_var) nowait
   call offloaded_stuff()
!$omp end target

! this part here executes concurrently with the target device
call stuff()

! now synchronize host and offload
!$omp task depend(in:sync_var) if(.false.)
!$omp end task&lt;/PRE&gt;

&lt;P&gt;&lt;SPAN style="font-size: 1em; line-height: 1.5;"&gt;The empty task does no work; it is just there to express the dependency between the offloaded region and host execution. All code that follows the empty task executes only after the async offload has finished.&lt;/SPAN&gt;&lt;/P&gt;

&lt;P&gt;If there's only one thread, the OpenMP runtime does the magic to still have an async offload.&lt;/P&gt;
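
&lt;P&gt;As a rough illustration of the pattern above, here is a minimal self-contained sketch. The program name, the arrays a and b, and the map clause are assumptions made for this example and are not taken from the original code. It also assumes a compiler that accepts nowait and depend on the target construct (OpenMP 4.5 semantics):&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program depend_sketch
  implicit none
  integer, parameter :: n = 1000      ! hypothetical problem size
  integer :: i
  integer :: sync_var                 ! used only as a dependence handle
  real(8) :: a(n), b(n)               ! hypothetical work arrays

  a = 1.0d0
  b = 2.0d0

  ! Deferred offload: the host does not wait at "end target".
  ! The map clause is an assumption made for this sketch.
  !$omp target depend(out:sync_var) nowait map(tofrom: a)
  do i = 1, n
     a(i) = 2.0d0*a(i)
  end do
  !$omp end target

  ! Host work that may overlap with the offload; it touches only b.
  do i = 1, n
     b(i) = b(i) + 1.0d0
  end do

  ! Empty undeferred task: the host blocks here until the target
  ! task that declared depend(out:sync_var) has completed.
  !$omp task depend(in:sync_var) if(.false.)
  !$omp end task

  ! Reading a on the host is safe only after the synchronization.
  write(*,*) a(1), b(1)
end program depend_sketch&lt;/PRE&gt;

&lt;P&gt;The host loop deliberately avoids touching a, so nothing on the host reads or writes the mapped array while the offload may still be running.&lt;/P&gt;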

&lt;P&gt;Hope that helps!&lt;/P&gt;

&lt;P&gt;Cheers,&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -michael&lt;/P&gt;

</description>
      <pubDate>Mon, 04 Jan 2016 09:46:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071163#M58231</guid>
      <dc:creator>Michael_K_Intel2</dc:creator>
      <dc:date>2016-01-04T09:46:10Z</dc:date>
    </item>
    <item>
      <title>Quote:Michael Klemm (Intel)</title>
      <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071164#M58232</link>
      <description>&lt;P&gt;&lt;/P&gt;&lt;BLOCKQUOTE&gt;Michael Klemm (Intel) wrote:&lt;BR /&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi,&lt;/P&gt;

&lt;P&gt;The pattern is almost correct. If you want to synchronize host execution with the async offload, this is what you need to do:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;integer :: sync_var

! offloaded code section
!$omp target depend(out:sync_var) nowait
   call offloaded_stuff()
!$omp end target

! this part here executes concurrently with the target device
call stuff()

! now synchronize host and offload
!$omp task depend(in:sync_var) if(.false.)
!$omp end task&lt;/PRE&gt;

&lt;P&gt;The empty task does no work; it is just there to express the dependency between the offloaded region and host execution. All code that follows the empty task executes only after the async offload has finished.&lt;/P&gt;

&lt;P&gt;If there's only one thread, the OpenMP runtime does the magic to still have an async offload.&lt;/P&gt;

&lt;P&gt;Hope that helps!&lt;/P&gt;

&lt;P&gt;Cheers,&lt;/P&gt;

&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; -michael&lt;/P&gt;

&lt;P&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;/P&gt;

&lt;P&gt;Hi, thanks, that makes sense. I've tried a similar configuration, but the problem persists: the offload never seems to finish. The last thing the offload report shows is the target --&amp;gt; host copy.&lt;/P&gt;

</description>
      <pubDate>Mon, 04 Jan 2016 20:46:08 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071164#M58232</guid>
      <dc:creator>Paulius_V_1</dc:creator>
      <dc:date>2016-01-04T20:46:08Z</dc:date>
    </item>
    <item>
      <title>As you can see in the</title>
      <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071165#M58233</link>
      <description>&lt;P&gt;As you can see in the terminal, it never reaches done.&amp;nbsp;&lt;span class="lia-inline-image-display-wrapper" image-alt="offloadstuck_0.png"&gt;&lt;img src="https://community.intel.com/t5/image/serverpage/image-id/8379i83BF9AD45BB42CFE/image-size/large?v=v2&amp;amp;px=999&amp;amp;whitelist-exif-data=Orientation%2CResolution%2COriginalDefaultFinalSize%2CCopyright" role="button" title="offloadstuck_0.png" alt="offloadstuck_0.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 04 Jan 2016 21:42:20 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071165#M58233</guid>
      <dc:creator>Paulius_V_1</dc:creator>
      <dc:date>2016-01-04T21:42:20Z</dc:date>
    </item>
    <item>
      <title>Added taskwait before MIC</title>
      <link>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071166#M58234</link>
      <description>&lt;P&gt;Added taskwait before MIC DONE as shown below.&lt;/P&gt;

&lt;P&gt;&lt;STRONG&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; !$omp taskwait&lt;/STRONG&gt;&lt;BR /&gt;
	&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; WRITE(*,*) 'MIC DONE'&lt;/P&gt;

&lt;P&gt;And this is the result I got&lt;/P&gt;

&lt;P&gt;ifort -qopenmp nowait.f90&lt;/P&gt;

&lt;P&gt;&amp;nbsp;./a.out&lt;BR /&gt;
	&amp;nbsp;Allocation time on HOST: &amp;nbsp; 9.059906005859375000000000000000000E-0006&lt;BR /&gt;
	&amp;nbsp;HOST_COMPUTE: &amp;nbsp; &amp;nbsp;1.38332700729370117187500000000000&lt;BR /&gt;
	&amp;nbsp;MIC COMPUTE: &amp;nbsp; &amp;nbsp;10.8623330593109130859375000000000&lt;BR /&gt;
	&amp;nbsp;MIC DONE&lt;/P&gt;
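
&lt;P&gt;For reference, a minimal sketch of where the taskwait sits relative to the nowait target region. The array names follow the original program, but the program name, the initial values, the REAL(8) kind and the map clauses are assumptions made for this sketch and were not part of the run above. It also assumes a compiler that treats a target region with nowait as a deferred task (OpenMP 4.5 semantics):&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program taskwait_sketch
  implicit none
  integer, parameter :: n = 1000000
  integer :: ie
  real(8), allocatable :: x_arr(:), y_arr(:)

  allocate(x_arr(n), y_arr(n))
  x_arr = 0.5d0                       ! hypothetical initial values
  y_arr = 0.25d0

  ! Deferred offload of the first half; the host continues immediately.
  ! The map clauses are an assumption made for this sketch.
  !$omp target nowait map(tofrom: x_arr(1:n/2)) map(to: y_arr(1:n/2))
  do ie = 1, n/2
     x_arr(ie) = x_arr(ie)*x_arr(ie) + y_arr(ie)
  end do
  !$omp end target

  ! The host works on the second half while the offload is in flight.
  do ie = n/2 + 1, n
     x_arr(ie) = x_arr(ie)*x_arr(ie) + y_arr(ie)
  end do

  ! Wait for all child tasks of the current task, which includes
  ! the deferred target task generated above.
  !$omp taskwait
  write(*,*) 'MIC DONE'
end program taskwait_sketch&lt;/PRE&gt;

&lt;P&gt;Unlike a depend clause, taskwait waits for every outstanding child task, so it is the simpler choice when there is only one offload to wait for.&lt;/P&gt;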

</description>
      <pubDate>Tue, 05 Jan 2016 00:19:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/OpenMP-4-0-Using-nowait-clause-for-asynchronous-offload/m-p/1071166#M58234</guid>
      <dc:creator>Ravi_N_Intel</dc:creator>
      <dc:date>2016-01-05T00:19:00Z</dc:date>
    </item>
  </channel>
</rss>

