<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Asynchronous data send taking time in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065660#M56069</link>
    <description>&lt;P&gt;Hi all, I want to send data into the MIC asynchronously.&lt;/P&gt;

&lt;P&gt;I used the code below.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; start_time = omp_get_wtime()&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; !dir$ offload_transfer target(mic:0)in(TRACER)signal(1)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; end_time = omp_get_wtime()&lt;/P&gt;

&lt;P&gt;print *,"time taken is ",end_time - start_time&lt;/P&gt;

&lt;P&gt;TRACER here is a global variable marked target which has been imported from a different module.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;However, the time taken here is unusually high, approximately:&lt;/P&gt;

&lt;P&gt;time taken is 8.975481986999512E-002 (about 90 ms)&lt;/P&gt;

&lt;P&gt;I do not see why it takes so long to send data asynchronously. It should just be a signal, which should take no more than a few milliseconds, as I have seen entire arrays transferred in less time.&lt;/P&gt;

&lt;P&gt;Am I doing something wrong here?&lt;/P&gt;</description>
    <pubDate>Fri, 13 Nov 2015 10:59:58 GMT</pubDate>
    <dc:creator>aketh_t_</dc:creator>
    <dc:date>2015-11-13T10:59:58Z</dc:date>
    <item>
      <title>Asynchronous data send taking time</title>
      <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065660#M56069</link>
      <description>&lt;P&gt;Hi all, I want to send data into the MIC asynchronously.&lt;/P&gt;

&lt;P&gt;I used the code below.&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; start_time = omp_get_wtime()&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; !dir$ offload_transfer target(mic:0)in(TRACER)signal(1)&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; end_time = omp_get_wtime()&lt;/P&gt;

&lt;P&gt;print *,"time taken is ",end_time - start_time&lt;/P&gt;

&lt;P&gt;TRACER here is a global variable marked target which has been imported from a different module.&amp;nbsp;&lt;/P&gt;

&lt;P&gt;However, the time taken here is unusually high, approximately:&lt;/P&gt;

&lt;P&gt;time taken is 8.975481986999512E-002 (about 90 ms)&lt;/P&gt;

&lt;P&gt;I do not see why it takes so long to send data asynchronously. It should just be a signal, which should take no more than a few milliseconds, as I have seen entire arrays transferred in less time.&lt;/P&gt;

&lt;P&gt;Am I doing something wrong here?&lt;/P&gt;</description>
      <pubDate>Fri, 13 Nov 2015 10:59:58 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065660#M56069</guid>
      <dc:creator>aketh_t_</dc:creator>
      <dc:date>2015-11-13T10:59:58Z</dc:date>
    </item>
    <item>
      <title>1. I suspect that the time to</title>
      <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065661#M56070</link>
      <description>&lt;P&gt;1. I suspect that the time to initialize the device is being included in this timing. Do an empty offload before this one to factor out the one-time initialization cost. For example:&lt;/P&gt;

&lt;P&gt;&amp;nbsp;!dir$ offload begin target(mic)&lt;/P&gt;

&lt;P&gt;!dir$ end offload&lt;/P&gt;

&lt;P&gt;That takes care of initialization time.&lt;/P&gt;

&lt;P&gt;2. Next, if variable TRACER is statically allocated in your program, then it will be sent through a dynamically allocated buffer. This buffer creation time will be included in your timing. You want to avoid that.&lt;/P&gt;

&lt;P&gt;3. Your program doesn't actually measure transfer time, just the time to *initiate* the transfer. Be aware of that.&lt;/P&gt;

&lt;P&gt;4. In Fortran it is best to allocate dynamic arrays beforehand on the device with alloc_if(.true.) free_if(.false.) and then do the transfer using alloc_if(.false.) free_if(.false.) reusing the device buffers previously created. This will give the best time. But again, be aware of the difference between "transfer time" and "transfer initiation time". Measuring transfer time when doing asynchronous offloads is not generally possible because you won't be able to capture a time value at the precise moment the transfer completes.&lt;/P&gt;
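
&lt;P&gt;A minimal sketch of points 1, 3 and 4 together, written from the directive forms used in this thread and not tested on a coprocessor. The array name a, its size n and the signal tag value 1 are hypothetical stand-ins; substitute your own allocatable array. The first time difference is only the initiation time. The second, taken after offload_wait returns, is an upper bound on the transfer time rather than an exact measurement.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program async_transfer_sketch
    use omp_lib
    implicit none
    integer, parameter :: n = 1000000          ! hypothetical array size
    double precision, allocatable :: a(:)      ! hypothetical stand-in for your array
    double precision :: t0, t_issue, t_done

    allocate(a(n))
    a = 1.0d0

    ! Point 1: empty offload so that device initialization is not timed later
    !dir$ offload begin target(mic:0)
    !dir$ end offload

    ! Point 4: create the device buffer once and keep it
    !dir$ offload_transfer target(mic:0) nocopy(a: alloc_if(.true.) free_if(.false.))

    t0 = omp_get_wtime()
    ! Start the asynchronous transfer into the existing device buffer
    !dir$ offload_transfer target(mic:0) in(a: alloc_if(.false.) free_if(.false.)) signal(1)
    t_issue = omp_get_wtime()

    ! Block until the transfer tagged 1 has completed
    !dir$ offload_wait target(mic:0) wait(1)
    t_done = omp_get_wtime()

    print *, "initiation time (s):        ", t_issue - t0
    print *, "time until wait returns (s):", t_done - t0

    ! Release the device buffer
    !dir$ offload_transfer target(mic:0) nocopy(a: alloc_if(.false.) free_if(.true.))
    deallocate(a)
end program async_transfer_sketch
&lt;/PRE&gt;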

</description>
      <pubDate>Fri, 13 Nov 2015 21:35:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065661#M56070</guid>
      <dc:creator>Rajiv_D_Intel</dc:creator>
      <dc:date>2015-11-13T21:35:55Z</dc:date>
    </item>
    <item>
      <title>Hi I think the time reduced</title>
      <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065662#M56071</link>
      <description>&lt;P&gt;Hi, I think the time went down after I tried what you suggested.&lt;/P&gt;

&lt;P&gt;However, it is still as high as 2*10^-2 s.&lt;/P&gt;

&lt;P&gt;I need it to be as low as 5*10^-3 s for the asynchronous transfer. Any help?&lt;/P&gt;

&lt;P&gt;Here is the code:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;    ! One-time setup: allocate the host arrays and the matching device buffers
    if (flag == 1) then
        allocate(TRCR(nx_block,ny_block,km,nt))
        !dir$ offload_transfer target(mic:0) nocopy(TRCR: alloc_if(.TRUE.) free_if(.FALSE.))

        allocate(WORK(nx_block,ny_block,km), WORKF(nx_block,ny_block,km), &amp;amp;
                 WORK3(nx_block,ny_block,km), WORK4(nx_block,ny_block,km))
        !dir$ offload_transfer target(mic:0) nocopy(WORK: alloc_if(.TRUE.) free_if(.FALSE.))
        !dir$ offload_transfer target(mic:0) nocopy(WORKF: alloc_if(.TRUE.) free_if(.FALSE.))
        !dir$ offload_transfer target(mic:0) nocopy(WORK3: alloc_if(.TRUE.) free_if(.FALSE.))
        !dir$ offload_transfer target(mic:0) nocopy(WORK4: alloc_if(.TRUE.) free_if(.FALSE.))

        flag = 2
    endif

    !if(my_task == master_task)then

    TRCR = TRACER(:,:,:,:,curtime,1)
    start_time = omp_get_wtime()
    ! Asynchronous offload that reuses the device buffers created above
    !dir$ offload target(mic:0) in(TRCR: alloc_if(.FALSE.) free_if(.FALSE.)) out(WORKF,WORK3,WORK4,WORK: alloc_if(.FALSE.) free_if(.FALSE.)) signal(1)
    call my_state_advt(TRCR(:,:,:,1),TRCR(:,:,:,2),&amp;amp;
    RHOFULL=WORKF,RHOOUT_WORK4=WORK4,RHOOUT_WORK3=WORK3,RHOOUT_WORK=WORK)
    !!dir$ end offload
    end_time = omp_get_wtime()

    !endif

    print *,end_time - start_time
&lt;/PRE&gt;

</description>
      <pubDate>Mon, 16 Nov 2015 06:50:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065662#M56071</guid>
      <dc:creator>aketh_t_</dc:creator>
      <dc:date>2015-11-16T06:50:16Z</dc:date>
    </item>
    <item>
      <title>the MIC OFFLOAD REPORT says</title>
      <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065663#M56072</link>
      <description>&lt;P&gt;The MIC offload report says this:&lt;/P&gt;

&lt;P&gt;[Offload] [MIC 0] [Tag 72] [State]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Start target entry: __offload_entry_baroclinic_F90_587baroclinic_mp_baroclinic_driver_ifort0101596643955Ee9p3L&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; trcr&amp;nbsp; IN&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; trcr&amp;nbsp; IN&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work4&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work4&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work3&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; work3&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; workf&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [Var]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; workf&amp;nbsp; OUT&lt;BR /&gt;
	[Offload] [MIC 0] [Tag 72] [State]&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Target-&amp;gt;host copyout data&amp;nbsp;&amp;nbsp; 0&lt;/P&gt;

</description>
      <pubDate>Mon, 16 Nov 2015 07:00:29 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065663#M56072</guid>
      <dc:creator>aketh_t_</dc:creator>
      <dc:date>2015-11-16T07:00:29Z</dc:date>
    </item>
    <item>
      <title>The offload will be done</title>
      <link>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065664#M56073</link>
      <description>&lt;P&gt;The offload will be done using async transfer of IN data, chained to an async compute, chained to an async transfer of OUT data.&lt;/P&gt;

&lt;P&gt;However, the setup of the async data transfers involves programming the DMA channels, and that is done by the issuing thread. So the time taken to issue this offload will be proportional to the amount of data transferred IN and OUT.&lt;/P&gt;
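
&lt;P&gt;One way to check is to print how many bytes the IN and OUT arrays occupy on the host. A minimal sketch, assuming double-precision arrays and hypothetical dimensions, since the real values of nx_block, ny_block, km and nt are not given here:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program offload_size_sketch
    use iso_fortran_env, only: int64
    implicit none
    ! Hypothetical dimensions; substitute the values from your model
    integer, parameter :: nx_block = 100, ny_block = 100, km = 60, nt = 2
    double precision, allocatable :: trcr(:,:,:,:), work(:,:,:)
    integer(int64) :: bytes_in, bytes_out

    allocate(trcr(nx_block,ny_block,km,nt), work(nx_block,ny_block,km))

    ! storage_size returns bits per element; kind=int64 avoids integer overflow
    bytes_in = size(trcr, kind=int64) * storage_size(trcr) / 8
    ! Four OUT arrays of the same shape: WORK, WORKF, WORK3, WORK4
    bytes_out = 4 * size(work, kind=int64) * storage_size(work) / 8

    print *, "bytes IN: ", bytes_in
    print *, "bytes OUT:", bytes_out
end program offload_size_sketch
&lt;/PRE&gt;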

&lt;P&gt;Perhaps the amount of data is large?&lt;/P&gt;</description>
      <pubDate>Tue, 17 Nov 2015 00:30:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Asynchronous-data-send-taking-time/m-p/1065664#M56073</guid>
      <dc:creator>Rajiv_D_Intel</dc:creator>
      <dc:date>2015-11-17T00:30:41Z</dc:date>
    </item>
  </channel>
</rss>

