<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I am not sure what you mean in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018335#M37060</link>
    <description>&lt;P&gt;I am not sure what you mean when you say "Today I saw that the TARGET(MIC:0) or Target(MIC:1) always points to mic:0." Are you saying you have two coprocessor cards, both of which are up and running but you can only use the first coprocessor card?&lt;/P&gt;

&lt;P&gt;You say "I cannot really transfer to the core #1, or #2 etc on the first mic device. Only the first Mic which is (mic:0)." When you use any of the offload directives, you are offloading work to the coprocessor card, not to individual cores. Which cores get used depends on a number of things, including any affinity settings you used. Your program should use as many threads on as many cores as it can get useful work out of - which might or might not be all the cores and all the threads on each core you use.&lt;/P&gt;

&lt;P&gt;You say "I am getting the impression that the SIGnal and the STATUS may really only be one per mic card". You may use multiple signals in a single process offloading to a single coprocessor card, as long as the integer value of the tag you use is different in each case; the integer value of the tag is the key used to track signals. The name of the variable holding that tag is irrelevant. As for the status option, think of it as you would an IOSTAT parameter on a Fortran open, read, write or close statement. It applies to the individual offload directive. When the directive returns control to the host processor, the status variable has been set to whatever it is going to be set to. You can check the status value returned, then reuse the status variable in another offload directive.&lt;/P&gt;

&lt;P&gt;As for timing, I would suggest you use OFFLOAD_REPORT to get more detailed information. You can find directions in Intel's Fortran reference manual. And, as I said before, it would be better to overlap the data transfer with offloaded work rather than overlap multiple data transfers, as I showed in the last bit of pseudocode.&lt;/P&gt;</description>
    <pubDate>Fri, 16 Oct 2015 21:10:16 GMT</pubDate>
    <dc:creator>Frances_R_Intel</dc:creator>
    <dc:date>2015-10-16T21:10:16Z</dc:date>
    <item>
      <title>Using offload_transfer Status() and offload_status</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018330#M37055</link>
      <description>&lt;P&gt;It amazes me when I see new stuff, which happened again today.&lt;BR /&gt;
	In the Fortran compiler it says under OFFLOAD:&lt;BR /&gt;
	&amp;nbsp;&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;use, intrinsic :: iso_c_binding
 
enum , bind (C)
 enumerator :: OFFLOAD_SUCCESS         = 0
 enumerator :: OFFLOAD_DISABLED        = 1  ! offload is disabled
 enumerator :: OFFLOAD_UNAVAILABLE     = 2  ! card is not available
 enumerator :: OFFLOAD_OUT_OF_MEMORY   = 3  ! not enough memory on device
 enumerator :: OFFLOAD_PROCESS_DIED    = 4  ! target process has died
 enumerator :: OFFLOAD_ERROR           = 5  ! unspecified error
end enum
 
type, bind (C) :: offload_status
 integer(kind=c_int) ::  result        = OFFLOAD_DISABLED   ! result, see enum above
 integer(kind=c_int) ::  device_number = -1  ! device number
 integer(kind=c_int) ::  data_sent     =  0  ! number of bytes sent to the target
 integer(kind=c_int) ::  data_received =  0  ! number of bytes received by host
end type offload_status&lt;/PRE&gt;

&lt;P&gt;So I poked into my code the following:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;MODULE

TYPE(offload_status), PUBLIC, DIMENSION(60) :: MICSTATUS
!... more stuff
LOGICAL(KIND=4), PARAMETER :: Yes = .TRUE.
LOGICAL(KIND=4), PARAMETER :: No = .FALSE.
LOGICAL(KIND=4), PARAMETER :: Amy = No   !Or No no no
!... more stuff
END MODULE&lt;/PRE&gt;

&lt;P&gt;In the main I have something like this:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;!...
!DIR$ ALIGN:64 DataIn
REAL(KIND=4), DIMENSION(:,:), ALLOCATABLE :: DataIn

!...
ALLOCATE(DataIn(1024,60))
!...

! establish the allocation on the mic
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn: ALLOW_IF(YES) FREE_IF(NO) ) STATUS(MICStatus(1))
!DIR$ OFFLOAD_WAIT TARGET(mic:0) WAIT(MICStatus(1))
WRITE(*,100) '100', 1, MICStatus(1).RESULT, MICStatus(1).DEVICE_NUMBER, MICStatus(1).DATA_SENT, MICStatus(1).DATA_RECEIVED
100 FORMAT(A,' MS(',I3,').RES=',I2,  ' Dev=',I3, ' Tx=',I15, ' Rx=',I15)

!...
!A bigger loop
DO I = 1, 60
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn(:,I): ALLOW_IF(YES) FREE_IF(NO) ) STATUS(MICStatus(I))
!--- This stuff below was in a separate loop ...
!DIR$ OFFLOAD_WAIT TARGET(mic:0) WAIT(MICStatus(I))
WRITE(*,100) '120',I,MICStatus(I).RESULT, MICStatus(I).DEVICE_NUMBER, MICStatus(I).DATA_SENT, MICStatus(I).DATA_RECEIVED
ENDDO
!End of a bigger loop
!...

! clean up the mic
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) OUT(DataIn: ALLOW_IF(YES) FREE_IF(YES) ) STATUS(MICStatus(1))
!DIR$ OFFLOAD_WAIT TARGET(mic:0) WAIT(MICStatus(1))
WRITE(*,100) '100', 1, MICStatus(1).RESULT, MICStatus(1).DEVICE_NUMBER, MICStatus(1).DATA_SENT, MICStatus(1).DATA_RECEIVED

DEALLOCATE(DataIn)&lt;/PRE&gt;

&lt;P&gt;What I see is that only MICStatus(1) is showing the results correctly.&lt;/P&gt;

&lt;P&gt;The sizeof(Status(1)) is 24 bytes, which I was expecting to be 16 (which is 4x C_INT).&lt;/P&gt;
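
&lt;P&gt;For reference, four c_int fields do normally occupy 16 bytes, so a 24-byte result suggests the type the compiler actually uses is laid out differently from the excerpt above. The sketch below is only a way to check that expectation. The two type names are made up for illustration, and the second layout (byte counters widened to c_size_t) is a guess at what could produce 24 bytes, not something taken from the documentation.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program status_size_check
  use, intrinsic :: iso_c_binding
  implicit none

  ! Layout as printed in the excerpt above: four c_int fields.
  type, bind(C) :: status_four_int
     integer(kind=c_int) :: result
     integer(kind=c_int) :: device_number
     integer(kind=c_int) :: data_sent
     integer(kind=c_int) :: data_received
  end type status_four_int

  ! Hypothetical layout (an assumption): the two byte counters are c_size_t.
  type, bind(C) :: status_wide_counters
     integer(kind=c_int)    :: result
     integer(kind=c_int)    :: device_number
     integer(kind=c_size_t) :: data_sent
     integer(kind=c_size_t) :: data_received
  end type status_wide_counters

  type(status_four_int)      :: a
  type(status_wide_counters) :: b

  ! c_sizeof reports the size in bytes of an interoperable object.
  print *, 'four c_int fields     :', c_sizeof(a), ' bytes'
  print *, 'c_size_t byte counters:', c_sizeof(b), ' bytes'
end program status_size_check&lt;/PRE&gt;

&lt;P&gt;On a typical 64-bit system the first type should report 16 bytes and the second 24, which would at least be consistent with the 24 bytes seen here.&lt;/P&gt;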

&lt;P&gt;Then I tried doing the following:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;MODULE

TYPE(offload_status), PUBLIC, DIMENSION(60) :: pSTATUS
TYPE(offload_status), PUBLIC         :: MICSTATUS1
TYPE(offload_status), PUBLIC         :: MICSTATUS2
TYPE(offload_status), PUBLIC         :: MICSTATUS3
TYPE(offload_status), PUBLIC         :: MICSTATUS4
...

!... more stuff
END MODULE&lt;/PRE&gt;

&lt;P&gt;Followed by:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;ALLOCATE(pMICSTATUS(60))

pMICSTATUS(1) =&amp;gt; MICStatus1
pMICSTATUS(2) =&amp;gt; MICStatus2&lt;/PRE&gt;

&lt;P&gt;The last one failed as there are arguments for BIND(C) that require something... (??), and enumerator is a new one for me.&lt;/P&gt;

&lt;P&gt;I just want to get the data moved to the mic and then start scheduling the work on the mic as soon as the data is on it, so I need to know how to handle the status tags as an indexed array of the structure/type or with a pointer.&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 10:17:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018330#M37055</guid>
      <dc:creator>holmz</dc:creator>
      <dc:date>2015-10-14T10:17:56Z</dc:date>
    </item>
    <item>
      <title>There appear to be several</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018331#M37056</link>
      <description>&lt;P&gt;There appear to be several inconsistencies or typos in your code.&lt;/P&gt;

&lt;OL&gt;
	&lt;LI&gt;The offload directive takes a modifier ALLOC_IF, but your example uses&amp;nbsp;ALLOW_IF.&lt;/LI&gt;
	&lt;LI&gt;The WAIT clause takes a signal as its argument, while your code is using a STATUS variable.&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;I don't think&amp;nbsp;your code would compile and run as written. Can you provide the actual code?&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 17:49:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018331#M37056</guid>
      <dc:creator>Rajiv_D_Intel</dc:creator>
      <dc:date>2015-10-14T17:49:06Z</dc:date>
    </item>
    <item>
      <title>No I cannot include the</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018332#M37057</link>
      <description>&lt;P&gt;No, I cannot include the actual code as there is no internet connection at work, and at home I have ifort on a Mac but there is no Xeon Phi available for a Mac. So I poke it in from memory or a piece of paper. (And my spelling is not too good.)&lt;/P&gt;

&lt;P&gt;Basically the first transfer sets up the allocation on the Phi.&lt;/P&gt;

&lt;P&gt;The transfer in the loop moves the data onto the Phi asynchronously while the Phi should be doing work on existing data.&lt;/P&gt;

&lt;P&gt;The last transfer releases the phi memory.&lt;/P&gt;

&lt;P&gt;The problem is the STATUS(MICSTATUS(J)) in the main loop&lt;/P&gt;

&lt;P&gt;If J = 1 it works, or if I have separate MICSTATUS# for each J value. However it is not working with an array of MICStatus tags. It seems like it should be simple, but I am totally unfamiliar with ENUMERATOR and I do not often use BIND(C) .&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 14 Oct 2015 20:24:28 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018332#M37057</guid>
      <dc:creator>holmz</dc:creator>
      <dc:date>2015-10-14T20:24:28Z</dc:date>
    </item>
    <item>
      <title>When you say the OFFLOAD_WAIT</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018333#M37058</link>
      <description>&lt;P&gt;When you say the OFFLOAD_WAIT directives are in a separate loop, I hope what you mean is:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;PRE&gt;do big_loop=1,n
   do i=1,60
      start transfer
   enddo
   do i=1,60
      wait transfer
   enddo
enddo big_loop&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;and not&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;PRE&gt;do i=1,60
   start transfer
   do j=1,n
      wait transfer
   enddo
enddo&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;

&lt;P&gt;You only get to wait once for each signal. If you meant the first, then what if you try:&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;DO BIG_LOOP=1,n
   DO I = 1, 60
      !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn(:,I): ALLOC_IF(YES) FREE_IF(NO) ) STATUS(MICStatus) SIGNAL(I)
      !...check the status here – otherwise there is no point in putting the status clause on transfer
   ENDDO
   !... stuff happens
   DO I = 1, 60
      !DIR$ OFFLOAD_WAIT TARGET(mic:0) STATUS(MICStatus) WAIT(I)
      WRITE(*,100) '120',I,MICStatus.RESULT, MICStatus.DEVICE_NUMBER, MICStatus.DATA_SENT, MICStatus.DATA_RECEIVED
      !... more stuff happens
   ENDDO
ENDDO&lt;/PRE&gt;

&lt;P&gt;I am only using one MICStatus variable. You process the offload directive, check the status result and move on.&lt;/P&gt;

&lt;P&gt;I added a SIGNAL clause to the transfers - I'm not sure how you were getting asynchronous behavior without it - and set the tag for the SIGNAL and WAIT to the loop index. The important thing is that the integer value of the tag be the same for a matching signal/wait pair and different from every other signal/wait pair in use at the same time. This is one reason people often use the location of the data as the tag. But in this case, I am using 1 when transferring column 1; 2 when transferring column 2 and so on. The problem with using MICStatus for the signal or wait tag is that it is not an integer (although the first element is) and the value is not different for different signal/wait pairs.&lt;/P&gt;

&lt;P&gt;I don't think I would do the allocate or free asynchronously; but if I did I would probably use a 1 for the signal/wait tag for those since there is no overlap with other asynchronous offload operations.&lt;/P&gt;

&lt;P&gt;Finally, rather than start up all the transfers at once, I think I would try to keep just one ahead of where I wanted to be. In other words:&lt;/P&gt;

&lt;BLOCKQUOTE&gt;
&lt;PRE&gt;start transfer of first column
do i = 1,60
   if i not equal 60 start the transfer of the next column
   wait for previous column to finish transfer
   do some work
enddo&lt;/PRE&gt;
&lt;/BLOCKQUOTE&gt;
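
&lt;P&gt;In Fortran, that outline might look something like the sketch below. It is a sketch under assumptions, not tested code: the program name, the tag variables ThisTag and NextTag, the array sizes and the .TRUE./.FALSE. modifier values are placeholders chosen for illustration, the directive spelling follows the examples earlier in this thread, and the work step is left as a comment.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;PROGRAM pipeline_sketch
  IMPLICIT NONE
  INTEGER, PARAMETER :: NRows = 1024, NCols = 60
  REAL(KIND=4), DIMENSION(:,:), ALLOCATABLE :: DataIn
  INTEGER :: I, ThisTag, NextTag

  ALLOCATE(DataIn(NRows,NCols))
  DataIn = 0.0

  ! Allocate the whole array on the card once, synchronously.
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn: ALLOC_IF(.TRUE.) FREE_IF(.FALSE.))

  ! Start the transfer of the first column; tag value 1 identifies it.
  NextTag = 1
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn(:,1): ALLOC_IF(.FALSE.) FREE_IF(.FALSE.)) SIGNAL(NextTag)

  DO I = 1, NCols
     ! Keep one column ahead: start column I+1 before waiting on column I.
     IF (I /= NCols) THEN
        NextTag = I + 1
        !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn(:,I+1): ALLOC_IF(.FALSE.) FREE_IF(.FALSE.)) SIGNAL(NextTag)
     END IF

     ! Wait exactly once for the tag value that matches column I.
     ThisTag = I
     !DIR$ OFFLOAD_WAIT TARGET(mic:0) WAIT(ThisTag)

     ! ... offload the work on column I here ...
  END DO

  ! Copy back and release the card memory, synchronously.
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) OUT(DataIn: ALLOC_IF(.FALSE.) FREE_IF(.TRUE.))
  DEALLOCATE(DataIn)
END PROGRAM pipeline_sketch&lt;/PRE&gt;

&lt;P&gt;At most two tag values are outstanding at any time (I and I+1), so each signal/wait pair stays distinct from every other pair in flight.&lt;/P&gt;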

&lt;P&gt;I haven't actually tried this out but that is what I would do if I did.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Oct 2015 00:39:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018333#M37058</guid>
      <dc:creator>Frances_R_Intel</dc:creator>
      <dc:date>2015-10-16T00:39:47Z</dc:date>
    </item>
    <item>
      <title>Yes Frances - your first part</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018334#M37059</link>
      <description>&lt;P&gt;Yes Frances - your first part was exactly what I meant.&lt;/P&gt;

&lt;P&gt;I see I forgot the ENDDO and subsequent DO.&lt;/P&gt;

&lt;P&gt;Today I saw that the TARGET(MIC:0) or Target(MIC:1) always points to mic:0.&lt;/P&gt;

&lt;P&gt;I cannot really transfer to the core #1, or #2 etc on the first mic device. Only the first Mic which is (mic:0).&lt;/P&gt;

&lt;P&gt;So with that I am getting the impression that the SIGnal and the STATUS may really only be one per mic card.&lt;/P&gt;

&lt;P&gt;A colleague, who is a long ways away, claims to get threaded transfer rates of ~30GB/sec.&lt;/P&gt;

&lt;P&gt;I am now getting (I am not sure which) either 750 MB of transfer/sec, or 750 MB of 8 byte transfer/sec. I am 99% sure it is the former, as I am summing the MICStatus.DATA_SENT and seeing an extra 80 bytes per transfer.&lt;BR /&gt;
	Basically an 8 byte variable array that is 1M long... +80 bytes for God and Intel know what.&lt;/P&gt;

&lt;P&gt;So my first step is to know what I transfer onto the MIC, then what I can transfer on/off (duplex should be the same), and then how long the processing is taking.&lt;/P&gt;

&lt;P&gt;I appreciate your help Ms FR,&lt;BR /&gt;
	Cheers,&lt;BR /&gt;
	RH&lt;/P&gt;</description>
      <pubDate>Fri, 16 Oct 2015 12:42:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018334#M37059</guid>
      <dc:creator>holmz</dc:creator>
      <dc:date>2015-10-16T12:42:40Z</dc:date>
    </item>
    <item>
      <title>I am not sure what you mean</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018335#M37060</link>
      <description>&lt;P&gt;I am not sure what you mean when you say "Today I saw that the TARGET(MIC:0) or Target(MIC:1) always points to mic:0." Are you saying you have two coprocessor cards, both of which are up and running but you can only use the first coprocessor card?&lt;/P&gt;

&lt;P&gt;You say "I cannot really transfer to the core #1, or #2 etc on the first mic device. Only the first Mic which is (mic:0)." When you use any of the offload directives, you are offloading work to the coprocessor card, not to individual cores. Which cores get used depends on a number of things, including any affinity settings you used. Your program should use as many threads on as many cores as it can get useful work out of - which might or might not be all the cores and all the threads on each core you use.&lt;/P&gt;

&lt;P&gt;You say "I am getting the impression that the SIGnal and the STATUS may really only be one per mic card". You may use multiple signals in a single process offloading to a single coprocessor card, as long as the integer value of the tag you use is different in each case; the integer value of the tag is the key used to track signals. The name of the variable holding that tag is irrelevant. As for the status option, think of it as you would an IOSTAT parameter on a Fortran open, read, write or close statement. It applies to the individual offload directive. When the directive returns control to the host processor, the status variable has been set to whatever it is going to be set to. You can check the status value returned, then reuse the status variable in another offload directive.&lt;/P&gt;
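
&lt;P&gt;A minimal sketch of that check-and-reuse pattern follows. It is untested and rests on assumptions: that the offload_status type and the OFFLOAD_SUCCESS enumerator quoted earlier in this thread come from a module named MIC_LIB, and that the array sizes and the .TRUE./.FALSE. modifier values shown are suitable. The program and variable names are placeholders.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;PROGRAM status_reuse_sketch
  USE MIC_LIB   ! assumption: the module that supplies offload_status and OFFLOAD_SUCCESS
  IMPLICIT NONE
  REAL(KIND=4), DIMENSION(:,:), ALLOCATABLE :: DataIn
  TYPE(offload_status) :: Stat   ! one variable, reused by every directive
  INTEGER :: I

  ALLOCATE(DataIn(1024,60))
  DataIn = 0.0

  ! Synchronous transfer: Stat is final when the directive returns.
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn: ALLOC_IF(.TRUE.) FREE_IF(.FALSE.)) STATUS(Stat)
  IF (Stat%result /= OFFLOAD_SUCCESS) STOP 'allocation on the card failed'

  DO I = 1, 60
     ! The same Stat is overwritten by each directive, like IOSTAT on successive READs.
     !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) IN(DataIn(:,I): ALLOC_IF(.FALSE.) FREE_IF(.FALSE.)) STATUS(Stat)
     IF (Stat%result /= OFFLOAD_SUCCESS) THEN
        WRITE(*,*) 'column ', I, ' failed with result ', Stat%result
        EXIT
     END IF
     WRITE(*,*) 'column ', I, ' bytes sent ', Stat%data_sent
  END DO

  ! Copy back and release the card memory, checking the status once more.
  !DIR$ OFFLOAD_TRANSFER TARGET(mic:0) OUT(DataIn: ALLOC_IF(.FALSE.) FREE_IF(.TRUE.)) STATUS(Stat)
  IF (Stat%result /= OFFLOAD_SUCCESS) WRITE(*,*) 'release failed with result ', Stat%result

  DEALLOCATE(DataIn)
END PROGRAM status_reuse_sketch&lt;/PRE&gt;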

&lt;P&gt;As for timing, I would suggest you use OFFLOAD_REPORT to get more detailed information. You can find directions in Intel's Fortran reference manual. And, as I said before, it would be better to overlap the data transfer with offloaded work rather than overlap multiple data transfers, as I showed in the last bit of pseudocode.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Oct 2015 21:10:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018335#M37060</guid>
      <dc:creator>Frances_R_Intel</dc:creator>
      <dc:date>2015-10-16T21:10:16Z</dc:date>
    </item>
    <item>
      <title>Hi Frances,</title>
      <link>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018336#M37061</link>
      <description>&lt;P&gt;Hi Frances,&lt;BR /&gt;
	&lt;BR /&gt;
	What I mean is that I can get just under 1 GByte/sec of DMA to the mic... ~950 MB/second&lt;BR /&gt;
	I believe it all goes through mic:0 (or the 1st mic which is the pcie address of the 1st or zeroth mic). So how much transfer rate should one expect?&lt;BR /&gt;
	&lt;BR /&gt;
	So I am not sure if or how I can get a higher transfer rate. I am &amp;lt;currently&amp;gt; transferring into a buffer in 1M sample chunks with a complex(kind=4) size. I will generally be doing real(kind=4) or complex(kind=4), so 4 or 8 bytes.&lt;BR /&gt;
	&lt;BR /&gt;
	I solved the problem in my mind last week after reading your post about "different in each case"... And I thought, "did I initialise it?"&lt;BR /&gt;
	INTEGER(KIND=4), DIMENSION(60) :: Sig !?????&lt;/P&gt;

&lt;P&gt;So I added INTEGER(KIND=4), DIMENSION(60) :: SIG = (/1,2,3,4,... 60/) !Yes !!!&lt;/P&gt;

&lt;P&gt;Which solved the main issue ;( ... &amp;lt;hanging head in shame&amp;gt;&lt;/P&gt;

&lt;P&gt;I am pondering your pseudo code....&lt;/P&gt;

&lt;P&gt;The last question is that I am transferring 1M of COMPLEX(KIND=4) and I get 8M + 80 bytes. What are those 80 bytes?&lt;/P&gt;
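
&lt;P&gt;To make that arithmetic explicit, here is a small sketch. The post does not say whether "1M" means 2**20 or 10**6 samples, so both readings are computed; the program and variable names are made up for illustration, and the 80-byte figure is simply the overhead reported above.&lt;/P&gt;

&lt;PRE class="brush:fortran;"&gt;program payload_size
  use, intrinsic :: iso_fortran_env, only: int64
  implicit none
  integer(int64), parameter :: n_binary  = 2_int64**20   ! "1M" read as 1,048,576 samples
  integer(int64), parameter :: n_decimal = 10_int64**6   ! "1M" read as 1,000,000 samples
  integer(int64), parameter :: extra     = 80_int64      ! overhead seen in DATA_SENT
  complex(kind=4) :: sample
  integer(int64)  :: bytes_per_sample

  ! storage_size returns bits; COMPLEX(KIND=4) is two 4-byte reals.
  bytes_per_sample = int(storage_size(sample), int64) / 8_int64

  print *, 'bytes per sample       :', bytes_per_sample
  print *, 'payload, 2**20 samples :', n_binary  * bytes_per_sample
  print *, 'payload, 10**6 samples :', n_decimal * bytes_per_sample
  print *, 'payload plus 80 bytes  :', n_binary * bytes_per_sample + extra, n_decimal * bytes_per_sample + extra
end program payload_size&lt;/PRE&gt;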

&lt;P&gt;Thanks,&lt;BR /&gt;
	Randal&lt;/P&gt;</description>
      <pubDate>Mon, 19 Oct 2015 13:09:48 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Using-offload-trasfer-Status-and-offload-status/m-p/1018336#M37061</guid>
      <dc:creator>holmz</dc:creator>
      <dc:date>2015-10-19T13:09:48Z</dc:date>
    </item>
  </channel>
</rss>

