<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic I humbly bow my head... I've in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038637#M45111</link>
    <description>&lt;P&gt;I humbly bow my head... I've mixed up GB/s and Gbps in my tests. For my 5110P I get:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;TCP throughput: ~4 Gbps&lt;/LI&gt;
	&lt;LI&gt;offload throughput: 6.5 GiB/s&lt;/LI&gt;
	&lt;LI&gt;OpenCL bandwidth (pinned memory): 6.3 GiB/s&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Sorry about the noise.&lt;/P&gt;</description>
    <pubDate>Wed, 17 Jun 2015 13:20:00 GMT</pubDate>
    <dc:creator>JJK</dc:creator>
    <dc:date>2015-06-17T13:20:00Z</dc:date>
    <item>
      <title>Ask recommendation for socket-like and efficient api to communicate with mic</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038632#M45106</link>
      <description>&lt;P&gt;I am porting a client-server program to the MIC. It is highly concurrent and transfers a large amount of data.&lt;/P&gt;

&lt;P&gt;The server side will run on the MIC and provide a computing service to the client on the host.&lt;/P&gt;

&lt;P&gt;More than 100 threads together transfer more than 10 GB of data in total. On clusters, the program was implemented with the socket API.&lt;/P&gt;

&lt;P&gt;So I was wondering whether there is a socket-like, efficient API that would let me adapt this program to the MIC easily?&lt;/P&gt;

&lt;P&gt;Could you list some methods and give some references from which I can learn more?&lt;/P&gt;

&lt;P&gt;Thanks a lot.&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jun 2015 07:10:42 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038632#M45106</guid>
      <dc:creator>Xu_F_</dc:creator>
      <dc:date>2015-06-14T07:10:42Z</dc:date>
    </item>
    <item>
      <title>I'd start out with "just</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038633#M45107</link>
      <description>&lt;P&gt;I'd start out with "just porting" the client/server model - TCP network performance between host and Phi is of the order of 4 Gbps; if you change everything to SCIF or an infiniband-like link you'd manage up to 6.4 Gbps. Is that worth the effort?&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jun 2015 08:52:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038633#M45107</guid>
      <dc:creator>JJK</dc:creator>
      <dc:date>2015-06-15T08:52:16Z</dc:date>
    </item>
    <item>
      <title>Quote:JJK wrote:</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038634#M45108</link>
      <description>&lt;BLOCKQUOTE&gt;JJK wrote:&lt;BR /&gt;

&lt;P&gt;I'd start out with "just porting" the client/server model - TCP network performance between host and Phi is of the order of 4 Gbps; if you change everything to SCIF or an infiniband-like link you'd manage up to 6.4 Gbps. Is that worth the effort?&lt;/P&gt;

&lt;/BLOCKQUOTE&gt;

&lt;P&gt;So 6.4 Gbps means 0.8 GB/s?&lt;/P&gt;

&lt;P&gt;I don't think that speed is enough for my application. In fact, I want the transfer to be as fast as possible.&lt;/P&gt;

&lt;P&gt;Efficiency is what I care about most, so any way to speed up the transfer is worth the effort.&lt;/P&gt;

&lt;P&gt;So what is the proper transfer method for my application?&lt;/P&gt;

&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 08:06:12 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038634#M45108</guid>
      <dc:creator>Xu_F_</dc:creator>
      <dc:date>2015-06-17T08:06:12Z</dc:date>
    </item>
    <item>
      <title>Yes, that's 6.4 Gbps = 0.8 GB</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038635#M45109</link>
      <description>&lt;P&gt;Yes, that's 6.4 Gbps = 0.8 GB/s&lt;/P&gt;

&lt;P&gt;The Xeon Phi is a PCI Express rev2 card, which gives you a theoretical maximum transfer rate of 8 Gbps = 1.0 GB/s; in practice you'll never achieve more than ~6.4 Gbps. This also applies to all other PCI Express rev2 cards such as GPUs.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 08:52:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038635#M45109</guid>
      <dc:creator>JJK</dc:creator>
      <dc:date>2015-06-17T08:52:51Z</dc:date>
    </item>
    <item>
      <title>Don't forget the width of the</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038636#M45110</link>
      <description>&lt;P&gt;Don't forget the width of the interface! The Xeon Phi has a PCIe gen2 x16 interface. PCIe gen2 has a raw bit rate of 5 Gbits/sec per lane per direction, which leaves 4 Gbits/sec/lane/direction after 8b/10b encoding. Multiplying by the width gives 4x16 = 64 Gbits/second/direction = 8 GB/second/direction.&lt;/P&gt;

&lt;P&gt;This 8 GB/s per direction has to include PCIe commands, responses, and headers in addition to data, so the effective data bandwidth is limited to roughly 6.5 GB/s (unidirectional) or around 5.5 GB/s in each direction simultaneously. The exact values depend on packet size, data address (which determines whether 32-bit or 64-bit addresses are used in the headers), transaction types, the use of various optional PCIe header packets, etc.&lt;/P&gt;
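
&lt;P&gt;The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for this example, and the 6.5 GB/s used at the end is the approximate unidirectional figure quoted above, not a new measurement.&lt;/P&gt;

```python
# Sketch of the PCIe gen2 x16 arithmetic described above.
# All names here are illustrative and not part of any library.

RAW_GBITS_PER_LANE = 5.0      # PCIe gen2 raw bit rate per lane per direction
ENCODING_EFFICIENCY = 8 / 10  # 8b/10b encoding: 8 data bits per 10 bits on the wire
LANES = 16                    # x16 interface


def link_bandwidth_gbytes(raw_gbits_per_lane, lanes, encoding_efficiency):
    """Return the post-encoding bandwidth in GB/s (10**9 bytes/s) per direction."""
    usable_gbits = raw_gbits_per_lane * encoding_efficiency * lanes
    return usable_gbits / 8  # 8 bits per byte


peak = link_bandwidth_gbytes(RAW_GBITS_PER_LANE, LANES, ENCODING_EFFICIENCY)
print(f"peak per direction: {peak:.1f} GB/s")

# Fraction of the peak left for payload at the ~6.5 GB/s quoted above.
payload_fraction = 6.5 / peak
print(f"payload fraction at 6.5 GB/s: {payload_fraction:.0%}")
```

&lt;P&gt;With these inputs the peak comes out at 8.0 GB/s per direction, so the quoted 6.5 GB/s corresponds to roughly 81% of the link being available for user data. Calling the function with a single lane gives 0.5 GB/s, which is 4 Gbits/sec.&lt;/P&gt;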

&lt;P&gt;Low-level (SCIF) benchmarks show user data transfer rates from host to Xeon Phi (or Xeon Phi to host) of well over 5 GB/s in one direction. If I recall correctly, this is more than 10x faster than TCP/IP transfers.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 13:11:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038636#M45110</guid>
      <dc:creator>McCalpinJohn</dc:creator>
      <dc:date>2015-06-17T13:11:32Z</dc:date>
    </item>
    <item>
      <title>I humbly bow my head... I've</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038637#M45111</link>
      <description>&lt;P&gt;I humbly bow my head... I've mixed up GB/s and Gbps in my tests. For my 5110P I get:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;TCP throughput: ~4 Gbps&lt;/LI&gt;
	&lt;LI&gt;offload throughput: 6.5 GiB/s&lt;/LI&gt;
	&lt;LI&gt;OpenCL bandwidth (pinned memory): 6.3 GiB/s&lt;/LI&gt;
&lt;/UL&gt;

&lt;P&gt;Sorry about the noise.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 13:20:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038637#M45111</guid>
      <dc:creator>JJK</dc:creator>
      <dc:date>2015-06-17T13:20:00Z</dc:date>
    </item>
    <item>
      <title>You might want to look at:</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038638#M45112</link>
      <description>&lt;P&gt;You might want to look at:&lt;/P&gt;

&lt;P&gt;&lt;A href="http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-pci-transfer.html"&gt;http://www.intel.com/content/www/us/en/benchmarks/server/xeon-phi/xeon-phi-pci-transfer.html&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;The numbers there were obtained by running an optimized, internal test program. A couple of interesting points: the data transfer rates differ depending on the coprocessor version being used, and going from the coprocessor to the host is a little bit faster than going from the host to the coprocessor. And speaking of going faster, you will notice that the highest speed is 6.98 GB/s (5120D coprocessor to host) and the slowest is 6.70 GB/s (5120D host to coprocessor), both a bit faster than the 6.5 GB/s that JJK found with his informal testing.&lt;/P&gt;

&lt;P&gt;I don't generally like to recommend that people use the SCIF API, but if you really want the best transfer rate, that is what you will end up with. You can find a SCIF User Guide in the docs directory that comes with the MPSS.&lt;/P&gt;</description>
      <pubDate>Wed, 17 Jun 2015 23:47:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038638#M45112</guid>
      <dc:creator>Frances_R_Intel</dc:creator>
      <dc:date>2015-06-17T23:47:16Z</dc:date>
    </item>
    <item>
      <title>For the record: I ran the</title>
      <link>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038639#M45113</link>
      <description>&lt;P&gt;For the record: I ran the 'micprun' test on my 5110P card and got 6.97 GB/s as well.&lt;/P&gt;

&lt;P&gt;This is (again) a minor unit conversion issue: my simplistic benchmark reports 6.5 GiB/s, which is 6.98 GB/s. A GiB is 1024**3 bytes, whereas a GB is 1 billion bytes. Thus, the upper limit on host-to-device and device-to-host bandwidth seems to be around 6.5 GiB/s == 6.98 GB/s. This is also what I'd expect from a 16-lane PCI Express rev2 card (5.0 GT/s max).&lt;/P&gt;
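
&lt;P&gt;As a quick check of that conversion, here is a minimal Python sketch. The function and constant names are illustrative only, and 6.5 GiB/s is the figure from my benchmark above.&lt;/P&gt;

```python
# Sketch of the GiB/s to GB/s conversion discussed above.
# Function and constant names are illustrative only.

GIB = 1024 ** 3  # bytes in one gibibyte (binary prefix)
GB = 10 ** 9     # bytes in one gigabyte (decimal prefix)


def gib_to_gb(value_gib):
    """Convert a quantity in GiB (or GiB/s) to GB (or GB/s)."""
    return value_gib * GIB / GB


measured_gib_per_s = 6.5  # figure reported by the simplistic benchmark
print(f"{measured_gib_per_s} GiB/s = {gib_to_gb(measured_gib_per_s):.2f} GB/s")
```

&lt;P&gt;The two prefixes differ by about 7.4% at this scale, which accounts for the whole gap between 6.5 and 6.98.&lt;/P&gt;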

</description>
      <pubDate>Fri, 19 Jun 2015 10:14:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/Ask-recommendation-for-socket-like-and-efficient-api-to/m-p/1038639#M45113</guid>
      <dc:creator>JJK</dc:creator>
      <dc:date>2015-06-19T10:14:00Z</dc:date>
    </item>
  </channel>
</rss>

