<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi Ruibang (I think I know in Software Archive</title>
    <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954955#M20575</link>
    <description>&lt;P&gt;Hi Ruibang (I think I know you :-) ), are you willing to share your benchmark? Did you use any special configuration or optimization? Thanks!&lt;/P&gt;
</description>
    <pubDate>Thu, 09 May 2013 10:27:00 GMT</pubDate>
    <dc:creator>Mian_L_</dc:creator>
    <dc:date>2013-05-09T10:27:00Z</dc:date>
    <item>
      <title>host to mic bandwidth using MPI</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954952#M20572</link>
      <description>&lt;P&gt;Hi, does anyone have results from using MPI to test the host&amp;lt;-&amp;gt;MIC bandwidth? I tried it on my machine and the bandwidth is quite low (~0.4 GB/sec). I just send data from the host to the MIC card using a blocking function and measure the time. The download-speed test in the SHOC benchmark reaches up to 10 GB/sec. Any idea why the bandwidth is so low with MPI? Thanks a lot!&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2013 08:57:49 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954952#M20572</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-09T08:57:49Z</dc:date>
    </item>
    <item>
      <title>btw, I download a third-part</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954953#M20573</link>
      <description>&lt;P&gt;BTW, I downloaded a third-party benchmark:&amp;nbsp;&lt;A href="http://mvapich.cse.ohio-state.edu/benchmarks/"&gt;http://mvapich.cse.ohio-state.edu/benchmarks/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;The result is similar to that of my own program. I suspect there is an issue in my configuration. Does anyone have ideas?&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2013 09:51:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954953#M20573</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-09T09:51:06Z</dc:date>
    </item>
    <item>
      <title>I was able to achieve ~6.5G/s</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954954#M20574</link>
      <description>&lt;P&gt;I was able to achieve ~6.5 GB/s in one direction and ~12 GB/s bidirectionally.&lt;/P&gt;
&lt;P&gt;#---------------------------------------------------&lt;BR /&gt;# Benchmarking PingPong &lt;BR /&gt;# #processes = 2 &lt;BR /&gt;#---------------------------------------------------&lt;BR /&gt; #bytes #repetitions t[usec] Mbytes/sec&lt;BR /&gt; 0 1000 20.54 0.00&lt;BR /&gt; 16777216 2 2606.99 6137.35&lt;BR /&gt; 33554432 1 5063.06 6320.29&lt;BR /&gt; 67108864 1 9898.54 6465.60&lt;/P&gt;
&lt;P&gt;#-----------------------------------------------------------------------------&lt;BR /&gt;# Benchmarking Exchange &lt;BR /&gt;# #processes = 2 &lt;BR /&gt;#-----------------------------------------------------------------------------&lt;BR /&gt; #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] Mbytes/sec&lt;BR /&gt; 0 1000 81.96 81.97 81.97 0.00&lt;BR /&gt; 16777216 2 5644.44 5648.49 5646.47 11330.45&lt;BR /&gt; 33554432 1 10926.96 10939.12 10933.04 11701.12&lt;BR /&gt; 67108864 1 21586.89 21597.86 21592.38 11853.02&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2013 10:01:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954954#M20574</guid>
      <dc:creator>Ruibang_L_</dc:creator>
      <dc:date>2013-05-09T10:01:01Z</dc:date>
    </item>
    <item>
      <title>Hi Ruibang (I think I know</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954955#M20575</link>
      <description>&lt;P&gt;Hi Ruibang (I think I know you :-) ), are you willing to share your benchmark? Did you use any special configuration or optimization? Thanks!&lt;/P&gt;
</description>
      <pubDate>Thu, 09 May 2013 10:27:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954955#M20575</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-09T10:27:00Z</dc:date>
    </item>
    <item>
      <title>i found the benchmark, thanks</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954956#M20576</link>
      <description>&lt;P&gt;I found the benchmark, thanks.&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2013 10:36:25 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954956#M20576</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-09T10:36:25Z</dc:date>
    </item>
    <item>
      <title>Hi Mian,</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954957#M20577</link>
      <description>&lt;P&gt;Hi Mian,&lt;/P&gt;
&lt;P&gt;Benchmarks usually underestimate the bandwidth, sometimes reporting about a quarter less than the actual hardware bandwidth. To see what is really happening on your MIC processor at runtime, I think you should get to know the VTune software tool, which monitors processor events while your application runs and helps you tune its performance.&lt;/P&gt;
&lt;P&gt;For bandwidth measurement in your case, VTune can show you the events it sampled going in and out of the memory bus while your application was running.&lt;/P&gt;
&lt;P&gt;1. Read bandwidth (bytes/clock) = (L2_DATA_READ_MISS_MEM_FILL + L2_DATA_MISS_MEM_FILL + HWP_L2MISS) * 64 / CPU_CLK_UNHALTED&lt;/P&gt;
&lt;P&gt;2. Write bandwidth (bytes/clock) = (L2_VICTIM_REQ_WITH_DATA + SNP_HITM_L2) * 64 / CPU_CLK_UNHALTED&lt;/P&gt;
&lt;P&gt;3. Total bandwidth (GB/sec) = (Read bandwidth + Write bandwidth) * freq (in GHz)&lt;/P&gt;
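&lt;P&gt;As a rough illustration of how the three formulas above combine, here is a minimal Python sketch. The event names are copied as written in this post; the function names, the sample counts, and the 1.0 GHz clock are assumptions made up for the example, not values measured on any card.&lt;/P&gt;
&lt;PRE&gt;
```python
# Sketch of formulas 1-3 above. Event names are copied from the post;
# the sample counts and the 1.0 GHz clock are made up for illustration.

CACHE_LINE_BYTES = 64  # the "* 64" factor: bytes moved per counted event


def read_bandwidth(events):
    """Formula 1: read bandwidth in bytes per clock."""
    fills = (events["L2_DATA_READ_MISS_MEM_FILL"]
             + events["L2_DATA_MISS_MEM_FILL"]
             + events["HWP_L2MISS"])
    return fills * CACHE_LINE_BYTES / events["CPU_CLK_UNHALTED"]


def write_bandwidth(events):
    """Formula 2: write bandwidth in bytes per clock."""
    evictions = events["L2_VICTIM_REQ_WITH_DATA"] + events["SNP_HITM_L2"]
    return evictions * CACHE_LINE_BYTES / events["CPU_CLK_UNHALTED"]


def total_bandwidth_gb_per_sec(events, freq_ghz):
    """Formula 3: bytes per clock times clocks per nanosecond gives GB/sec."""
    return (read_bandwidth(events) + write_bandwidth(events)) * freq_ghz


if __name__ == "__main__":
    # Hypothetical event counts, not real VTune output.
    sample = {
        "L2_DATA_READ_MISS_MEM_FILL": 1000000,
        "L2_DATA_MISS_MEM_FILL": 500000,
        "HWP_L2MISS": 500000,
        "L2_VICTIM_REQ_WITH_DATA": 900000,
        "SNP_HITM_L2": 100000,
        "CPU_CLK_UNHALTED": 1000000000,
    }
    print(total_bandwidth_gb_per_sec(sample, freq_ghz=1.0))
```
&lt;/PRE&gt;
&lt;P&gt;Keeping the counts in a plain dictionary keyed by event name makes it easy to paste in whatever numbers the profiler reports.&lt;/P&gt;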
&lt;P&gt;So you can easily work out the bandwidth figure you want from the event counts. I hope this link gives you more insight: http://software.intel.com/en-us/articles/optimization-and-performance-tuning-for-intel-xeon-phi-coprocessors-part-2-understanding&lt;/P&gt;</description>
      <pubDate>Thu, 09 May 2013 10:48:55 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954957#M20577</guid>
      <dc:creator>QIAOMIN_Q_</dc:creator>
      <dc:date>2013-05-09T10:48:55Z</dc:date>
    </item>
    <item>
      <title>Thanks, QIAOMIN. But I want to</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954958#M20578</link>
      <description>Thanks, QIAOMIN. But I want to measure the bandwidth between the host and the MIC card (through PCIe), not the memory bandwidth. Here is my output from the Intel MPI Benchmarks. Compared to Ruibang's result, the bandwidth is very low. Does anyone have suggestions? Thanks very much!

#---------------------------------------------------
# Benchmarking PingPong 
# #processes = 2 
#---------------------------------------------------
       #bytes #repetitions      t[usec]   Mbytes/sec
            0         1000       129.99         0.00
            1         1000       139.75         0.01
            2         1000       131.26         0.01
            4         1000       126.89         0.03
            8         1000       134.76         0.06
           16         1000       129.93         0.12
           32         1000       131.42         0.23
           64         1000       133.16         0.46
          128         1000       131.38         0.93
          256         1000       131.24         1.86
          512         1000       132.47         3.69
         1024         1000       139.96         6.98
         2048         1000       169.95        11.49
         4096         1000       151.08        25.85
         8192         1000       167.11        46.75
        16384         1000       215.50        72.51
        32768         1000       309.60       100.94
        65536          640       464.70       134.50
       131072          320       654.15       191.09
       262144          160      1099.20       227.44
       524288           80      2159.97       231.48
      1048576           40      3675.76       272.05
      2097152           20      7585.00       263.68
      4194304           10     13317.50       300.36


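For anyone cross-checking tables like the PingPong one above, here is a small Python sketch of how the Mbytes/sec column seems to be derived. It assumes that one "Mbyte" is 2^20 bytes, that PingPong counts one message per measured time, and that Exchange counts four messages per iteration using t_max. These assumptions are inferred from the numbers posted in this thread, and the function names are made up for the example.

```python
# Recompute the Mbytes/sec column of IMB output from #bytes and t[usec].
# Assumption (inferred from the numbers in this thread): 1 Mbyte = 2**20 bytes.

MBYTE = 2 ** 20


def pingpong_mbytes_per_sec(nbytes, t_usec):
    """PingPong: one message of nbytes per measured time t_usec."""
    return nbytes / MBYTE / (t_usec * 1e-6)


def exchange_mbytes_per_sec(nbytes, t_max_usec):
    """Exchange: assumed four messages of nbytes per iteration, timed by t_max."""
    return 4 * nbytes / MBYTE / (t_max_usec * 1e-6)


if __name__ == "__main__":
    # Last PingPong row of the table above: 4194304 bytes in 13317.50 usec.
    print(round(pingpong_mbytes_per_sec(4194304, 13317.50), 2))
    # 4 MiB Exchange row from the same run: t_max of 48803.71 usec.
    print(round(exchange_mbytes_per_sec(4194304, 48803.71), 2))
```

Because the unit is 2^20 bytes, the decimal MB/sec figure is about 5% higher than the value IMB prints.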

#-----------------------------------------------------------------------------
# Benchmarking Exchange 
# #processes = 2 
#-----------------------------------------------------------------------------
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]   Mbytes/sec
            0         1000       372.48       372.55       372.52         0.00
            1         1000       362.01       362.11       362.06         0.01
            2         1000       377.90       377.93       377.92         0.02
            4         1000       380.43       380.44       380.43         0.04
            8         1000       370.37       370.40       370.38         0.08
           16         1000       366.83       366.84       366.84         0.17
           32         1000       380.09       380.14       380.11         0.32
           64         1000       379.95       379.96       379.96         0.64
          128         1000       367.15       367.37       367.26         1.33
          256         1000       358.58       358.66       358.62         2.72
          512         1000       385.53       385.55       385.54         5.07
         1024         1000       393.08       393.11       393.10         9.94
         2048         1000       401.01       401.16       401.08        19.47
         4096         1000       385.83       385.88       385.85        40.49
         8192         1000       412.42       412.48       412.45        75.76
        16384         1000       466.70       466.78       466.74       133.90
        32768         1000       595.14       595.34       595.24       209.96
        65536          640      1217.73      1217.80      1217.76       205.29
       131072          320      1897.78      1898.52      1898.15       263.36
       262144          160      3511.84      3520.93      3516.38       284.02
       524288           80      7320.48      7332.59      7326.53       272.76
      1048576           40     12666.30     12708.85     12687.58       314.74
      2097152           20     23141.99     23311.20     23226.59       343.18
      4194304           10     48067.19     48803.71     48435.45       327.84</description>
      <pubDate>Fri, 10 May 2013 01:19:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954958#M20578</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-10T01:19:00Z</dc:date>
    </item>
    <item>
      <title>Hi Mian, sorry for the late</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954959#M20579</link>
      <description>&lt;P&gt;Hi Mian, sorry for the late reply. Yes, we should know each other from Hong Kong, via BGI.&lt;/P&gt;
&lt;P&gt;I guess you are using TCP as the fabric between the MIC card and the host, so a maximum of about 450 MB/s is what you got, which is also what I got.&lt;/P&gt;
&lt;P&gt;Installing the OFED stack from the MPSS driver package will enable you to use the direct memory access feature. I ran the benchmark with "mpiexec.hydra -genv I_MPI_FABRICS=shm:dapl -n 1 -host bio-xinyi ~/tmp/imb/imb/3.2.4/src/IMB-MPI1 -off_cache 12,64 -npmin 64 -msglog 24:28 -time 10 -mem 1 PingPong Exchange : -n 1 -host mic0 /tmp/IMB-MPI1.mic" so it's fast.&lt;/P&gt;
&lt;P&gt;By default (which I guess is your case), it uses I_MPI_FABRICS=shm:tcp.&lt;/P&gt;
</description>
      <pubDate>Tue, 14 May 2013 03:07:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954959#M20579</guid>
      <dc:creator>Ruibang_L_</dc:creator>
      <dc:date>2013-05-14T03:07:01Z</dc:date>
    </item>
    <item>
      <title>Hi Ruibang, thanks very much!</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954960#M20580</link>
      <description>&lt;P&gt;Hi Ruibang, thanks very much! Yes, I think you are right. The dapl fabric is not available on our server; when I try to run your command, it outputs:&lt;/P&gt;
&lt;P&gt;MPI startup(): dapl fabric is not available and fallback fabric is not enabled&lt;/P&gt;
&lt;P&gt;Do you know how to install the OFED package? Is it supposed to be installed together with MPSS? If it can be installed separately, can you give me a link, please? I googled it and could not find the right information. Thanks very much.&lt;/P&gt;
</description>
      <pubDate>Wed, 15 May 2013 01:23:05 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954960#M20580</guid>
      <dc:creator>Mian_L_</dc:creator>
      <dc:date>2013-05-15T01:23:05Z</dc:date>
    </item>
    <item>
      <title>The ofed rpms are distributed</title>
      <link>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954961#M20581</link>
      <description>&lt;P&gt;The OFED RPMs are distributed with the MPSS driver. The installation guide is at http://registrationcenter.intel.com/irc_nas/3156/readme-en.txt&lt;/P&gt;
&lt;P&gt;Check whether the kernel version required by the precompiled driver matches your server's kernel. If it does not, you have to recompile by running rpmbuild --rebuild on the source RPMs from the driver sources. Please note that "2.6.32-358.el6.x86_64" is a different kernel from "2.6.32-358.6.1.el6.x86_64".&lt;/P&gt;
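&lt;P&gt;A minimal Python sketch of that kernel check, assuming you already know which kernel release the precompiled driver RPMs were built for. The function name and the required string used here are hypothetical; the point is only that the comparison has to be an exact string match.&lt;/P&gt;
&lt;PRE&gt;
```python
import platform


def kernel_matches(required_release, running_release=None):
    """Return True only if the two kernel release strings are identical."""
    if running_release is None:
        # On Linux this is the same string that `uname -r` prints.
        running_release = platform.release()
    return running_release == required_release


if __name__ == "__main__":
    # Hypothetical: the release the precompiled driver RPMs were built for.
    required = "2.6.32-358.el6.x86_64"
    if kernel_matches(required):
        print("kernel matches the precompiled driver")
    else:
        print("kernel differs: rebuild from the source RPMs with rpmbuild --rebuild")
```
&lt;/PRE&gt;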
&lt;P&gt;BTW, it seems that to use MPI on Xeon Phi one has to install the proprietary Intel® MPI package (the same goes for the compiler). This is not good. I'm a poor researcher who can only afford the card, lol :&amp;gt;&lt;/P&gt;
</description>
      <pubDate>Thu, 16 May 2013 06:02:23 GMT</pubDate>
      <guid>https://community.intel.com/t5/Software-Archive/host-to-mic-bandwidth-using-MPI/m-p/954961#M20581</guid>
      <dc:creator>Ruibang_L_</dc:creator>
      <dc:date>2013-05-16T06:02:23Z</dc:date>
    </item>
  </channel>
</rss>

