
MIC to MIC to HOST MPI bandwidth issue

marek_kaletka
Beginner

Running Intel's IMB benchmark (Intel MPI version 4.1.0.024), I've got some strange results.

mpirun -genv I_MPI_FABRICS=shm:dapl -np 2 -ppn 1 -hosts mic0,mic1 ./IMB-MPI1 PingPong 

   36 us latency for 0-byte messages, max 868 MB/s for 4 MB messages.

Using tcp instead of dapl (I have an external bridge config for the MICs' Ethernet ports with an MTU of 1500):

 mpirun -genv I_MPI_FABRICS=shm:tcp -np 2 -ppn 1 -hosts mic0,mic1 ./IMB-MPI1 PingPong

  496 us latency for 0-byte messages and only 16 MB/s max throughput for 4 MB messages!

I expected much better numbers (especially for tcp). Anyone have an idea what's wrong?
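As a sanity check (a sketch, not a command from the runs above), Intel MPI's debug output shows which fabric and DAPL provider actually get selected; re-running the same PingPong with I_MPI_DEBUG added prints that information at startup:

 mpirun -genv I_MPI_DEBUG=5 -genv I_MPI_FABRICS=shm:dapl -np 2 -ppn 1 -hosts mic0,mic1 ./IMB-MPI1 PingPong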

7 Replies
Dale_Wang
Beginner

I ran into the same problem when I tested the bandwidth between the MIC and the host directly over a plain TCP socket: it is 18 MB/s.
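A raw TCP check of that kind can be sketched with nc and dd (host name, port and transfer size are placeholders, the nc option syntax depends on the netcat variant installed, and this is not the exact test that produced the 18 MB/s figure):

 # on the host: receive and discard the data
 nc -l -p 5001 > /dev/null
 # on the coprocessor: stream 1 GB of zeros to the host; dd reports the achieved rate
 dd if=/dev/zero bs=1M count=1024 | nc host0 5001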

TimP
Honored Contributor III

Current releases of Intel MPI should improve DAPL performance over such older ones; the latest MVAPICH, with MIC-to-MIC communication routed over the host QPI, may be better yet.

Vladimir_Dergachev

I see the same problem, which cripples NFS performance:

http://software.intel.com/en-us/forums/topic/404743#comment-1746053

I'm about to write a custom library for file access over SCIF. But if I had the time, the right way would be to fix the network driver or write an Ethernet-over-SCIF driver.
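In the meantime, forcing large NFS transfer sizes over TCP when mounting on the card is sometimes worth a try (the export path, mount point and sizes below are placeholders, not a configuration from this thread):

 mount -t nfs -o tcp,rsize=1048576,wsize=1048576 host0:/export /mnt/nfs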

Gregg_S_Intel
Employee

Those latencies are too high and bandwidth too low.

I get DAPL latency close to 10 us, bandwidth greater than 1300 MB/s.

For TCP, latency is around 300 us, bandwidth close to 80 MB/s.

I'm using Intel(R) MPI 4.1.1.036.

See this article for cluster configuration tips:  http://software.intel.com/en-us/articles/configuring-intel-xeon-phi-coprocessors-inside-a-cluster

marek_kaletka
Beginner

I switched to Intel(R) MPI 4.1.1.036 and the latest MPSS and got slightly better results for dapl (16-20 us and 885 MB/s), but tcp is still very slow.
Using dd to benchmark read/write speed from/to an NFS share (a filer known to be able to stream > 800 MB/s) gives 20/21 MB/s.
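The dd runs look roughly like this (mount point and sizes are placeholders, not the exact commands used):

 # write 1 GB to the NFS share and flush it before dd reports the rate
 dd if=/dev/zero of=/mnt/nfs/testfile bs=1M count=1024 conv=fsync
 # read it back (drop or bypass the local page cache first, or the result is meaningless)
 dd if=/mnt/nfs/testfile of=/dev/null bs=1M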

IMO something is wrong with the virtual NICs and/or the IP stack implementation in MPSS, or the MIC's cores are simply not powerful enough to handle more IP traffic.

IMB PingPong between two machines hosting MIC coprocessors, over a plain 1 Gbit Ethernet (i350) connection, gives a minimum latency of 50 us and a maximum bandwidth of 112 MB/s (as expected, the 1 Gbit limit). That's roughly 10x faster than between two MIC cards connected through PCIe and MPSS's virtual network stack.

Gregg, what MTU size do you use in your environment? I've double-checked my config, but can't get beyond 16 MB/s using TCP.

Gregg_S_Intel
Employee

The article links to configuration notes directly from the administrator who set up the cluster whose latencies and bandwidth I quoted. It's good, first-hand information.  From the notes, "The MTU in this network is generally set to 9000.  Please adapt this to your settings."
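As a quick, non-persistent check (interface names br0 and mic0 are assumptions; the article covers making the change permanent in the MPSS configuration), the MTU can be raised on both ends like this:

 # on the host: host-side card interface first, then the bridge
 ifconfig mic0 mtu 9000
 ifconfig br0 mtu 9000
 # on the coprocessor itself
 ssh mic0 "ifconfig mic0 mtu 9000"

Every endpoint attached to the bridge (including any external switch in between) has to agree on the MTU, otherwise large packets are simply dropped.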

aazue
New Contributor I

Hi,
Regarding "I see the same problem which cripples NFS performance" and the mounted share:
I don't know whether it's possible for you to compile CIFS (Samba) for the Phi, but it sometimes gives better results than NFS.
An advantage is that the latest version can read and write in deferred mode, and you also have more parameters in its smb.conf to tune when you run into weird performance.
The downside of Samba is that it's a bit complex, with a gigantic number of options.
Personally I always use it, over fiber, copper and also wireless, and I am very satisfied with it.
I have my doubts, though, that correcting the MTU will work a miracle in your particular case.
Regards
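For illustration, mounting such a Samba share from the coprocessor could look like the line below; the server address, share name, user and buffer sizes are placeholders:

 mount -t cifs //172.31.1.254/scratch /mnt/scratch -o user=micuser,rsize=130048,wsize=130048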
