<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic MPI message rate scaling with number of peers in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826457#M1235</link>
    <description>&lt;P&gt;Hi.&lt;BR /&gt;&lt;BR /&gt;I have some MPI code where small messages (LEN = 1-128 bytes) are sent from one [host] node to several peers. When I send the messages one per peer, like this:&lt;BR /&gt;&lt;BR /&gt;for (i = 0; i &amp;lt; ITER_NUM; ++i)&lt;BR /&gt;{&lt;BR /&gt;for (k = 1; k &amp;lt; NODES; ++k)&lt;BR /&gt;{&lt;BR /&gt;MPI_Isend(S_BUF, LEN, MPI_CHAR,&lt;BR /&gt;k, 0, MPI_COMM_WORLD, &amp;amp;reqs[nreqs++]);&lt;BR /&gt;}&lt;BR /&gt;if (nreqs / WINDOW &amp;gt; 0 || i == ITER_NUM - 1)&lt;BR /&gt;{&lt;BR /&gt;MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);&lt;BR /&gt;nreqs = 0;&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;the message rate falls from 11.5 million messages/sec on a 5-node configuration (1 host and 4 peers) to 6.5 million/sec on a 17-node setup (1 host and 16 peers). When I change the loop order like this:&lt;BR /&gt;&lt;BR /&gt;for (k = 1; k &amp;lt; NODES; ++k)&lt;BR /&gt;{&lt;BR /&gt;for (i = 0; i &amp;lt; ITER_NUM; ++i)&lt;BR /&gt;{&lt;BR /&gt;MPI_Isend(S_BUF, LEN, MPI_CHAR,&lt;BR /&gt;k, 0, MPI_COMM_WORLD, &amp;amp;reqs[nreqs++]);&lt;BR /&gt;}&lt;BR /&gt;if (nreqs / WINDOW &amp;gt; 0 || i == ITER_NUM - 1)&lt;BR /&gt;{&lt;BR /&gt;MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);&lt;BR /&gt;nreqs = 0;&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;it works well (stable scaling, 11.5 million/sec).&lt;BR /&gt;ITER_NUM is about 100 000, and at each WINDOW boundary there are an MPI_Barrier() and a time measurement.&lt;BR /&gt;Can someone explain the reasons for the message rate degradation, and what should I try in order to improve scaling? Please do not recommend message coalescing. The eager protocol is used (in MPI), and switching to rendezvous did not help.&lt;BR /&gt;&lt;BR /&gt;Second question: I tried another test. All nodes form pairs ((0, 1), (2, 3), ..., (n - 2, n - 1)) and a simple send-recv is used. When the number of node pairs grows large (256 and higher), both the message rate and the bandwidth per pair degrade significantly. At the same time, one would expect a fat tree to scale well in this situation. Any ideas?&lt;BR /&gt;System config:&lt;BR /&gt;&lt;BR /&gt;2 x Intel Xeon X5570&lt;BR /&gt;InfiniBand QDR (fat tree)&lt;BR /&gt;Intel MPI 4.0.1&lt;BR /&gt;Intel C++ compiler 12.0&lt;/P&gt;</description>
    <pubDate>Thu, 10 May 2012 15:25:13 GMT</pubDate>
    <dc:creator>ingen23</dc:creator>
    <dc:date>2012-05-10T15:25:13Z</dc:date>
    <item>
      <title>MPI message rate scaling with number of peers</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826457#M1235</link>
      <description>&lt;P&gt;Hi.&lt;BR /&gt;&lt;BR /&gt;I have some MPI code where small messages (LEN = 1-128 bytes) are sent from one [host] node to several peers. When I send the messages one per peer, like this:&lt;BR /&gt;&lt;BR /&gt;for (i = 0; i &amp;lt; ITER_NUM; ++i)&lt;BR /&gt;{&lt;BR /&gt;for (k = 1; k &amp;lt; NODES; ++k)&lt;BR /&gt;{&lt;BR /&gt;MPI_Isend(S_BUF, LEN, MPI_CHAR,&lt;BR /&gt;k, 0, MPI_COMM_WORLD, &amp;amp;reqs[nreqs++]);&lt;BR /&gt;}&lt;BR /&gt;if (nreqs / WINDOW &amp;gt; 0 || i == ITER_NUM - 1)&lt;BR /&gt;{&lt;BR /&gt;MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);&lt;BR /&gt;nreqs = 0;&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;&lt;BR /&gt;the message rate falls from 11.5 million messages/sec on a 5-node configuration (1 host and 4 peers) to 6.5 million/sec on a 17-node setup (1 host and 16 peers). When I change the loop order like this:&lt;BR /&gt;&lt;BR /&gt;for (k = 1; k &amp;lt; NODES; ++k)&lt;BR /&gt;{&lt;BR /&gt;for (i = 0; i &amp;lt; ITER_NUM; ++i)&lt;BR /&gt;{&lt;BR /&gt;MPI_Isend(S_BUF, LEN, MPI_CHAR,&lt;BR /&gt;k, 0, MPI_COMM_WORLD, &amp;amp;reqs[nreqs++]);&lt;BR /&gt;}&lt;BR /&gt;if (nreqs / WINDOW &amp;gt; 0 || i == ITER_NUM - 1)&lt;BR /&gt;{&lt;BR /&gt;MPI_Waitall(nreqs, reqs, MPI_STATUSES_IGNORE);&lt;BR /&gt;nreqs = 0;&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;it works well (stable scaling, 11.5 million/sec).&lt;BR /&gt;ITER_NUM is about 100 000, and at each WINDOW boundary there are an MPI_Barrier() and a time measurement.&lt;BR /&gt;Can someone explain the reasons for the message rate degradation, and what should I try in order to improve scaling? Please do not recommend message coalescing. The eager protocol is used (in MPI), and switching to rendezvous did not help.&lt;BR /&gt;&lt;BR /&gt;Second question: I tried another test. All nodes form pairs ((0, 1), (2, 3), ..., (n - 2, n - 1)) and a simple send-recv is used. When the number of node pairs grows large (256 and higher), both the message rate and the bandwidth per pair degrade significantly. At the same time, one would expect a fat tree to scale well in this situation. Any ideas?&lt;BR /&gt;System config:&lt;BR /&gt;&lt;BR /&gt;2 x Intel Xeon X5570&lt;BR /&gt;InfiniBand QDR (fat tree)&lt;BR /&gt;Intel MPI 4.0.1&lt;BR /&gt;Intel C++ compiler 12.0&lt;/P&gt;</description>
      <pubDate>Thu, 10 May 2012 15:25:13 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826457#M1235</guid>
      <dc:creator>ingen23</dc:creator>
      <dc:date>2012-05-10T15:25:13Z</dc:date>
    </item>
    <item>
      <title>MPI message rate scaling with number of peers</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826458#M1236</link>
      <description>Hi ingen,&lt;BR /&gt;&lt;BR /&gt;What type of receive are you using? Are you using the MPD process manager, or Hydra?&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;James Tullos&lt;BR /&gt;Technical Consulting Engineer&lt;BR /&gt;Intel Cluster Tools</description>
      <pubDate>Thu, 10 May 2012 20:23:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826458#M1236</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2012-05-10T20:23:09Z</dc:date>
    </item>
    <item>
      <title>MPI message rate scaling with number of peers</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826459#M1237</link>
      <description>Hi, James.&lt;BR /&gt;&lt;BR /&gt;I am using MPI_Irecv() (but it worked the same way with MPI_Recv(), too).&lt;BR /&gt;Hydra (mpiexec.hydra) is used.</description>
      <pubDate>Fri, 11 May 2012 10:45:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826459#M1237</guid>
      <dc:creator>ingen23</dc:creator>
      <dc:date>2012-05-11T10:45:43Z</dc:date>
    </item>
    <item>
      <title>MPI message rate scaling with number of peers</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826460#M1238</link>
      <description>Hi ingen,&lt;BR /&gt;&lt;BR /&gt;I'm trying to get some additional information on why this behavior occurs. I believe you are seeing two effects. The change at 17 ranks is likely due to running on multiple nodes, whereas 16 ranks can run on a single node; this forces a change from shared memory to InfiniBand.&lt;BR /&gt;&lt;BR /&gt;The second effect is possibly due to the network layer. Opening and closing a network connection takes time, and these connections may not stay open between communications. By sending one message to each process at a time, you are frequently opening and closing connections. Sending all messages to one process allows the connection to remain open.&lt;BR /&gt;&lt;BR /&gt;I still need to look into the second issue with the node pairs.&lt;BR /&gt;&lt;BR /&gt;Sincerely,&lt;BR /&gt;James Tullos&lt;BR /&gt;Technical Consulting Engineer&lt;BR /&gt;Intel Cluster Tools</description>
      <pubDate>Fri, 11 May 2012 16:18:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826460#M1238</guid>
      <dc:creator>IDZ_A_Intel</dc:creator>
      <dc:date>2012-05-11T16:18:04Z</dc:date>
    </item>
    <item>
      <title>MPI message rate scaling with number of peers</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826461#M1239</link>
      <description>Thanks for your reply, James.&lt;BR /&gt;&lt;BR /&gt;Sorry, I was not explicit about the MPI process mapping - each process is on a different node (I am sure of this). There are 16 processes in total, and the first process communicates with 15 peers.&lt;BR /&gt;&lt;BR /&gt;The Intel MPI reference says that I_MPI_DYNAMIC_CONNECTION is set to "off" by default when using fewer than 64 MPI processes, so I think that is not the case here, but I believe there is something to this idea about connection management. I currently have no access to the cluster; when I can try some experiments, I will post the results here.&lt;BR /&gt;</description>
      <pubDate>Mon, 21 May 2012 10:24:16 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-message-rate-scaling-with-number-of-peers/m-p/826461#M1239</guid>
      <dc:creator>ingen23</dc:creator>
      <dc:date>2012-05-21T10:24:16Z</dc:date>
    </item>
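To rule out the connection-management hypothesis discussed above, one experiment is to explicitly pin the dynamic-connection setting and re-run the benchmark under both values. I_MPI_DYNAMIC_CONNECTION is a documented Intel MPI environment variable; the benchmark binary name below is a placeholder, and the exact default behavior should be checked against the reference for your Intel MPI version:

```shell
# Force all peer connections to be established at startup
# (disable dynamic, on-first-use connection establishment).
export I_MPI_DYNAMIC_CONNECTION=0
# 17 ranks = 1 host + 16 peers; ./msgrate_bench is a placeholder name
mpiexec.hydra -n 17 ./msgrate_bench

# Then repeat with I_MPI_DYNAMIC_CONNECTION=1 and compare message rates.
```

If the rates are identical under both settings, connection setup cost can be excluded and the degradation is more likely in the fabric or the library's per-peer send paths.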
  </channel>
</rss>

