<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Porting code from MPI/Pro 1.7 to Intel MPI 3.1 in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868287#M1775</link>
    <description>I am in the process of switching from MPI/Pro 1.7 to Intel MPI 3.1 and I am seeing very strange (and poor) performance that has stumped me.&lt;BR /&gt;&lt;BR /&gt;I am seeing poor performance throughout the entire code, but the front end is a good illustration of some of the problems. The front end consists of two processes: process 0 (the I/O process) reads in a data header and data and passes them to process 1 (the compute process). Process 1 then processes the data and sends the output field(s) back to process 0, which saves them to disk.&lt;BR /&gt;&lt;BR /&gt;Here is the outline of the MPI framework for the two processes for the simple case of 1 I/O process and 1 compute process:&lt;BR /&gt;&lt;BR /&gt;Process 0:&lt;BR /&gt;&lt;BR /&gt;for (ifrm = 0; ifrm &amp;lt;= totfrm; ifrm++) {&lt;BR /&gt;&lt;BR /&gt; if (ifrm != totfrm) {&lt;BR /&gt; data_read (..., InpBuf, HD1, ...);&lt;BR /&gt; MPI_Ssend (HD1, ...);&lt;BR /&gt; MPI_Ssend (InpBuf, ...);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; if (ifrm &amp;gt; 0) {&lt;BR /&gt; MPI_Recv (OutBuf, ...);&lt;BR /&gt; sav_data (OutBuf, ...);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt;}  // for (ifrm = 0 ...&lt;BR /&gt;&lt;BR /&gt;// No more data, send termination message&lt;BR /&gt;MPI_Send (MPI_BOTTOM, 0, ...);&lt;BR /&gt;&lt;BR /&gt;Process 1:&lt;BR /&gt;&lt;BR /&gt;// Initialize persistent communication requests&lt;BR /&gt;MPI_Recv_init (HdrBuf, ..., req_recvhdr);&lt;BR /&gt;MPI_Recv_init (InpBuf, ..., req_recvdat);&lt;BR /&gt;MPI_Ssend_init (OutBuf, ..., req_sendout);&lt;BR /&gt;&lt;BR /&gt;// Get header and data for first frame&lt;BR /&gt;MPI_Start (req_recvhdr);&lt;BR /&gt;MPI_Start (req_recvdat);&lt;BR /&gt;&lt;BR /&gt;while (1) {&lt;BR /&gt;&lt;BR /&gt; MPI_Wait (req_recvhdr, status);&lt;BR /&gt; MPI_Get_count (status, count);&lt;BR /&gt; if (count == 0) {&lt;BR /&gt; execute termination code&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; MPI_Wait (req_recvdat, status);&lt;BR /&gt;&lt;BR /&gt; // Start receive on next frame while processing current one&lt;BR /&gt; MPI_Start (req_recvhdr);&lt;BR /&gt; MPI_Start (req_recvdat);&lt;BR /&gt;&lt;BR /&gt; (...)&lt;BR /&gt; process data&lt;BR /&gt; (...)&lt;BR /&gt;&lt;BR /&gt; if (curr_frame &amp;gt; start_frame) {&lt;BR /&gt; MPI_Wait (req_sendout, status);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; (...)&lt;BR /&gt; process data&lt;BR /&gt; (...)&lt;BR /&gt;&lt;BR /&gt; // Send output field(s) back to I/O process&lt;BR /&gt; MPI_Start (req_sendout);&lt;BR /&gt;&lt;BR /&gt;}  // while (1)&lt;BR /&gt;&lt;BR /&gt;The problem I am having is that the MPI_Wait calls are chewing up a lot of CPU cycles for no obvious reason, and in a very erratic way. When using MPI/Pro, the above MPI framework works in a very reliable and predictable way. However, with Intel MPI, the code can spend almost no time (expected) or several minutes (very unexpected) in one of the MPI_Wait calls. The two waits that are giving me the most trouble are the ones associated with req_recvhdr and req_sendout.&lt;BR /&gt;&lt;BR /&gt;The code is compiled using the 64-bit versions of the Intel compiler 10.1 and Intel MKL 10.0 and is run on RHEL4 nodes. Both processes are run on the same core.&lt;BR /&gt;&lt;BR /&gt;As I have already said, this framework works well under MPI/Pro, and I am stumped as to where the problem(s) lie and what I should try in order to fix the code. Any insight or guidance you could provide would be greatly appreciated.&lt;BR /&gt;</description>
    <pubDate>Mon, 29 Jun 2009 22:29:14 GMT</pubDate>
    <dc:creator>jburri</dc:creator>
    <dc:date>2009-06-29T22:29:14Z</dc:date>
    <item>
      <title>Porting code from MPI/Pro 1.7 to Intel MPI 3.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868287#M1775</link>
      <description>I am in the process of switching from MPI/Pro 1.7 to Intel MPI 3.1 and I am seeing very strange (and poor) performance that has stumped me.&lt;BR /&gt;&lt;BR /&gt;I am seeing poor performance throughout the entire code, but the front end is a good illustration of some of the problems. The front end consists of two processes: process 0 (the I/O process) reads in a data header and data and passes them to process 1 (the compute process). Process 1 then processes the data and sends the output field(s) back to process 0, which saves them to disk.&lt;BR /&gt;&lt;BR /&gt;Here is the outline of the MPI framework for the two processes for the simple case of 1 I/O process and 1 compute process:&lt;BR /&gt;&lt;BR /&gt;Process 0:&lt;BR /&gt;&lt;BR /&gt;for (ifrm = 0; ifrm &amp;lt;= totfrm; ifrm++) {&lt;BR /&gt;&lt;BR /&gt; if (ifrm != totfrm) {&lt;BR /&gt; data_read (..., InpBuf, HD1, ...);&lt;BR /&gt; MPI_Ssend (HD1, ...);&lt;BR /&gt; MPI_Ssend (InpBuf, ...);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; if (ifrm &amp;gt; 0) {&lt;BR /&gt; MPI_Recv (OutBuf, ...);&lt;BR /&gt; sav_data (OutBuf, ...);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt;}  // for (ifrm = 0 ...&lt;BR /&gt;&lt;BR /&gt;// No more data, send termination message&lt;BR /&gt;MPI_Send (MPI_BOTTOM, 0, ...);&lt;BR /&gt;&lt;BR /&gt;Process 1:&lt;BR /&gt;&lt;BR /&gt;// Initialize persistent communication requests&lt;BR /&gt;MPI_Recv_init (HdrBuf, ..., req_recvhdr);&lt;BR /&gt;MPI_Recv_init (InpBuf, ..., req_recvdat);&lt;BR /&gt;MPI_Ssend_init (OutBuf, ..., req_sendout);&lt;BR /&gt;&lt;BR /&gt;// Get header and data for first frame&lt;BR /&gt;MPI_Start (req_recvhdr);&lt;BR /&gt;MPI_Start (req_recvdat);&lt;BR /&gt;&lt;BR /&gt;while (1) {&lt;BR /&gt;&lt;BR /&gt; MPI_Wait (req_recvhdr, status);&lt;BR /&gt; MPI_Get_count (status, count);&lt;BR /&gt; if (count == 0) {&lt;BR /&gt; execute termination code&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; MPI_Wait (req_recvdat, status);&lt;BR /&gt;&lt;BR /&gt; // Start receive on next frame while processing current one&lt;BR /&gt; MPI_Start (req_recvhdr);&lt;BR /&gt; MPI_Start (req_recvdat);&lt;BR /&gt;&lt;BR /&gt; (...)&lt;BR /&gt; process data&lt;BR /&gt; (...)&lt;BR /&gt;&lt;BR /&gt; if (curr_frame &amp;gt; start_frame) {&lt;BR /&gt; MPI_Wait (req_sendout, status);&lt;BR /&gt; }&lt;BR /&gt;&lt;BR /&gt; (...)&lt;BR /&gt; process data&lt;BR /&gt; (...)&lt;BR /&gt;&lt;BR /&gt; // Send output field(s) back to I/O process&lt;BR /&gt; MPI_Start (req_sendout);&lt;BR /&gt;&lt;BR /&gt;}  // while (1)&lt;BR /&gt;&lt;BR /&gt;The problem I am having is that the MPI_Wait calls are chewing up a lot of CPU cycles for no obvious reason, and in a very erratic way. When using MPI/Pro, the above MPI framework works in a very reliable and predictable way. However, with Intel MPI, the code can spend almost no time (expected) or several minutes (very unexpected) in one of the MPI_Wait calls. The two waits that are giving me the most trouble are the ones associated with req_recvhdr and req_sendout.&lt;BR /&gt;&lt;BR /&gt;The code is compiled using the 64-bit versions of the Intel compiler 10.1 and Intel MKL 10.0 and is run on RHEL4 nodes. Both processes are run on the same core.&lt;BR /&gt;&lt;BR /&gt;As I have already said, this framework works well under MPI/Pro, and I am stumped as to where the problem(s) lie and what I should try in order to fix the code. Any insight or guidance you could provide would be greatly appreciated.&lt;BR /&gt;</description>
      <pubDate>Mon, 29 Jun 2009 22:29:14 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868287#M1775</guid>
      <dc:creator>jburri</dc:creator>
      <dc:date>2009-06-29T22:29:14Z</dc:date>
    </item>
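    <!-- Editor's note: a minimal compilable C sketch of the framework
         described above, with the pseudocode's errors corrected (lowercase
         MPI_Get_count, "==" in the termination test, status arguments on
         MPI_Wait). The buffer sizes, message tags, frame count, and the
         MPI_CHAR datatype are illustrative assumptions, not values from the
         original code. Build with "mpiicc sketch.c" and launch two ranks,
         e.g. "mpiexec -n 2 ./a.out" (tool names may differ by setup).

    #include <mpi.h>
    #include <string.h>

    #define HDR_LEN 64      /* assumed header size  */
    #define DAT_LEN 4096    /* assumed frame size   */
    #define TAG_HDR 0       /* assumed message tags */
    #define TAG_DAT 1
    #define TAG_OUT 2

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                       /* I/O process */
            char HD1[HDR_LEN], InpBuf[DAT_LEN], OutBuf[DAT_LEN];
            int ifrm, totfrm = 4;              /* assumed frame count */
            for (ifrm = 0; ifrm <= totfrm; ifrm++) {
                if (ifrm != totfrm) {
                    /* data_read(...) would fill HD1 and InpBuf here */
                    memset(HD1, ifrm, HDR_LEN);
                    memset(InpBuf, ifrm, DAT_LEN);
                    MPI_Ssend(HD1, HDR_LEN, MPI_CHAR, 1, TAG_HDR, MPI_COMM_WORLD);
                    MPI_Ssend(InpBuf, DAT_LEN, MPI_CHAR, 1, TAG_DAT, MPI_COMM_WORLD);
                }
                if (ifrm > 0) {
                    MPI_Recv(OutBuf, DAT_LEN, MPI_CHAR, 1, TAG_OUT,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                    /* sav_data(OutBuf, ...) would write results to disk here */
                }
            }
            /* No more data: a zero-length message on the header tag
               signals termination. */
            MPI_Send(MPI_BOTTOM, 0, MPI_CHAR, 1, TAG_HDR, MPI_COMM_WORLD);
        } else {                               /* compute process */
            char HdrBuf[HDR_LEN], InpBuf[DAT_LEN], OutBuf[DAT_LEN];
            MPI_Request req_recvhdr, req_recvdat, req_sendout;
            MPI_Status status;
            int count, first_frame = 1;

            /* Persistent requests: set up once, restarted every frame. */
            MPI_Recv_init(HdrBuf, HDR_LEN, MPI_CHAR, 0, TAG_HDR,
                          MPI_COMM_WORLD, &req_recvhdr);
            MPI_Recv_init(InpBuf, DAT_LEN, MPI_CHAR, 0, TAG_DAT,
                          MPI_COMM_WORLD, &req_recvdat);
            MPI_Ssend_init(OutBuf, DAT_LEN, MPI_CHAR, 0, TAG_OUT,
                           MPI_COMM_WORLD, &req_sendout);

            /* Post receives for the first frame. */
            MPI_Start(&req_recvhdr);
            MPI_Start(&req_recvdat);

            while (1) {
                MPI_Wait(&req_recvhdr, &status);
                MPI_Get_count(&status, MPI_CHAR, &count);
                if (count == 0)                /* zero-length = terminate */
                    break;

                MPI_Wait(&req_recvdat, MPI_STATUS_IGNORE);

                /* Prefetch the next frame while this one is processed. */
                MPI_Start(&req_recvhdr);
                MPI_Start(&req_recvdat);

                /* ... process data ... */

                /* OutBuf must not be reused before the previous send
                   has completed. */
                if (!first_frame)
                    MPI_Wait(&req_sendout, MPI_STATUS_IGNORE);
                first_frame = 0;

                memset(OutBuf, 0, DAT_LEN);    /* placeholder results */
                MPI_Start(&req_sendout);       /* return output field(s) */
            }

            /* The data receive posted alongside the termination header
               will never be matched; cancel it before freeing. */
            MPI_Cancel(&req_recvdat);
            MPI_Wait(&req_recvdat, MPI_STATUS_IGNORE);
            if (!first_frame)
                MPI_Wait(&req_sendout, MPI_STATUS_IGNORE);

            MPI_Request_free(&req_recvhdr);
            MPI_Request_free(&req_recvdat);
            MPI_Request_free(&req_sendout);
        }

        MPI_Finalize();
        return 0;
    }
    -->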
    <item>
      <title>Re: Porting code from MPI/Pro 1.7 to Intel MPI 3.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868288#M1776</link>
      <description>Hi jburri,&lt;BR /&gt;&lt;BR /&gt;Thanks for posting to the Intel HPC forums and welcome!&lt;BR /&gt;&lt;BR /&gt;You probably need to use wait mode. Please try setting the environment variable I_MPI_WAIT_MODE to 'on'.&lt;BR /&gt;You could also try setting the environment variable I_MPI_RDMA_WRITE_IMM to 'enable'.&lt;BR /&gt;And you could experiment with different values of the I_MPI_SPIN_COUNT variable.&lt;BR /&gt;&lt;BR /&gt;&lt;BR /&gt;Best wishes,&lt;BR /&gt; Dmitry&lt;BR /&gt;</description>
      <pubDate>Tue, 30 Jun 2009 10:27:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868288#M1776</guid>
      <dc:creator>Dmitry_K_Intel2</dc:creator>
      <dc:date>2009-06-30T10:27:54Z</dc:date>
    </item>
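    <!-- Editor's note: the variables suggested above can be exported in the
         launch environment or passed per-run with Intel MPI's -genv option,
         e.g. "mpiexec -genv I_MPI_WAIT_MODE on -genv I_MPI_SPIN_COUNT 1 -n 2
         ./app" (the spin-count value of 1 and the application name are
         illustrative). Wait mode makes blocked waits yield the CPU rather
         than spin-poll, which is the relevant behavior here since both
         processes run on the same core. -->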
    <item>
      <title>Re: Porting code from MPI/Pro 1.7 to Intel MPI 3.1</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868289#M1777</link>
      <description>Thanks Dmitry. I will play around with those parameters and see what the impact is on performance.&lt;BR /&gt;&lt;BR /&gt;-Jeremy&lt;BR /&gt;</description>
      <pubDate>Tue, 30 Jun 2009 15:47:17 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Porting-code-from-MPI-Pro-1-7-to-Intel-MPI-3-1/m-p/868289#M1777</guid>
      <dc:creator>jburri</dc:creator>
      <dc:date>2009-06-30T15:47:17Z</dc:date>
    </item>
  </channel>
</rss>