<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hi. in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957894#M3113</link>
    <description>&lt;P&gt;Hi.&lt;/P&gt;
&lt;P&gt;I ran with 64 processes; the output is attached.&lt;/P&gt;
&lt;P&gt;The error occurs in loop 1, while the program writes the second netCDF file.&lt;/P&gt;
&lt;P&gt;Thank you for your help.&lt;/P&gt;</description>
    <pubDate>Wed, 06 Mar 2013 05:50:43 GMT</pubDate>
    <dc:creator>Wencan_W_</dc:creator>
    <dc:date>2013-03-06T05:50:43Z</dc:date>
    <item>
      <title>MPI-IO error when running on lustre with a high number of stripes and processes</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957889#M3108</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I'm trying to run pNetCDF on Lustre. The test code and the pNetCDF library are both compiled with Intel MPI Library v4.0.2. Our Lustre file system has 40 OSTs.&lt;/P&gt;
&lt;P&gt;With a stripe count of 1, or with 32 processes, the test code works and writes its data correctly.&lt;/P&gt;
&lt;P&gt;However, when I set the stripe count to 40 and run with 64 processes, the test code crashes with:&lt;/P&gt;
&lt;P&gt;&amp;nbsp; rank 19 in job 1 c25b09_39645 caused collective abort of all ranks&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; exit status of rank 19: killed by signal 9&lt;/P&gt;
&lt;P&gt;The test code is attached. Thank you in advance.&lt;/P&gt;</description>
      <pubDate>Wed, 27 Feb 2013 14:59:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957889#M3108</guid>
      <dc:creator>Wencan_W_</dc:creator>
      <dc:date>2013-02-27T14:59:09Z</dc:date>
    </item>
    <item>
      <title>I use GDB to debug and got</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957890#M3109</link>
      <description>&lt;P&gt;I debugged with GDB and got this error message:&lt;/P&gt;
&lt;P&gt;Program received signal SIGFPE, Arithmetic exception.&lt;BR /&gt;34: 0x00002aaac33327e0 in ADIOI_LUSTRE_Get_striping_info ()&lt;BR /&gt;34: from /apps/intel/impi/4.0.2.003/intel64/lib/libmpi_lustre.so&lt;/P&gt;
&lt;P&gt;It appears to be the same issue as&amp;nbsp;&lt;A href="http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-September/007947.html"&gt;http://lists.mcs.anl.gov/pipermail/mpich-discuss/2010-September/007947.html&lt;/A&gt;.&lt;/P&gt;
&lt;P&gt;Is this a bug in Intel MPI Library v4.0.2, and has it been fixed in a newer version?&lt;/P&gt;</description>
      <pubDate>Thu, 28 Feb 2013 03:22:56 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957890#M3109</guid>
      <dc:creator>Wencan_W_</dc:creator>
      <dc:date>2013-02-28T03:22:56Z</dc:date>
    </item>
    <item>
      <title>Hi Wencan,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957891#M3110</link>
      <description>&lt;P&gt;Hi Wencan,&lt;/P&gt;
&lt;P&gt;I can't find any indication of this being a known issue.&amp;nbsp; There is an issue related to Lustre in the latest version that might cause a problem for you (undefined symbol in one of our libraries).&amp;nbsp; I would recommend trying version 4.0.3 first, and 4.1.0.030 if 4.0.3 does not work.&amp;nbsp; Please let me know if you try any and what the results are.&lt;/P&gt;
&lt;P&gt;Can you please attach your test code?&amp;nbsp;&amp;nbsp;It did not get properly attached to the first post.&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;BR /&gt; James Tullos&lt;BR /&gt; Technical Consulting Engineer&lt;BR /&gt; Intel® Cluster Tools&lt;/P&gt;</description>
      <pubDate>Thu, 28 Feb 2013 20:30:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957891#M3110</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2013-02-28T20:30:10Z</dc:date>
    </item>
    <item>
      <title>Thank you for your help.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957892#M3111</link>
      <description>&lt;P&gt;Thank you for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 01 Mar 2013 02:24:04 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957892#M3111</guid>
      <dc:creator>Wencan_W_</dc:creator>
      <dc:date>2013-03-01T02:24:04Z</dc:date>
    </item>
    <item>
      <title>Hi Wencan,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957893#M3112</link>
      <description>&lt;P&gt;Hi Wencan,&lt;/P&gt;
&lt;P&gt;I am only able to set striping up to 18 on the cluster I am using.&amp;nbsp; At 18 stripes, I am unable to reproduce this behavior.&amp;nbsp; Please run with I_MPI_DEBUG=5 and send the output.&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;BR /&gt; James Tullos&lt;BR /&gt; Technical Consulting Engineer&lt;BR /&gt; Intel® Cluster Tools&lt;/P&gt;</description>
      <pubDate>Tue, 05 Mar 2013 19:38:44 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957893#M3112</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2013-03-05T19:38:44Z</dc:date>
    </item>
    <item>
      <title>Hi.</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957894#M3113</link>
      <description>&lt;P&gt;Hi.&lt;/P&gt;
&lt;P&gt;I ran with 64 processes; the output is attached.&lt;/P&gt;
&lt;P&gt;The error occurs in loop 1, while the program writes the second netCDF file.&lt;/P&gt;
&lt;P&gt;Thank you for your help.&lt;/P&gt;</description>
      <pubDate>Wed, 06 Mar 2013 05:50:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957894#M3113</guid>
      <dc:creator>Wencan_W_</dc:creator>
      <dc:date>2013-03-06T05:50:43Z</dc:date>
    </item>
    <item>
      <title>Hi Wencan,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957895#M3114</link>
      <description>&lt;P&gt;Hi Wencan,&lt;/P&gt;
&lt;P&gt;It appears you are using LSF* as your job scheduler.&amp;nbsp; We do have&amp;nbsp;some known&amp;nbsp;issues with LSF*.&amp;nbsp; I don't think they're related to this case, but can you try a few things just in case?&amp;nbsp; First, try running in an interactive job.&amp;nbsp; Due to one of the known issues, you will probably need to add&lt;/P&gt;
&lt;P&gt;[plain]-genv LD_LIBRARY_PATH $LD_LIBRARY_PATH[/plain]&lt;/P&gt;
&lt;P&gt;to your mpirun command.&amp;nbsp; Also, please try running completely outside of LSF*.&lt;/P&gt;
&lt;P&gt;Could you also send the output from stderr (for a failing job)?&amp;nbsp; It would be best if you have stdout and stderr in the same file.&lt;/P&gt;
&lt;P&gt;Sincerely,&lt;BR /&gt; James Tullos&lt;BR /&gt; Technical Consulting Engineer&lt;BR /&gt; Intel® Cluster Tools&lt;/P&gt;</description>
      <pubDate>Thu, 14 Mar 2013 15:41:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957895#M3114</guid>
      <dc:creator>James_T_Intel</dc:creator>
      <dc:date>2013-03-14T15:41:41Z</dc:date>
    </item>
    <item>
      <title>Hi,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957896#M3115</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I ran with the following mpiexec command:&lt;/P&gt;
&lt;P&gt;[bash]mpiexec -genv I_MPI_EXTRA_FILESYSTEM on -genv I_MPI_EXTRA_FILESYSTEM_LIST lustre -genv I_MPI_DEBUG 5 -genv LD_LIBRARY_PATH $LD_LIBRARY_PATH -n $num ./perform_test_pnetcdf $x_proc $y_proc $output &amp;amp;&amp;gt; $out[/bash]&lt;/P&gt;
&lt;P&gt;and got the new output.&lt;/P&gt;</description>
      <pubDate>Mon, 18 Mar 2013 02:08:00 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-IO-error-when-running-on-lustre-with-a-high-number-of/m-p/957896#M3115</guid>
      <dc:creator>Wencan_W_</dc:creator>
      <dc:date>2013-03-18T02:08:00Z</dc:date>
    </item>
  </channel>
</rss>

