<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Quick note regarding timing: in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112980#M5402</link>
    <description>&lt;P&gt;Quick note regarding timing: it transpires I was also capturing a call to MPI_Barrier - and it is this function that is responsible for the performance hit. Perhaps some configuration is now required?&lt;/P&gt;</description>
    <pubDate>Thu, 24 May 2018 07:17:54 GMT</pubDate>
    <dc:creator>Figura__Ed</dc:creator>
    <dc:date>2018-05-24T07:17:54Z</dc:date>
    <item>
      <title>MPI library crash when spawning &gt; 20 processes using MPI_COMM_SPAWN</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112968#M5390</link>
      <description>&lt;P&gt;I'm currently running Intel MPI (5.0.3.048) on a Windows 10 (64-bit), 8-core machine with 32GB RAM. I am using MPI_COMM_SPAWN from a C++ app (launched using mpiexec.exe -localonly -n 1) to spawn N MPI workers - in fact, I call MPI_COMM_SPAWN N times, each time for a single worker (FT pattern). If I try to spawn 21 or more workers, I often get a crash from the MPI library itself. This is not consistent, i.e. sometimes I can spawn 32 workers with no problems, sometimes I get a problem with 21. Has anyone else come across such a problem? Can anyone suggest what the issue might be?&lt;/P&gt;</description>
      <pubDate>Thu, 13 Oct 2016 10:20:57 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112968#M5390</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-13T10:20:57Z</dc:date>
    </item>
    <item>
      <title>One more piece of information</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112969#M5391</link>
      <description>&lt;P&gt;One more piece of information. My app generates a dmp file. When I open this in Visual Studio, it wants to load impi_full.pdb, which I don't have with my distribution. I do have impi.pdb - so I renamed that temporarily, and the Visual Studio debugger shows that the problem is in MPIDI_CH3U_Handle_connection.&lt;/P&gt;</description>
      <pubDate>Fri, 14 Oct 2016 12:43:47 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112969#M5391</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-14T12:43:47Z</dc:date>
    </item>
    <item>
      <title>I have just tried the 2017</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112970#M5392</link>
      <description>&lt;P&gt;I have just tried the 2017 version - same problem. However, I am now able to use SEH in my app to allow my program to run with however many workers were successfully spawned. One other thing I noticed is that the spawning itself appears to be much slower than in version 5.&lt;/P&gt;</description>
      <pubDate>Mon, 17 Oct 2016 14:04:43 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112970#M5392</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-17T14:04:43Z</dc:date>
    </item>
    <item>
      <title>Hello Ed,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112971#M5393</link>
      <description>&lt;P&gt;Hello Ed,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; Could you provide the source of the code you are using and also the output after you set I_MPI_DEBUG=5?&lt;/P&gt;

&lt;P&gt;thanks&lt;/P&gt;

&lt;P&gt;Mark&lt;/P&gt;</description>
      <pubDate>Wed, 19 Oct 2016 00:33:10 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112971#M5393</guid>
      <dc:creator>Mark_L_Intel</dc:creator>
      <dc:date>2016-10-19T00:33:10Z</dc:date>
    </item>
    <item>
      <title>Hi. I'm afraid I can't</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112972#M5394</link>
      <description>&lt;P&gt;Hi. I'm afraid I can't provide the actual source code. What I'm essentially doing in my main app is calling MPI_COMM_SPAWN inside a loop, as many times as there are required workers, e.g. 32 times. Each invocation sets maxProcs to 1; I also pass the working directory through MPI_Info. Inside the loop I check the return value of the spawn call; if successful, I send a couple of messages to the worker. Once the loop is finished, I then prepare to send other work to the workers. The worker is simple - it receives the expected couple of messages and then listens for the actual work. The crash occurs during one of the calls to MPI_COMM_SPAWN; however, it does not always crash.&lt;/P&gt;

&lt;P&gt;Here is the exception message and stack trace of the most recent crash (from Visual Studio 2015):&lt;/P&gt;

&lt;P&gt;Unhandled exception at 0x00007FFD405EA1A3 (impi.dll) in gServerErr_161020-102908.dmp: 0xC0000005: Access violation reading location 0x0000000000000000.&lt;/P&gt;

&lt;P&gt;If there is a handler for this exception, the program may be safely continued.&lt;/P&gt;

&lt;P&gt;&amp;gt;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPID_nem_newtcp_module_cleanup&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPID_nem_newtcp_module_cleanup&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPID_nem_newtcp_module_cleanup&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPIU_ExProcessCompletions&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPID_nem_newtcp_module_connpoll&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!MPID_nem_tcp_poll&lt;BR /&gt;
	&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;impi.dll!00007ffd40786f38()&lt;/P&gt;

&lt;P&gt;I have also set I_MPI_DEBUG=7 - here is the tail of the diagnostic output:&lt;/P&gt;

&lt;P&gt;STDOUT: [0] MPI startup(): Intel(R) MPI Library, Version 5.0 Update 3 &amp;nbsp;Build 20150128&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. &amp;nbsp;All rights reserved.&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Multi-threaded optimized library&lt;BR /&gt;
	STDOUT: [0] MPI startup(): shm and tcp data transfer modes&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Internal info: pinning initialization was done&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Device_reset_idx=8&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Allgather: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Allgatherv: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Allreduce: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Alltoall: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Alltoallv: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Alltoallw: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Barrier: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Bcast: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Exscan: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Gather: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Gatherv: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Reduce_scatter: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Reduce: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Scan: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Scatter: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Scatterv: 0: 0-2147483647 &amp;amp; 0-2147483647&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Rank &amp;nbsp; &amp;nbsp;Pid &amp;nbsp; &amp;nbsp; &amp;nbsp;Node name &amp;nbsp;Pin cpu&lt;BR /&gt;
	STDOUT: [0] MPI startup(): 0 &amp;nbsp; &amp;nbsp; &amp;nbsp; 13516 &amp;nbsp; &amp;nbsp;PSEUK1207 &amp;nbsp;{0,1,2,3,4,5,6,7}&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Recognition=0 Platform(code=32 ippn=0 dev=1) Fabric(intra=1 inter=1 flags=0x0)&lt;BR /&gt;
	STDOUT: [0] MPI startup(): I_MPI_DEBUG=7&lt;BR /&gt;
	STDOUT: [0] MPI startup(): I_MPI_PIN_MAPPING=1:0 0&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Intel(R) MPI Library, Version 5.0 Update 3 &amp;nbsp;Build 20150128&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Copyright (C) 2003-2015 Intel Corporation. &amp;nbsp;All rights reserved.&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Multi-threaded optimized library&lt;BR /&gt;
	STDOUT: [0] MPI startup(): shm and tcp data transfer modes&lt;BR /&gt;
	STDOUT: [0] MPI startup(): Internal info: pinning initialization was done&lt;BR /&gt;
	STDERR: The following diagnostic file has been created: 'gServerErr_161020-102908.dmp'&lt;/P&gt;

&lt;P&gt;I never get a problem spawning up to 20 workers; but 21 and above produce these random crashes.&lt;/P&gt;

&lt;P&gt;Other notes:&lt;/P&gt;

&lt;UL&gt;
	&lt;LI&gt;I am initialising the MPI environment using the multiple threading level option (MPI_THREAD_MULTIPLE)&lt;/LI&gt;
	&lt;LI&gt;We are using Boost MPI as the wrapper; but as Boost 1.55 does not wrap MPI_COMM_SPAWN, we call it directly&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 20 Oct 2016 10:48:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112972#M5394</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-20T10:48:09Z</dc:date>
    </item>
    <item>
      <title>Hello Ed,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112973#M5395</link>
      <description>&lt;P&gt;Hello Ed,&lt;/P&gt;

&lt;P&gt;&amp;nbsp;&amp;nbsp; It sounds like some cleanup is not happening. Would you mind modifying this code&lt;/P&gt;

&lt;P&gt;&lt;A href="http://mpi-forum.org/docs/mpi-2.0/mpi-20-html/node98.htm"&gt;http://mpi-forum.org/docs/mpi-2.0/mpi-20-html/node98.htm&lt;/A&gt;&lt;/P&gt;

&lt;P&gt;to your liking (spawn &amp;gt; 20 workers, etc.), running it on your system, and reporting back the results?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Mark&lt;/P&gt;</description>
      <pubDate>Thu, 20 Oct 2016 23:17:06 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112973#M5395</guid>
      <dc:creator>Mark_L_Intel</dc:creator>
      <dc:date>2016-10-20T23:17:06Z</dc:date>
    </item>
    <item>
      <title>Many thanks for the</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112974#M5396</link>
      <description>&lt;P&gt;Many thanks for the suggestion. I have taken that code and adapted it to my circumstances, but I cannot get it to fail. The only other major difference is that in my app I have a secondary thread running in the master, and in the worker I also have a secondary thread running - these threads fulfill different tasks. I'll see if I can modify the code to factor that in.&lt;/P&gt;</description>
      <pubDate>Fri, 21 Oct 2016 10:08:40 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112974#M5396</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-21T10:08:40Z</dc:date>
    </item>
    <item>
      <title>Some more notes: I have taken</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112975#M5397</link>
      <description>&lt;P&gt;Some more notes: I have taken the MPI example code, changed the structure to match my application more closely, and tested it - this works fine. From the other end, I have tried to simplify my application so that, in effect, it is only calling MPI_Comm_spawn() in a loop - the command-line, single-threaded version still crashes once the loop hits its 21st iteration. I'm not sure what to try next. I don't understand whether the problem is due to the 'complexity' of the spawned worker app or complexity in the main program. One thing is for sure: the call to MPI_Comm_spawn() results in something inside impi.dll trying to dereference a null pointer.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Oct 2016 10:48:11 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112975#M5397</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-24T10:48:11Z</dc:date>
    </item>
    <item>
      <title>I have to apologise for these</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112976#M5398</link>
      <description>&lt;P&gt;I have to apologise for these little snippets of news. I am running my app through a Cygwin shell. As I mentioned earlier, this crashes reproducibly if I attempt to run with &amp;gt; 20 workers. However, by simply setting I_MPI_DEBUG to 10 (i.e. export I_MPI_DEBUG=10), my program runs without any problems. In fact, if I set I_MPI_DEBUG to &amp;lt; 5 I still get a crash; setting I_MPI_DEBUG &amp;gt;= 5 - no crash! If anyone has any ideas, that would be greatly appreciated.&lt;/P&gt;</description>
      <pubDate>Mon, 24 Oct 2016 11:03:59 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112976#M5398</guid>
      <dc:creator>Ed_F_1</dc:creator>
      <dc:date>2016-10-24T11:03:59Z</dc:date>
    </item>
    <item>
      <title>Hello Ed,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112977#M5399</link>
      <description>&lt;P&gt;Hello Ed,&lt;/P&gt;

&lt;P&gt;We’re currently facing the same issue with our application. The controller starts multiple workers using MPI_Comm_spawn. Most often, when the 21&lt;SUP&gt;st&lt;/SUP&gt; worker is started, the controller crashes in impi.dll. At that moment, the newly created worker is busy calling MPI_Init_thread.&lt;/P&gt;

&lt;P&gt;The crash is: Exception thrown at 0x00007FFA5EB4A013 (impi.dll) in controller.exe: 0xC0000005: Access violation reading location 0x0000000000000000.&lt;/P&gt;

&lt;P&gt;If we replace the multithreaded release version of impi.dll with the multithreaded debug version, the problem does not occur.&lt;/P&gt;

&lt;P&gt;We are using Intel MPI version 2018.0.2.0 on Windows.&lt;/P&gt;

&lt;P&gt;Have you found the cause/fix for this issue?&lt;/P&gt;

&lt;P&gt;Thanks,&lt;/P&gt;

&lt;P&gt;Mark&lt;/P&gt;</description>
      <pubDate>Wed, 09 May 2018 11:34:36 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112977#M5399</guid>
      <dc:creator>MarkV</dc:creator>
      <dc:date>2018-05-09T11:34:36Z</dc:date>
    </item>
    <item>
      <title>Hi Mark,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112978#M5400</link>
      <description>&lt;P&gt;Hi Mark,&lt;/P&gt;

&lt;P&gt;Thanks for that info. I'm trying to get the 2018 version. I have tried 2017 but still have the same problems with that. And I still do not know the cause!&lt;/P&gt;

&lt;P&gt;Ed&lt;/P&gt;</description>
      <pubDate>Tue, 22 May 2018 14:39:41 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112978#M5400</guid>
      <dc:creator>Figura__Ed</dc:creator>
      <dc:date>2018-05-22T14:39:41Z</dc:date>
    </item>
    <item>
      <title>Further update: I have tried</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112979#M5401</link>
      <description>&lt;P&gt;Further update: I have tried 2018 Update 2 (on Windows 10). First thing: it is about 20x slower than version 5.0.3 at spawning worker processes. My application uses MPI as-is, out of the box: I call MPI_Comm_spawn() and tell it to spawn 1 child process - this is done in a loop N times. Perhaps there are some configuration variables that need tweaking, but this came as a shock. Secondly, I appear to observe the same behaviour as Mark, i.e. with the debug DLLs I can spawn 32 worker processes (albeit rather slowly). I have not tested exhaustively whether this always works, but with the release DLL I got quite a few failures.&lt;/P&gt;

&lt;P&gt;Unfortunately, I need to get to the bottom of the performance hit before considering switching to this version.&lt;/P&gt;

&lt;P&gt;I also have the option of launching my app in MPMD mode. This works with any number of workers without a problem. It is extremely fast with 5.0.3 and quite the opposite with 2018 Update 2.&lt;/P&gt;</description>
      <pubDate>Wed, 23 May 2018 13:17:02 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112979#M5401</guid>
      <dc:creator>Figura__Ed</dc:creator>
      <dc:date>2018-05-23T13:17:02Z</dc:date>
    </item>
    <item>
      <title>Quick note regarding timing:</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112980#M5402</link>
      <description>&lt;P&gt;Quick note regarding timing: it transpires I was also capturing a call to MPI_Barrier - and it is this function that is responsible for the performance hit. Perhaps some configuration is now required?&lt;/P&gt;</description>
      <pubDate>Thu, 24 May 2018 07:17:54 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/MPI-library-crash-when-spawning-gt-20-processes-using-MPI-COMM/m-p/1112980#M5402</guid>
      <dc:creator>Figura__Ed</dc:creator>
      <dc:date>2018-05-24T07:17:54Z</dc:date>
    </item>
  </channel>
</rss>

