<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Hello Again, in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170179#M6536</link>
    <description>&lt;P&gt;Hello Again,&lt;/P&gt;

&lt;P&gt;I can give more detailed information on the topic.&lt;/P&gt;

&lt;P&gt;In the documentation I have read that the function psm2_error_register_handler allows one of three options. If I understand correctly, these are: to use no handler with PSM2_ERRHANDLER_NO_HANDLER (and then read the errors from the values returned by PSM2 function calls), to defer error handling to PSM2 with PSM2_ERRHANDLER_PSM_HANDLER, or to use a user-defined function.&lt;/P&gt;

&lt;P&gt;So my questions are the following:&lt;/P&gt;

&lt;P&gt;1. Can psm2_poll return errors other than those mentioned in my previous post (such as a connection failure)? In that case, I could simply check its return value.&lt;/P&gt;

&lt;P&gt;2. How can I define my own handler? Unfortunately, I have not found any example application that introduces a user-defined handler, so a code sample would be welcome. I assume I will need special handling for a broken connection (and for errors such as PSM2_EP_WAS_CLOSED, PSM2_EP_UNIT_NOT_FOUND, or others). How do I do that?&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
    <pubDate>Mon, 21 Aug 2017 12:39:51 GMT</pubDate>
    <dc:creator>RKraw</dc:creator>
    <dc:date>2017-08-21T12:39:51Z</dc:date>
    <item>
      <title>Notification of a failed dead node existence using the PSM2</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170178#M6535</link>
      <description>&lt;P&gt;Hello everyone,&lt;/P&gt;

&lt;P&gt;I am writing because I am currently implementing a failure recovery system for a cluster with Intel OmniPath that will be designated for handling computations in a physical experiment. What I want to implement is a mechanism to detect a node that failed and to notify rest of the nodes. I tried to check the node failure by invoking psm2_poll. Unfortunately, as I saw in the&amp;nbsp;Intel ® Performance ScaledMessaging 2 (PSM2) Programmer’s Guide, this function does not return errors (values) other than OK or OK_NO_PROGRESS (this is at least what I have observed in my application - the poll on a dead node behaves as if the node did not fail/disconnect and did not send any message).&amp;nbsp;&lt;/P&gt;

&lt;P&gt;So the question is: What are the methods of notifying other nodes after node failure ? Is there a lightweight function that I can invoke along with poll to check if the node from whom I am trying to get messages exists ?&lt;/P&gt;

&lt;P&gt;In worst case, I can implement this using a counter and a timeout, but if there is a mechanism supported by the API, I am wide open.&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2017 08:50:09 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170178#M6535</guid>
      <dc:creator>RKraw</dc:creator>
      <dc:date>2017-08-21T08:50:09Z</dc:date>
    </item>
    <item>
      <title>Hello Again,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170179#M6536</link>
      <description>&lt;P&gt;Hello Again,&lt;/P&gt;

&lt;P&gt;I can give more detailed information on the topic.&lt;/P&gt;

&lt;P&gt;In the documentation I have read that the function psm2_error_register_handler allows one of three options. If I understand correctly, these are: to use no handler with PSM2_ERRHANDLER_NO_HANDLER (and then read the errors from the values returned by PSM2 function calls), to defer error handling to PSM2 with PSM2_ERRHANDLER_PSM_HANDLER, or to use a user-defined function.&lt;/P&gt;

&lt;P&gt;So my questions are the following:&lt;/P&gt;

&lt;P&gt;1. Can psm2_poll return errors other than those mentioned in my previous post (such as a connection failure)? In that case, I could simply check its return value.&lt;/P&gt;

&lt;P&gt;2. How can I define my own handler? Unfortunately, I have not found any example application that introduces a user-defined handler, so a code sample would be welcome. I assume I will need special handling for a broken connection (and for errors such as PSM2_EP_WAS_CLOSED, PSM2_EP_UNIT_NOT_FOUND, or others). How do I do that?&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2017 12:39:51 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170179#M6536</guid>
      <dc:creator>RKraw</dc:creator>
      <dc:date>2017-08-21T12:39:51Z</dc:date>
    </item>
    <item>
      <title>Hello Again,</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170180#M6537</link>
      <description>&lt;P&gt;Hello Again,&lt;/P&gt;

&lt;P&gt;So this is what I figured out:&lt;/P&gt;

&lt;P&gt;1. I define my own handler in the following form (I took the signature from the compiler errors I got when I tried to register a handler):&lt;/P&gt;

&lt;P&gt;psm2_error myErrorFunc(psm2_ep* ep, psm2_error err, const char* achar, psm2_error_token* token)&lt;BR /&gt;
{&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;// body&lt;BR /&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;return err;&lt;BR /&gt;
}&lt;/P&gt;

&lt;P&gt;2. I register it as follows:&lt;/P&gt;

&lt;P&gt;psm2_error_register_handler(NULL, &amp;amp;myErrorFunc);&lt;/P&gt;

&lt;P&gt;Now, my two follow-up questions are:&lt;/P&gt;

&lt;P&gt;- What do the four parameters of the handler stand for? I want to find out which remote node failed so that I can update my own communication data.&lt;/P&gt;

&lt;P&gt;- How can I make the handler be called when a remote node disconnects or its connection fails?&lt;/P&gt;

&lt;P&gt;Best Regards&lt;/P&gt;</description>
      <pubDate>Mon, 21 Aug 2017 13:08:15 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Notification-of-a-failed-dead-node-existence-using-the-PSM2/m-p/1170180#M6537</guid>
      <dc:creator>RKraw</dc:creator>
      <dc:date>2017-08-21T13:08:15Z</dc:date>
    </item>
  </channel>
</rss>

