Hi,
I'm implementing fault-tolerance techniques using Intel MPI.
I have the following scenario: two hosts (A and B) communicate via MPI messages.
Host B fails (a crash or a lost link, for example). Host A, which was trying to send a message to host B, cannot complete the send. About 15 minutes later the application is terminated because an INTERNAL_ERROR is generated. This happens after the TCP layer exhausts its configured retransmission attempts (controlled by tcp_retries2).
The same procedure run under OpenMPI does not have the same fate, i.e. the application is not interrupted.
Is there any way to disable this error in Intel MPI?
More precisely: can the generation of INTERNAL_ERROR be suppressed when a send cannot be completed, even after the TCP layer's retry limit (tcp_retries2) is exhausted?
Thanks,
Alexandre D. Gonçalves
Hi Alexandre,
You can try to set I_MPI_FAULT_CONTINUE=on:
$ mpiexec -env I_MPI_FAULT_CONTINUE on -n 2 ./test
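For this to help, the application generally also has to handle communication failures itself rather than rely on the default MPI_ERRORS_ARE_FATAL behavior. Below is a minimal sketch of that idea (my own illustration, not taken from your code; the rank numbers and payload are placeholders): it switches the error handler on MPI_COMM_WORLD to MPI_ERRORS_RETURN, checks the return code of the send, and simply stops talking to the failed peer instead of aborting.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, err;
    int payload = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Ask MPI to return error codes instead of aborting the whole job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        /* Send to rank 1 (host B); if that host has died, err != MPI_SUCCESS
           once the library gives up on the connection. */
        err = MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        if (err != MPI_SUCCESS) {
            fprintf(stderr, "rank 0: send to rank 1 failed, marking peer as dead\n");
            /* Application-level recovery goes here: stop communicating
               with the failed rank and continue with the survivors. */
        }
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}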
Regards!
Dmitry