Hi,
I'm implementing fault-tolerance techniques with Intel MPI.
I have the following scenario: two hosts (A and B) communicate via MPI messages.
Host B fails (a crash or a lost link, for example). Host A, which was trying to send a message to host B, cannot complete the transmission. About 15 minutes later the application is shut down because an INTERNAL_ERROR is generated. This happens because every TCP retransmission attempt fails (the number of attempts is set by the Linux kernel parameter tcp_retries2).
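The pattern is essentially the following minimal sketch (ranks, tag, and payload are illustrative, not my real application):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int payload = 42;
    if (rank == 0) {
        /* If rank 1 (host B) crashes or loses its link here, this send
           blocks while the TCP layer retransmits; once tcp_retries2
           attempts are exhausted (~15 minutes), Intel MPI raises an
           INTERNAL_ERROR and the whole job is terminated. */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}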
The same procedure performed with OpenMPI does not end the same way, i.e. the application is not interrupted.
Is there any way to disable this error in Intel MPI?
To put it more clearly: can the generation of the INTERNAL_ERROR be suppressed when a send does not complete even after the TCP layer has exhausted the retries defined by tcp_retries2?
Thanks,
Alexandre D. Gonçalves
Hi Alexandre,
You can try to set I_MPI_FAULT_CONTINUE=on:
$ mpiexec -env I_MPI_FAULT_CONTINUE on -n 2 ./test
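Note that for the continue mode to be useful, the application itself should request error codes instead of the default abort-on-error behavior, e.g. via MPI_ERRORS_RETURN. A minimal sketch of that pattern (the recovery branch is only illustrative):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    /* Replace the default MPI_ERRORS_ARE_FATAL handler so that a failed
       send returns an error code to the caller instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int payload = 42;
        int err = MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        if (err != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(err, msg, &len);
            fprintf(stderr, "send to rank 1 failed: %s\n", msg);
            /* Application-level recovery would go here (skip the failed
               peer, redistribute its work, etc.). */
        }
    } else if (rank == 1) {
        int payload;
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

With the default MPI_ERRORS_ARE_FATAL handler the job is still torn down as soon as the failure is reported, regardless of I_MPI_FAULT_CONTINUE.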
Regards!
Dmitry
