Hey Forum,
I'm trying to run on the Azure cloud using Intel's MPI implementation, but I've hit a problem. Everything works as expected when run on one Agent (8 processors); however, any run with 2 or more Agents fails in MPI_Init() roughly 25% of the time. The failure is instantaneous (see output below). I was also able to reproduce this crash with a simple point-to-point send between all processors. I'm unable to reproduce the issue on my local system.
Are there any known issues with Intel MPI on Azure's virtual machines? Any idea why it may crash on initialization only some of the time?
My current workaround has been simply to use Microsoft's MPI library, but I would really like to figure out what the source of the problem is.
Thank you kindly.
Error output:
Master Agent: 10 |
Information |
Secondary Agent: 2 |
Information |
Secondary Agent: 16 |
Information |
Secondary Agent: 12 |
Information |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
job aborted: |
Debug |
rank: node: exit code[: error message] |
Debug |
MPIR_Init_thread(658)......................: |
Error |
MPID_Init(195).............................: channel initialization failed |
Error |
0: Agent10: 123 |
Debug |
MPIDI_CH3_Init(104)........................: |
Error |
1: Agent10: 1 |
Debug |
2: Agent10: 1 |
Debug |
MPID_nem_tcp_post_init(345)................: |
Error |
MPID_nem_newtcp_module_connpoll(3102)......: |
Error |
3: Agent10: 1 |
Debug |
recv_id_or_tmpvc_info_success_handler(1330): read from socket failed - No error |
Error |
4: Agent10: 1: process 4 exited without calling finalize |
Debug |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
5: Agent10: 1: process 5 exited without calling finalize |
Debug |
6: Agent10: 1: process 6 exited without calling finalize |
Debug |
MPIR_Init_thread(658)................: |
Error |
7: Agent10: 1: process 7 exited without calling finalize |
Debug |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
8: Agent2: 123 |
Debug |
MPID_nem_tcp_post_init(345)..........: |
Error |
9: Agent2: 1: process 9 exited without calling finalize |
Debug |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
10: Agent2: 123 |
Debug |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
11: Agent2: 123 |
Debug |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
12: Agent2: 123 |
Debug |
MPIR_Init_thread(658)................: |
Error |
13: Agent2: 1: process 13 exited without calling finalize |
Debug |
MPID_Init(195).......................: channel initialization failed |
Error |
14: Agent2: 123 |
Debug |
MPIDI_CH3_Init(104)..................: |
Error |
15: Agent2: 123 |
Debug |
MPID_nem_tcp_post_init(345)..........: |
Error |
16: Agent12: 123 |
Debug |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
17: Agent12: 123 |
Debug |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
18: Agent12: 123 |
Debug |
19: Agent12: 123 |
Debug |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
20: Agent12: 123 |
Debug |
MPID_Init(195).......................: channel initialization failed |
Error |
21: Agent12: 123 |
Debug |
MPIDI_CH3_Init(104)..................: |
Error |
22: Agent12: 123 |
Debug |
23: Agent12: 123 |
Debug |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
24: Agent16: 1: process 24 exited without calling finalize |
Debug |
25: Agent16: 1: process 25 exited without calling finalize |
Debug |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
26: Agent16: 1: process 26 exited without calling finalize |
Debug |
MPIR_Init_thread(658)................: |
Error |
27: Agent16: 1: process 27 exited without calling finalize |
Debug |
MPID_Init(195).......................: channel initialization failed |
Error |
28: Agent16: 1: process 28 exited without calling finalize |
Debug |
MPIDI_CH3_Init(104)..................: |
Error |
29: Agent16: 1: process 29 exited without calling finalize |
Debug |
MPID_nem_tcp_post_init(345)..........: |
Error |
30: Agent16: 1: process 30 exited without calling finalize |
Debug |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
31: Agent16: 1: process 31 exited without calling finalize |
Debug |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in MPI_Init: Other MPI error, error stack: |
Error |
MPIR_Init_thread(658)................: |
Error |
MPID_Init(195).......................: channel initialization failed |
Error |
MPIDI_CH3_Init(104)..................: |
Error |
MPID_nem_tcp_post_init(345)..........: |
Error |
MPID_nem_newtcp_module_connpoll(3102): |
Error |
gen_read_fail_handler(1196)..........: read from socket failed - The specified network name is no longer available. |
Error |
Fatal error in PMPI_Isend: Other MPI error, error stack: |
Error |
PMPI_Isend(161).................: MPI_Isend(buf=000000D8B7D7F89C, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD, request=000000D8B7D7F8A0) failed |
Error |
MPIDI_CH3_EagerContigIsend(554).: failure occurred while attempting to send an eager message |
Error |
MPID_nem_newtcp_iSendContig(440): |
Error |
MPIU_SOCKW_Writev(454)..........: Unable to write to a socket, An existing connection was forcibly closed by the remote host. |
Error |
(errno 10054) |
Error |
Fatal error in PMPI_Isend: Other MPI error, error stack: |
Error |
PMPI_Isend(161).................: MPI_Isend(buf=0000001C452DFA5C, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD, request=0000001C452DFA60) failed |
Error |
MPIDI_CH3_EagerContigIsend(554).: failure occurred while attempting to send an eager message |
Error |
MPID_nem_newtcp_iSendContig(440): |
Error |
MPIU_SOCKW_Writev(454)..........: Unable to write to a socket, An existing connection was forcibly closed by the remote host. |
Error |
(errno 10054) |
Error |
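Because the per-rank exit rows are interleaved with the MPI error stacks above, a small script may make the output easier to read. The following is only a rough sketch: it assumes each rank row follows the `rank: node: exit code[: error message]` header shown in the output, and the names `RANK_ROW` and `summarize_exit_codes` are hypothetical helpers, not part of Intel MPI or any Azure tooling.

```python
import re
from collections import defaultdict

# Assumed row shape, taken from the header line in the output:
#   rank: node: exit code[: error message]
# e.g. "4: Agent10: 1: process 4 exited without calling finalize |"
RANK_ROW = re.compile(r"^(\d+): (\S+): (-?\d+)(?:: (.*))?$")


def summarize_exit_codes(log_text):
    """Group ranks by node and exit code, ignoring all other log lines."""
    summary = defaultdict(lambda: defaultdict(list))
    for raw in log_text.splitlines():
        # Drop the trailing "|" column separator left by the log viewer.
        line = raw.strip().rstrip("|").strip()
        match = RANK_ROW.match(line)
        if match:
            rank = int(match.group(1))
            node = match.group(2)
            code = int(match.group(3))
            summary[node][code].append(rank)
    return {node: dict(codes) for node, codes in summary.items()}


if __name__ == "__main__":
    # A few rows copied from the error output above.
    sample = """rank: node: exit code[: error message] |
0: Agent10: 123 |
1: Agent10: 1 |
4: Agent10: 1: process 4 exited without calling finalize |
8: Agent2: 123 |
MPID_Init(195).......: channel initialization failed |
"""
    for node, codes in summarize_exit_codes(sample).items():
        print(node, codes)
```

Feeding the full output through this kind of summary shows at a glance which Agents report which exit codes, which may help narrow down where the connection drops first.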