Intel® MPI Library

MPI doesn't work (Fatal error in MPI_Init)

Ivan_I_1

Hi,

I have the following problem:

I have two nodes and the following config file:

-n 1 -host node0 myapp
-n 1 -host node1 myapp

This way it works fine. However, if I change the order of the lines in the config file to:

-n 1 -host node1 myapp
-n 1 -host node0 myapp

It fails with the error:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(658)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(104)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3102):
gen_cnting_fail_handler(1816)........: connect failed - The semaphore timeout period has expired.
 (errno 121)

job aborted:
rank: node: exit code[: error message]
0: node1: 1: process 0 exited without calling finalize
1: node0: 123

What could be the reason for this? Any ideas?

James_T_Intel
Moderator

Hi Ivan,

Are you able to ssh from node0 to node1 and from node1 to node0? Does each hostname resolve to the same IP address on both nodes?
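
One way to check the name-resolution part is a small script run on both nodes, comparing the output. The following is only a minimal sketch: it assumes Python is available on each node and that `node0` and `node1` are the hostnames from your config file. The helper name `resolve_hosts` is made up for illustration and is not part of any Intel tool.

```python
import socket


def resolve_hosts(hosts):
    """Map each hostname to the sorted IPv4 addresses it resolves to on this machine.

    A host that cannot be resolved is mapped to None.
    """
    results = {}
    for host in hosts:
        try:
            # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
            # for AF_INET, sockaddr is (ip_address, port).
            infos = socket.getaddrinfo(host, None, family=socket.AF_INET)
            results[host] = sorted({info[4][0] for info in infos})
        except OSError:
            # socket.gaierror is a subclass of OSError.
            results[host] = None
    return results


if __name__ == "__main__":
    # Hostnames assumed from the config file in the question.
    local_name = socket.gethostname()
    for host, addresses in resolve_hosts(["node0", "node1"]).items():
        print(f"{local_name} resolves {host} to {addresses}")
```

If the two nodes print different addresses for the same hostname, or either prints `None`, that would point to a hosts file or DNS mismatch rather than to the application itself.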

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
