Ivan_I_1
Beginner
110 Views

MPI doesn't work (Fatal error in MPI_Init)

Hi,

I have the following problem:

I have two nodes and the following config file:

-n 1 -host node0 myapp
-n 1 -host node1 myapp

This way it works fine. However, if I change the order of the lines in the config file to:

-n 1 -host node1 myapp
-n 1 -host node0 myapp

It fails with the error:

Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(658)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(104)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3102):
gen_cnting_fail_handler(1816)........: connect failed - The semaphore timeout period has expired.
 (errno 121)

job aborted:
rank: node: exit code[: error message]
0: node1: 1: process 0 exited without calling finalize
1: node0: 123

What could be the reason for this? Any ideas?

James_T_Intel
Moderator
110 Views

Hi Ivan,

Are you able to ssh from node0 to node1 and from node1 to node0? Does each node's hostname resolve to the same IP address on both nodes?
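
One way to check the name resolution question is a small script run on each node, comparing the output between the two. The following is only a minimal sketch in Python using the standard `socket` module. The helper names `resolve_ipv4` and `can_connect` are hypothetical, and the hostnames are taken from the config file above. It does not use or depend on anything from the MPI library itself.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that `hostname` resolves to on this machine."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # the name does not resolve here
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and resolution failures
        return False


if __name__ == "__main__":
    # Hostnames from the config file in the question.
    for name in ("node0", "node1"):
        print(name, "->", resolve_ipv4(name) or "unresolved")
```

If the two nodes print different addresses for the same name, that mismatch would be worth fixing first. `can_connect` can then be pointed at a port where something is known to be listening on the other node (which port that is depends on your setup) to see whether a TCP connection works in each direction.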

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
