Hi,
I have the following problem:
I have two nodes and the following config file:
-n 1 -host node0 myapp
-n 1 -host node1 myapp
This works fine. However, if I swap the order of the lines in the config file to:
-n 1 -host node1 myapp
-n 1 -host node0 myapp
It fails with the error:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(658)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(104)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3102):
gen_cnting_fail_handler(1816)........: connect failed - The semaphore timeout period has expired. (errno 121)

job aborted:
rank: node: exit code[: error message]
0: node1: 1: process 0 exited without calling finalize
1: node0: 123
What could be causing this? Any ideas?
Hi Ivan,
Are you able to ssh from node0 to node1 and from node1 to node0? Does each hostname resolve to the same IP address on both nodes?
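One quick way to check the name-resolution part is to run the same lookup on both nodes and compare the results. The Python sketch below assumes Python 3 is available on both nodes and that node0 and node1 are the hostnames used in the config file; the script and its function names (resolve_hosts, mismatches) are hypothetical helpers, not part of Intel MPI. It only checks IPv4 name resolution; it does not test ssh or the TCP ports that MPI opens between the ranks.

```python
import json
import socket
import sys


def resolve_hosts(hostnames):
    """Map each hostname to its sorted IPv4 addresses as seen from this node.

    A name that fails to resolve maps to an error string instead of a list.
    """
    results = {}
    for name in hostnames:
        try:
            # gethostbyname_ex returns (canonical_name, aliases, ipv4_addresses)
            _, _, addresses = socket.gethostbyname_ex(name)
            results[name] = sorted(addresses)
        except OSError as exc:  # socket.gaierror and socket.herror subclass OSError
            results[name] = f"unresolved: {exc}"
    return results


def mismatches(view_a, view_b):
    """Return the hostnames whose lookup results differ between two nodes."""
    names = set(view_a) | set(view_b)
    return sorted(n for n in names if view_a.get(n) != view_b.get(n))


if __name__ == "__main__":
    # Hypothetical default: the two hostnames from the mpiexec config file.
    hosts = sys.argv[1:] or ["node0", "node1"]
    report = {
        "local_hostname": socket.gethostname(),
        "lookups": resolve_hosts(hosts),
    }
    print(json.dumps(report, indent=2))
```

Run it on node0 and on node1, then compare the two "lookups" sections, either by eye or by loading both JSON outputs and passing them to mismatches. A hostname that fails to resolve, resolves to a loopback address on its own node, or resolves to different addresses on the two nodes points to a name-resolution problem worth fixing before rerunning the job.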
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools