Hello, Admin!
I'm using the Intel Cluster Studio toolkit, and I'm trying to run a hybrid (MPI + OpenMP) program on 25 compute nodes. I compile my program with -mt_mpi -openmp and set the environment variables I_MPI_PIN_DOMAIN=omp and OMP_NUM_THREADS=2, which means every MPI process will have 2 OpenMP threads. I can run my program without errors on up to 14 compute nodes, but beyond 14 nodes I get the following error output:
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3113):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3113):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3113):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Fatal error in PMPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(337)..........:
MPID_nem_newtcp_module_connpoll(3113):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
job aborted:
rank: node: exit code[: error message]
0: HPC-01: 1: process 0 exited without calling finalize
1: HPC-02: 123
2: HPC-03: 1: process 2 exited without calling finalize
3: HPC-04: 1: process 3 exited without calling finalize
4: HPC-05: 1: process 4 exited without calling finalize
5: HPC-06: 1: process 5 exited without calling finalize
6: HPC-07: 1: process 6 exited without calling finalize
7: HPC-08: 1: process 7 exited without calling finalize
8: HPC-09: 1: process 8 exited without calling finalize
9: HPC-10: 1: process 9 exited without calling finalize
10: HPC-11: 1: process 10 exited without calling finalize
11: HPC-12: 1: process 11 exited without calling finalize
12: HPC-13: 1: process 12 exited without calling finalize
13: HPC-14: 1: process 13 exited without calling finalize
14: HPC-16: 1: process 14 exited without calling finalize
15: HPC-17: 1: process 15 exited without calling finalize
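For reference, a minimal hybrid MPI/OpenMP program of the kind described above might look like the sketch below. This is only an illustration: the original program is not shown in the thread, and the file and executable names (hello.c, hello.exe) are placeholders.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided, rank;

    /* Request a threaded MPI level; linking with -mt_mpi selects the
       thread-safe Intel MPI library. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* With I_MPI_PIN_DOMAIN=omp and OMP_NUM_THREADS=2, each MPI rank
       owns a domain of 2 logical CPUs and runs 2 OpenMP threads. */
    #pragma omp parallel
    printf("rank %d, thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();
    return 0;
}

Compiled with something like mpiicc -mt_mpi -openmp hello.c and launched with mpiexec -n 25 -genv I_MPI_PIN_DOMAIN omp -genv OMP_NUM_THREADS 2 hello.exe (exact wrapper and launcher names depend on the installed Intel MPI version).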
Are these dual-core compute nodes? What is the total number of cores? If you wish to oversubscribe, you will need to disable I_MPI_PIN_DOMAIN.
Do you have a reason for trying the thread-safe (mt) library?
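For example, Intel MPI process pinning can be switched off for an oversubscription test via the I_MPI_PIN variable (the program name below is a placeholder):

mpiexec -n 50 -genv I_MPI_PIN off myprog.exe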
Thanks for your reply, Tim Prince! Every compute node has 1 package with 4 cores, and every core has 2 hardware threads (Hyper-Threading). I can run my program without errors on up to 14 compute nodes, but beyond 14 nodes I get the errors shown above. I want to run my hybrid (MPI/OpenMP) program, which is why I am using the thread-safe (mt) library and the I_MPI_PIN_DOMAIN environment variable. Are you suggesting that I need to disable Hyper-Threading in the BIOS, and could Hyper-Threading cause an error when pinning threads within an MPI process?
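To verify the node topology that the pinning logic will see, the cpuinfo utility shipped with Intel MPI can be run on one of the nodes; the figures below are an assumption based on the description above:

cpuinfo

On these nodes it should report 1 package, 4 cores, and 8 logical processors with Hyper-Threading enabled; with I_MPI_PIN_DOMAIN=omp and OMP_NUM_THREADS=2, Intel MPI would then form four 2-logical-CPU domains per node.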
I_MPI_PIN_DOMAIN spreads the work across cores by default on a supported Intel CPU, so it may not be necessary to disable HT. If you are trying an odd selection of ranks and threads, I_MPI_DEBUG=5 may shed light on what happens.
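A launch line with that debug level might look like the following (the program name is a placeholder); at startup Intel MPI then prints a pinning map showing which logical CPUs each rank's domain contains:

mpiexec -n 25 -genv I_MPI_DEBUG 5 -genv I_MPI_PIN_DOMAIN omp -genv OMP_NUM_THREADS 2 myprog.exe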
Thanks, Tim Prince! Unfortunately, your answer does not solve my problem. I want to run my program using both hardware threads of every core; that is why I used the environment variables I_MPI_PIN_DOMAIN=omp and OMP_NUM_THREADS=2. I posted my error messages above. Can you briefly discuss those error messages?