Dear all,
I am trying to run my program on a cluster of 10 nodes, each running Windows 7 64-bit with Intel MPI 4.1.
I run my program with
mpiexec -n 12 test
or
mpiexec -wdir \\n01\mytest\ -hosts 10 n01 12 n02 12 n03 12 n04 12 n05 12 n06 12 n07 12 n08 12 n09 12 n10 12 \\n01\mytest\test
When ONLY ONE Build Environment window is open, both command lines work. However, when two Build Environment windows are open, the first command line still works in one window, but the second one fails with the following error message:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
Is this a bug in Intel MPI, or do I need to write special code so that the program works in this situation?
Thanks,
Zhanghong Tang
Hi Zhanghong,
Please compare the environment variables in the two windows using "set". Are you attempting to run in both simultaneously?
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
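Comparing two long `set` listings by eye is error-prone, so a small script can help. The following is a minimal sketch, assuming the output of `set` from each window has been saved to a text file (for example with `set > env1.txt`; the file names and the function names `parse_set_output` and `diff_env` are hypothetical, not part of Intel MPI):

```python
import sys


def parse_set_output(text):
    """Parse the output of the Windows `set` command into a dict."""
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name:
            # Variable names are case-insensitive on Windows.
            env[name.upper()] = value
    return env


def diff_env(a, b):
    """Return {name: (value_in_a, value_in_b)} for every variable that differs."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python envdiff.py env1.txt env2.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        differences = diff_env(parse_set_output(f1.read()),
                               parse_set_output(f2.read()))
    for name, (left, right) in differences.items():
        print(f"{name}: {left!r} != {right!r}")
```

A variable that exists in only one window appears with `None` on the other side.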
Dear Dr. Tullos,
Thank you very much for your kind reply. I compared the environment variables in the two windows and found that they are exactly the same.
Yes, I need to test my program in both windows simultaneously (with different parameters). But for now the problem occurs when I have two windows open and run the program in only one of them.
Thanks,
Zhanghong Tang
Hi Zhanghong,
I've looked over the verbose output you sent. I think there is a problem with the network path. Please try either mapping the network path to a local drive or making certain you have properly set up Active Directory*. See section 3.2.1 of the Intel® MPI Library for Windows* Reference Manual for details on how to set up Active Directory*.
Please let me know if this helps.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Dear Dr. Tullos,
Thank you very much for your kind reply. As the Intel® MPI Library for Windows* Reference Manual suggests, I downloaded the Remote Server Administration Tools from here:
http://www.microsoft.com/en-us/download/details.aspx?id=7887
I installed them and then enabled 'Enable Active Directory Administrative Center' as described here:
http://technet.microsoft.com/en-us/library/dd560652%28v=ws.10%29.aspx
After that, I tried to 'Open the Computers list in the Active Directory Users and Computers administrative utility' as the Intel® MPI Library for Windows* Reference Manual suggests, but the errors shown in the attached picture were displayed.
All nodes of the cluster run Windows 7 64-bit, and I found that I can't create a domain from a Windows 7 64-bit system:
http://www.sevenforums.com/network-sharing/125929-how-create-domain-windows-7-a.html
What should I do next?
Thanks,
Zhanghong Tang
Dear Dr. Tullos,
Thank you very much for your kind reply.
I tried what you suggested, but it failed. I can't 'Open the Computers list in the Active Directory Users and Computers administrative utility' as instructed by the manual, since there is no domain in the cluster: every node runs Windows 7 64-bit. I searched the internet, and some people say that a domain can't be set up on a Windows 7 64-bit system.
Do you have any suggestions? Can I set up Active Directory in the cluster without a domain?
Thanks,
Zhanghong Tang
Dear Dr. Tullos,
I also tried your other suggestion, mapping the network path to a local drive. It also failed (even on the local node). The error message is as follows:
forrtl: severe (29): file not found, unit 1, file C:\Windows\system32\parainfo\polesize.dat
My program reads some data from the .\parainfo folder. The program works when I give the network path as the working directory.
Could you please take a look at it?
Thanks,
Zhanghong Tang
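The path in the error message suggests that the process looked for `.\parainfo` relative to `C:\Windows\system32`, that is, its current directory was not the intended working directory. One way to make a program robust against this is to resolve data files against an explicit base directory instead of the current directory. The program in this thread is Fortran, so the following Python sketch only illustrates the idea; the environment variable `FEM_DATA_DIR` and the function `resolve_data_file` are hypothetical names, not something the program or Intel MPI provides:

```python
import os
from pathlib import Path


def resolve_data_file(name, base_dir=None, env_var="FEM_DATA_DIR"):
    """Locate a data file relative to an explicit base directory.

    Order of preference: the base_dir argument, then the environment
    variable env_var (hypothetical), then the current directory.
    """
    base = Path(base_dir or os.environ.get(env_var) or Path.cwd())
    candidate = (base / name).resolve()
    if not candidate.is_file():
        # Report the current directory too, since a wrong working
        # directory is the usual cause of this failure.
        raise FileNotFoundError(
            f"{candidate} not found (current directory is {Path.cwd()})"
        )
    return candidate
```

With this approach, a failed lookup reports both the path that was tried and the directory the process was actually started in.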
Hi Zhanghong,
When you are running on the mapped drive, are you specifying a working directory? Is it the same on all of the systems?
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Dear Dr. Tullos,
I mapped the network path to drive Z: on all nodes, and the command line I use to run the program is as follows:
mpiexec -wdir Z:\debug\directional -hosts 10 n01 4 n02 4 n03 4 n04 4 n05 4 n06 4 n07 4 n08 4 n09 4 n10 4 Z:\debug\directional\fem
or (on a single node, with the current directory set to Z:\debug\directional):
mpiexec -n 4 fem
In both cases, the error message I described before is displayed.
Thanks,
Zhanghong Tang
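Long `-hosts` lists like the one above are easy to mistype. As a minimal sketch, the argument list can be generated from a list of node names; the layout (`-hosts`, the host count, then alternating host name and process count) is copied from the command line in this post, while the function name `build_mpiexec_command` is hypothetical:

```python
def build_mpiexec_command(hosts, procs_per_host, wdir, executable):
    """Build an mpiexec argument list in the form used in this thread:
    mpiexec -wdir <dir> -hosts <count> <host> <n> ... <executable>
    """
    args = ["mpiexec", "-wdir", wdir, "-hosts", str(len(hosts))]
    for host in hosts:
        args += [host, str(procs_per_host)]
    args.append(executable)
    return args


if __name__ == "__main__":
    nodes = [f"n{i:02d}" for i in range(1, 11)]  # n01 .. n10
    command = build_mpiexec_command(
        nodes, 4, r"Z:\debug\directional", r"Z:\debug\directional\fem"
    )
    print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` as is, which avoids quoting problems with backslashes in the paths.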
Hi Zhanghong,
Please try using the mpiexec option -mapall.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
Dear James,
Thank you very much for your kind reply. I added the option -mapall, and the results are similar to before: sometimes it works, but on the next run the following errors are sometimes displayed:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)......................:
MPID_Init(195).............................: channel initialization failed
MPIDI_CH3_Init(106)........................:
MPID_nem_tcp_post_init(344)................:
MPID_nem_newtcp_module_connpoll(3099)......:
recv_id_or_tmpvc_info_success_handler(1328): read from socket failed - No error
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(659)................:
MPID_Init(195).......................: channel initialization failed
MPIDI_CH3_Init(106)..................:
MPID_nem_tcp_post_init(344)..........:
MPID_nem_newtcp_module_connpoll(3099):
gen_read_fail_handler(1194)..........: read from socket failed - The specified network name is no longer available.
and sometimes the following errors are displayed:
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
*********** Warning ************
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N09' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N07' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N02' failed, error 2 - The system cannot find the file specified.
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
*********** Warning ************
forrtl: severe (29): file not found, unit 1, file C:\Windows\system32\parainfo\polesize.dat
Image PC Routine Line Source
fem.exe 000000014053DAE7 Unknown Unknown Unknown
fem.exe 00000001405390B3 Unknown Unknown Unknown
fem.exe 00000001404CB016 Unknown Unknown Unknown
fem.exe 00000001404A5635 Unknown Unknown Unknown
fem.exe 00000001404A4270 Unknown Unknown Unknown
fem.exe 0000000140482B39 Unknown Unknown Unknown
fem.exe 000000013FCAAEE7 READPOLE 41 readdata.f90
fem.exe 000000013FCC6C02 MAIN__ 14 main.f90
fem.exe 000000014172267C Unknown Unknown Unknown
fem.exe 0000000140510B37 Unknown Unknown Unknown
kernel32.dll 000000007738652D Unknown Unknown Unknown
ntdll.dll 00000000774BC521 Unknown Unknown Unknown
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N04' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N07' failed, error 2 - The system cannot find the file specified.
launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N04' failed, error 2 - The system cannot find the file specified.
*********** Warning ************
Unable to map \\n01\Debug. (error 71)
*********** Warning ************
After I run
smpd -restart
and close and reopen the MPI environment window, it works again.
Could you please take a look at it?
Thanks,
Zhanghong Tang
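Because the failures are intermittent, it may help to record which nodes fail on each run and see whether the same ones recur. The following is a minimal sketch, assuming the mpiexec output has been captured as text; the regular expression is written against the `launch failed: CreateProcess(...) on '...' failed` lines shown in this post, and the function name `failed_nodes` is hypothetical:

```python
import re
from collections import Counter

# Matches lines such as:
# launch failed: CreateProcess(\\n01\Debug\directional\fem) on 'N09' failed, error
LAUNCH_FAILED = re.compile(
    r"launch failed: CreateProcess\((?P<exe>[^)]*)\) on '(?P<node>[^']+)' failed"
)


def failed_nodes(output):
    """Count launch failures per node name in captured mpiexec output."""
    return Counter(m.group("node").upper() for m in LAUNCH_FAILED.finditer(output))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): mpiexec ... 2>&1 | python failed_nodes.py
    for node, count in sorted(failed_nodes(sys.stdin.read()).items()):
        print(f"{node}: {count} launch failure(s)")
```

The pattern stops at the word `failed`, so it still matches when the console has wrapped the rest of the message onto the next line.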
Hi Zhanghong,
Please check through your Windows* system logs and look for anything indicating a networking failure on the nodes.
Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
