I have 10 nodes named N01 through N10. Two of them run Windows Server 2012 and the others run Windows 7 64-bit. A domain, fem.org, was created on one node, and all nodes have joined it. However, on the Windows 7 64-bit nodes I log in as a local user, not as a domain user.
I previously used mpiexec 4.1.3.047 and usually ran a command like the following:
mpiexec -wdir "Z:\Debug\test" -mapall -hosts 10 n01 2 n02 2 n03 2 n04 2 n05 2 n06 2 n07 2 n08 2 n09 2 n10 2 Z:\Debug\fem
and it worked.
Now I am trying the latest version of MPI by updating two nodes, N01 and N10 (both run Windows 7 64-bit). After removing the old version and installing the latest version, I ran the following command:
I registered the local user with mpiexec instead of the domain user. However, when I run the program with
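For reference, the -hosts argument in the command above has the form "-hosts <count> <host> <nprocs> ...". A minimal Python sketch that assembles such a command line from a host list is shown here. The function name build_mpiexec_command is hypothetical, and it only uses the mpiexec options that already appear in this thread.

```python
def build_mpiexec_command(hosts, procs_per_host, wdir, program):
    """Return an mpiexec argument list in the form used in this thread.

    The layout is: -hosts <number of hosts> <host> <nprocs> <host> <nprocs> ...
    """
    args = ["mpiexec", "-wdir", wdir, "-mapall", "-hosts", str(len(hosts))]
    for host in hosts:
        # Each host name is followed by the number of processes to start on it.
        args += [host, str(procs_per_host)]
    args.append(program)
    return args


if __name__ == "__main__":
    hosts = [f"n{i:02d}" for i in range(1, 11)]  # n01 .. n10
    print(" ".join(build_mpiexec_command(hosts, 2, r"Z:\Debug\test", r"Z:\Debug\fem")))
```

The list form can be passed to subprocess.run directly, which avoids quoting problems with paths that contain spaces.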
mpiexec -wdir "Z:\Debug\test" -mapall -hosts 2 n01 2 n10 2 Z:\Debug\fem
the following error message is displayed:
Credentials for N01\tang rejected connecting to N10
Then I ran
on these two nodes again, this time registering the domain user, and launched fem with mpiexec again. This time it worked.
My question is: how can I make the latest MPI accept a registered local user?
Don't use the SMPD installation. Only use the Hydra installation. SMPD is deprecated.
Ensure that the login and password are consistent across the systems. Also, try running Remote Desktop from each system to each system.
Do you have firewalls on any of the systems? If so, I would suggest that you disable the firewall temporarily (best to disconnect from the internet while doing this). If this resolves the problem, then you will need to allow Hydra and your application through your firewall.
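One way to check whether a firewall is blocking traffic between two nodes is to attempt a plain TCP connection. The following is a minimal Python sketch of such a check. The function name can_connect is hypothetical, and the port that the Hydra service listens on is not stated in this thread, so it is left as a parameter that you would have to look up for your installation.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Running can_connect("n10", port) from N01, and the reverse from N10, would show whether the two nodes can reach each other on that port. A successful connection only rules out a network-level block; it says nothing about whether the credentials are accepted.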
Intel Developer Support
Thank you very much for your kind reply. As you suggested, I ran the following command
on every node and ran the program again. The same error was displayed. I have turned off the firewall on every node. I can currently connect from N01 to all other nodes with Remote Desktop, so the connection is OK.
Thank you very much for your kind reply. I tested it. Any two nodes work. With 3 or 4 nodes, some groups work but others fail (with the same error message).
Another observation: the program runs OK with a small number of hosts, for example, going through
-hosts 1 N01 2, -hosts 2 N01 2 N02 2, ...
but after 3 or 4 hosts the error occurs. If I then reduce the number of hosts, the error still occurs, even when I reduce it to a single host and that host is the local host. Reinstalling hydra_service does not solve the problem; I have to uninstall Intel MPI and reinstall it.
Thank you very much for your kind reply. Does the output file show any problem?
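To narrow down which host groups fail, the combinations can be enumerated systematically instead of being tried by hand. The following is a minimal Python sketch, assuming the same "-hosts <count> <host> <nprocs> ..." layout used in this thread. The function name host_group_args is hypothetical, and the sketch only generates the arguments; it does not run mpiexec.

```python
from itertools import combinations


def host_group_args(hosts, group_size, procs_per_host=2):
    """Return one -hosts argument list for every group of group_size hosts."""
    groups = []
    for group in combinations(hosts, group_size):
        args = ["-hosts", str(group_size)]
        for host in group:
            # Host name followed by the process count for that host.
            args += [host, str(procs_per_host)]
        groups.append(args)
    return groups


if __name__ == "__main__":
    for args in host_group_args(["N01", "N02", "N03", "N04"], 3):
        print(" ".join(args))
```

Recording which groups succeed and which fail may show whether the error follows a particular node or depends only on the number of hosts.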
I checked the -localroot option:
launch the root process directly from mpiexec if the host is local (this allows the root process to create windows and be debugged)
but in my case the hosts are not local. Have I misunderstood something?
Ok, please ensure that the login and password are identical across all of the systems. I would recommend unregistering your credentials with MPI on all of the systems, and then only registering on the one launching the jobs.
Is the installation location the same across all nodes? You can also install to a shared location and run from there.
If these don't work, please submit an issue in Intel® Premier Support (https://premier.intel.com) as this will require a more thorough investigation.
Thank you very much for your kind reply. I have made sure that the login and password are identical across all of the systems.
I will try your suggested method later. For now I have to run my program with MPI 4.1.3.