Zhanghong_T_
Novice
137 Views

Question about mpiexec (5.1.3)

Dear all,

I have 10 nodes named N01 to N10. Two of them run Windows Server 2012 and the others run 64-bit Windows 7. A domain, fem.org, was created on one node, and all nodes have joined it. However, on the 64-bit Windows 7 nodes I log in as a local user, not as a domain user.

Previously I used mpiexec 4.1.3.047 and usually ran a command like the following:

mpiexec -wdir "Z:\Debug\test" -mapall -hosts 10 n01 2 n02 2 n03 2 n04 2 n05 2 n06 2 n07 2 n08 2 n09 2 n10 2 Z:\Debug\fem
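The `-hosts` argument in the command above follows the pattern used throughout this thread: the number of hosts, then one host name and rank count per host. A minimal Python sketch that assembles such an argument list from a host-to-rank mapping, assuming the host names, working directory, and executable path from the command above; `build_mpiexec_args` is a hypothetical helper, not part of Intel MPI, and it only builds the list without launching anything.

```python
def build_mpiexec_args(hosts, exe, wdir):
    """Build an mpiexec argument list in the form used in this thread:
    mpiexec -wdir <dir> -mapall -hosts <n> <host1> <ranks1> ... <exe>

    `hosts` maps host name -> number of ranks on that host
    (dict insertion order is preserved in the output).
    """
    if not hosts:
        raise ValueError("at least one host is required")
    args = ["mpiexec", "-wdir", wdir, "-mapall", "-hosts", str(len(hosts))]
    for name, ranks in hosts.items():
        args += [name, str(ranks)]
    args.append(exe)
    return args


if __name__ == "__main__":
    # Host names n01..n10 with 2 ranks each, as in the original command.
    nodes = {f"n{i:02d}": 2 for i in range(1, 11)}
    args = build_mpiexec_args(nodes, r"Z:\Debug\fem", r"Z:\Debug\test")
    print(" ".join(args))
    # To actually launch, the list could be passed to subprocess.run(args).
```

Passing a list to `subprocess.run` lets Python handle quoting of paths, which avoids hand-escaping the `-wdir` value.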

and it worked.

Now I have tried the latest version of MPI by updating two nodes, N01 and N10 (both running 64-bit Windows 7). After removing the old version and installing the latest one, I ran the following commands:

hydra_service -install

smpd -install

mpiexec -remove

mpiexec -register

I registered the local user, not the domain user, with mpiexec. However, when I ran the program with

mpiexec -wdir "Z:\Debug\test" -mapall -hosts 2 n01 2 n10 2 Z:\Debug\fem

the following error message was displayed:

Credentials for N01\tang rejected connecting to N10

 

Then I ran

mpiexec -remove

and

mpiexec -register

on these two nodes again, this time registering the domain user, and launched fem with mpiexec again. This time it worked.

My question is: how can I make the latest MPI work with a registered local user?

Thanks,

Zhanghong Tang

 

Zhanghong_T_
Novice

Dear all,

Now I have removed all old versions of MPI, installed the latest 5.1.3 on all nodes, and run the program again. The following errors were displayed:

[mpiexec@N01] ..\hydra\pm\pmiserv\pmiserv_cb.c (781): connection to proxy 0 at host N03 failed
[mpiexec@N01] ..\hydra\tools\demux\demux_select.c (103): callback returned error status
[mpiexec@N01] ..\hydra\pm\pmiserv\pmiserv_pmci.c (500): error waiting for event
[mpiexec@N01] ..\hydra\ui\mpich\mpiexec.c (1130): process manager error waiting for completion

Over many runs, the host name in the error is different each time (for example, N03, N08, etc.). The program runs fine with MPI 4.1.3.047. What could cause this problem? Is there a stable version after 4.1.3.047?

Thanks
James_T_Intel
Moderator

Don't use the SMPD installation.  Only use the Hydra installation.  SMPD is deprecated.

Ensure that the login and password are consistent across the systems.  Also, try running Remote Desktop from each system to each system.

Do you have firewalls on any of the systems?  If so, I would suggest that you disable the firewall temporarily (best to disconnect from the internet while doing this).  If this resolves the problem, then you will need to allow Hydra and your application through your firewall.

James.
Intel Developer Support

Zhanghong_T_
Novice

Dear James,

Thank you very much for your kind reply. As you suggested, I ran the following command

smpd -remove

on every node and ran the program again. The same error was displayed. I have turned off the firewall on every node. I can currently connect via Remote Desktop from N01 to all other nodes, so the connections are OK.

Thanks,

James_T_Intel
Moderator

Ok, try running with two nodes at a time to see if there are individual nodes causing problems.
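One way to organize the two-at-a-time test is to enumerate every unordered pair of nodes and launch one small job per pair. A minimal Python sketch, assuming node names N01 to N10 and the same working directory and executable as in the first post; `pair_commands` is a hypothetical helper that only generates the command lines rather than running them. With 10 nodes this yields 45 pairs.

```python
from itertools import combinations


def pair_commands(nodes, exe, wdir, ranks_per_host=2):
    """Return one mpiexec argument list per unordered pair of nodes."""
    commands = []
    for a, b in combinations(nodes, 2):
        commands.append([
            "mpiexec", "-wdir", wdir, "-mapall",
            "-hosts", "2", a, str(ranks_per_host), b, str(ranks_per_host),
            exe,
        ])
    return commands


if __name__ == "__main__":
    nodes = [f"N{i:02d}" for i in range(1, 11)]  # N01 .. N10
    cmds = pair_commands(nodes, r"Z:\Debug\fem", r"Z:\Debug\test")
    print(len(cmds), "pairs")
    for cmd in cmds[:3]:
        print(" ".join(cmd))
```

Running each command (for example with `subprocess.run`) and recording which pairs fail would show whether a particular node appears in every failing pair.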

Zhanghong_T_
Novice

Dear James,

Thank you very much for your kind reply. I tested it: any two nodes work. With 3 or 4 nodes, some groups work but others fail (with the same error message).

Thanks

James_T_Intel
Moderator

Please run a larger run with -verbose and attach the output as a file.

Zhanghong_T_
Novice

Dear James,

Please see the attached output file.

Thanks

Zhanghong_T_
Novice

Dear James,

Have you received the attachment? Is there any problem with the settings and commands I use to run the program with MPI 5.1.3?

Thanks

Zhanghong_T_
Novice

Another observation: the program runs fine with a small number of hosts, for example, starting from

-hosts 1 N01 2, -hosts 2 N01 2 N02 2, ...

but once I reach 3 or 4 hosts the error appears. After that, even when I reduce the number of hosts, the error persists, even with a single host that is the local host. Reinstalling hydra_service does not fix it; I have to uninstall Intel MPI and reinstall it.
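The ramp-up described above can be scripted so that the first failing host count is recorded automatically. A minimal Python sketch under stated assumptions: `run_job` is a hypothetical callable that launches mpiexec on the given hosts and returns True on success (for example, by checking that `subprocess.run(...).returncode == 0`); here it is replaced by a stand-in so the logic can be checked without a cluster. Because the failure reportedly persists afterwards, the loop stops at the first failure instead of continuing.

```python
from typing import Callable, Optional, Sequence


def first_failing_count(
    hosts: Sequence[str],
    run_job: Callable[[Sequence[str]], bool],
) -> Optional[int]:
    """Run jobs on the first 1, 2, ..., len(hosts) hosts in turn.

    Returns the number of hosts in the first job that failed, or None if
    every job succeeded. Stops at the first failure, since later runs may
    be affected by whatever state the failure left behind.
    """
    for n in range(1, len(hosts) + 1):
        if not run_job(hosts[:n]):
            return n
    return None


if __name__ == "__main__":
    nodes = [f"N{i:02d}" for i in range(1, 11)]

    # Hypothetical stand-in runner for illustration only: it pretends that
    # jobs with more than three hosts fail. Replace with a real mpiexec launch.
    def fake_run(subset):
        return len(subset) <= 3

    print(first_failing_count(nodes, fake_run))
```

Injecting the runner keeps the search logic separate from how mpiexec is actually invoked, so the same loop works whether the launch goes through `subprocess` or a batch script.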

Thanks

James_T_Intel
Moderator

Please try running with -localroot.

Zhanghong_T_
Novice

Dear James,

Thank you very much for your kind reply. Did you find any problem in the output file?

I checked the -localroot option:

launch the root process directly from mpiexec if the host is local (this allows the root process to create windows and be debugged)

but in my case the hosts are not local. Have I misunderstood something?

Thanks

James_T_Intel
Moderator

I see nothing obvious in the output.  I have seen odd errors under Windows resolved by using -localroot.

Zhanghong_T_
Novice

Dear James,

Thank you very much for your kind reply. However, I tested the -localroot option and the same error was displayed.

Thanks

James_T_Intel
Moderator

Ok, please ensure that the login and password are identical across all of the systems.  I would recommend unregistering your credentials with MPI on all of the systems, and then only registering on the one launching the jobs.

Is the installation location the same across all nodes? You can also install to a shared location and run from there.

If these don't work, please submit an issue in Intel® Premier Support (https://premier.intel.com) as this will require a more thorough investigation.

Zhanghong_T_
Novice

Dear James,

Thank you very much for your kind reply. I have ensured that the login and password are identical across all of the systems.

I will try your suggested method later. For now, I have to run my program with MPI 4.1.3.

Thanks
