Intel® oneAPI HPC Toolkit
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Why does mpiexec.hydra wait so long before starting the proxy?

oleotiger
Novice

Before mpiexec.hydra starts the bootstrap proxy, there is a long wait.

How I launch the job:

export OMP_NUM_THREADS=4
export I_MPI_HYDRA_IFACE=ib0
export I_MPI_HYDRA_DEBUG=1
mpiexec.hydra  -hostfile /home/software/hostfiles/hostfile_128 -genvall  -ppn 12 ./wrf.exe

After waiting about 3 minutes, I get the following output:

[mpiexec@6248r-node128] Launch arguments: /opt/compiler/intel/oneapi/mpi/2021.2.0//bin//hydra_bstrap_proxy --upstream-host 9.9.9.128 --upstream-port 39744 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/compiler/intel/oneapi/mpi/2021.2.0//bin/ --tree-width 16 --tree-level 1 --iface ib0 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/compiler/intel/oneapi/mpi/2021.2.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9

 

What is mpiexec doing before it launches the bootstrap proxy, and why does it wait so long to start it?

oleotiger
Novice

More information:

The hostfile /home/software/hostfiles/hostfile_128 contains only one host, the same one on which I run the command.

So I ran an equivalent command without the hostfile:

mpiexec.hydra  -genvall  -ppn 48  ./wrf.exe

With this command, I get the normal output immediately.

It seems that mpiexec.hydra takes a long time to resolve the hosts in the hostfile, but I am not sure whether that is right.

I have added '150.1.68.128 6248r-node128' to /etc/hosts.
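One way to test this hypothesis outside of MPI is to time the lookup of each host in the hostfile. The sketch below is a hypothetical helper (time_host_lookups is not an Intel MPI tool) that runs getent hosts, which resolves names through the same NSS configuration (/etc/nsswitch.conf) as the C library resolver. It assumes one host per line, optionally followed by ":N" or other whitespace-separated fields, and GNU date for millisecond timing; that Hydra resolves hostnames through this same path is an assumption, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper (not part of Intel MPI): time how long the
# system resolver takes for each host listed in an MPI hostfile.
# Assumptions: one host per line, optionally followed by ":N" or other
# whitespace-separated fields; GNU date (for %N nanoseconds); getent available.
time_host_lookups() {
    local hostfile="$1" host rest status start end
    # "|| [ -n "$host" ]" also processes a last line without a trailing newline
    while read -r host rest || [ -n "$host" ]; do
        host="${host%%:*}"                       # drop a ":N" process-count suffix
        [ -z "$host" ] && continue               # skip blank lines
        case "$host" in \#*) continue ;; esac    # skip comment lines
        start=$(date +%s%N)
        # getent resolves through NSS, i.e. the "hosts:" order in /etc/nsswitch.conf
        if getent hosts "$host" >/dev/null 2>&1; then status=ok; else status=FAILED; fi
        end=$(date +%s%N)
        printf '%s %s %d ms\n' "$host" "$status" $(( (end - start) / 1000000 ))
    done < "$hostfile"
}
```

For example, time_host_lookups /home/software/hostfiles/hostfile_128 prints one line per host. A lookup that takes seconds even though the name is listed in /etc/hosts would suggest that DNS servers are being queried (and timing out) first.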

oleotiger
Novice

I found the problem, but I don't think I have found the best way to solve it.

When I delete the DNS server entries in /etc/resolv.conf, the app starts immediately.

So I think the delay is caused by resolving the hostnames in the hostfile.

How can I disable remote DNS resolution? I just want to use the local /etc/hosts file to resolve the hostnames.

I found 'I_MPI_HYDRA_NAMESERVER = hostname:port', but I don't know how to set it. If hostname=localhost, what should the port be?

SantoshY_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

Could you please try running the same steps with and without "export I_MPI_HYDRA_IFACE=ib0" and tell us whether the behavior changes?

We recommend using a command like the one below to run your sample. Since you are using a single node, you can use "-n 12" instead of "-ppn 12". The '-genvall' flag is not required in this context.

Example:

mpiexec.hydra -hostfile <path-of-hostfile> -n 12 ./wrf.exe

After running the program, could you please share the output logs for both cases, i.e., with and without "export I_MPI_HYDRA_IFACE=ib0"?

 

>>"I found 'I_MPI_HYDRA_NAMESERVER = hostname:port'. But I don't know how to set it. hostname=localhost, what about the port?"

Use the following commands:

hydra_nameserver &
I_MPI_HYDRA_NAMESERVER=`hostname` mpiexec.hydra -n 12 ./wrf.exe

Also, could you please tell us which job scheduler you are using?

Thanks & Regards,

Santosh

 

oleotiger
Novice

I have resolved the problem. The delay is caused by the DNS nameserver.

The hosts in the hostfile are resolved remotely by the DNS servers listed in /etc/resolv.conf. After I deleted all DNS entries from /etc/resolv.conf, the app started immediately after I pressed Enter.

But I don't think deleting the DNS server IPs from /etc/resolv.conf is a good way to solve the problem.

I'm looking for a better way, for example a way to disable remote DNS (or /etc/resolv.conf) when running mpiexec.hydra, or to explicitly tell it to use local resolution (/etc/hosts).

SantoshY_Intel
Moderator

Hi,


We are working on your issue and we will get back to you soon.


Thanks,

Santosh


DrAmarpal_K_Intel

Hi Oleotiger,


This issue is outside the scope of Intel MPI Library. Please give /etc/hosts a higher priority than the DNS resolver, using the method appropriate for your OS distribution. Discussions like the following might help:

https://unix.stackexchange.com/questions/499792/how-do-etc-hosts-and-dns-work-together-to-resolve-ho...
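On glibc-based Linux distributions, this lookup order is set by the hosts: line of /etc/nsswitch.conf; a line such as hosts: files dns makes the resolver consult /etc/hosts before querying the DNS servers in /etc/resolv.conf. The hypothetical helper below (hosts_files_before_dns is not a system tool) only reads an nsswitch.conf file and reports whether files precedes dns; it changes nothing. It assumes glibc NSS syntax, and distributions that use systemd-resolved, nscd, or similar services may add resolution steps that it does not inspect.

```shell
#!/usr/bin/env bash
# Hypothetical read-only check (not a system tool): does the "hosts:" line of an
# nsswitch.conf file list "files" (/etc/hosts) before "dns"?
# Assumption: glibc NSS syntax, e.g. "hosts: files dns myhostname".
# Returns 0 if files comes first (or dns is absent), 1 otherwise, 2 if no hosts: line.
hosts_files_before_dns() {
    local conf="${1:-/etc/nsswitch.conf}" line word files_pos="" dns_pos="" i=0
    local -a sources
    # First non-comment "hosts:" line; leading whitespace is allowed
    line=$(grep -E '^[[:space:]]*hosts:' "$conf" | head -n 1)
    if [ -z "$line" ]; then
        echo "no hosts: line in $conf"
        return 2
    fi
    # Split the sources after "hosts:" without glob-expanding "[NOTFOUND=return]"
    read -r -a sources <<< "${line#*:}"
    for word in "${sources[@]}"; do
        i=$((i + 1))
        case "$word" in
            files) [ -z "$files_pos" ] && files_pos=$i ;;
            dns)   [ -z "$dns_pos" ] && dns_pos=$i ;;
        esac
    done
    if [ -z "$files_pos" ]; then
        echo "files is not listed: /etc/hosts is not consulted"
        return 1
    elif [ -z "$dns_pos" ] || [ "$files_pos" -lt "$dns_pos" ]; then
        echo "ok: /etc/hosts (files) is consulted before DNS"
        return 0
    else
        echo "DNS is consulted before /etc/hosts (files)"
        return 1
    fi
}
```

Running hosts_files_before_dns with no argument checks /etc/nsswitch.conf; passing a path lets you check an edited copy before installing it.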


Is there anything else I can help you with?


Best regards,

Amar


DrAmarpal_K_Intel

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

 
