Intel® MPI Library

Why does mpiexec.hydra wait so long before starting the proxy?

oleotiger
Novice

There is a long wait before mpiexec.hydra starts the bootstrap proxy.

How I start the job:

export OMP_NUM_THREADS=4
export I_MPI_HYDRA_IFACE=ib0
export I_MPI_HYDRA_DEBUG=1
mpiexec.hydra  -hostfile /home/software/hostfiles/hostfile_128 -genvall  -ppn 12 ./wrf.exe

After waiting about 3 minutes, I get the following output:

[mpiexec@6248r-node128] Launch arguments: /opt/compiler/intel/oneapi/mpi/2021.2.0//bin//hydra_bstrap_proxy --upstream-host 9.9.9.128 --upstream-port 39744 --pgid 0 --launcher ssh --launcher-number 0 --base-path /opt/compiler/intel/oneapi/mpi/2021.2.0//bin/ --tree-width 16 --tree-level 1 --iface ib0 --time-left -1 --launch-type 2 --debug --proxy-id 0 --node-id 0 --subtree-size 1 --upstream-fd 7 /opt/compiler/intel/oneapi/mpi/2021.2.0//bin//hydra_pmi_proxy --usize -1 --auto-cleanup 1 --abort-signal 9

 

I don't know what mpiexec is doing before it launches the bootstrap proxy.

Why does it wait so long to start it?

 

 

oleotiger
Novice

More information:

The hostfile /home/software/hostfiles/hostfile_128 contains only one host, the one on which I run the command.

So I ran a similar command that does the same thing, but without the hostfile:

mpiexec.hydra  -genvall  -ppn 48  ./wrf.exe

 

I get the normal output immediately.

 

It seems that mpiexec.hydra takes a long time to resolve the hosts in the hostfile, but I'm not sure that's right.

I have added '150.1.68.128 6248r-node128' to /etc/hosts.
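
One way to test this suspicion outside of MPI is to time the system resolver for each entry in the hostfile. The following is a minimal sketch, not part of Intel MPI: the function names list_hosts and time_lookup are made up for illustration, and it assumes a hostfile with one host name per line, an optional ":N" suffix, and "#" comments (IPv6 literals would not survive the ":" stripping). getent goes through the same resolver configuration as ordinary C library lookups, so a lookup that takes many seconds here points at name resolution rather than at Hydra.

```shell
#!/usr/bin/env bash
# Sketch: time the system resolver for every host in a hostfile.
# Assumption: one host per line, optional ":N" suffix, "#" comments.

# Print the bare host names found in the hostfile given as $1.
list_hosts() {
    sed -e 's/#.*$//' \
        -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' \
        -e 's/:.*$//' "$1" | awk 'NF { print $1 }'
}

# Time a single lookup. $SECONDS has whole-second resolution, which is
# enough to spot delays of the size described in this thread.
time_lookup() {
    local host=$1 start=$SECONDS
    getent hosts "$host" > /dev/null
    local status=$?
    printf '%s\t%ss\tgetent exit status=%s\n' "$host" "$((SECONDS - start))" "$status"
}

# Usage: ./check_hosts.sh /home/software/hostfiles/hostfile_128
if [ "$#" -ge 1 ]; then
    for h in $(list_hosts "$1"); do
        time_lookup "$h"
    done
fi
```

If every host resolves instantly here but mpiexec.hydra is still slow, the delay is probably somewhere else.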

oleotiger
Novice

I found the problem, but I don't think I have found the best way to solve it.

When I delete the DNS server entries from /etc/resolv.conf, the app starts immediately.

So I think the delay is caused by resolving the hostnames listed in the hostfile.

How can I disable remote DNS resolution? I just want to use the local /etc/hosts to resolve the hostnames.

 

I found 'I_MPI_HYDRA_NAMESERVER = hostname:port'. But I don't know how to set it. hostname=localhost, what about the port?

SantoshY_Intel
Moderator

Hi,

 

Thanks for reaching out to us.

 

Could you please run the same steps with and without "export I_MPI_HYDRA_IFACE=ib0" and tell us whether the behavior changes?

We recommend running your sample with a command like the one below. Since you are using a single node, you can use "-n 12" instead of "-ppn 12". The '-genvall' flag is not required in this context.

Example:

mpiexec.hydra -hostfile <path-of-hostfile> -n 12 ./wrf.exe

After running the program, could you please share the output logs for both cases, i.e., with and without "export I_MPI_HYDRA_IFACE=ib0"?

 

>>"I found 'I_MPI_HYDRA_NAMESERVER = hostname:port'. But I don't know how to set it. hostname=localhost, what about the port?"

Use the following commands:

hydra_nameserver &
I_MPI_HYDRA_NAMESERVER=`hostname` mpiexec.hydra -n 12 ./wrf.exe

 

Also, could you please tell us which job scheduler you are using?

 

Thanks & Regards,

Santosh

 

oleotiger
Novice

I have resolved the problem. The delay is caused by the nameserver.

The hosts in the hostfile are resolved remotely using the DNS server IPs listed in /etc/resolv.conf. After I delete all DNS entries from /etc/resolv.conf, the app starts immediately after I press Enter.

But I don't think deleting the DNS IPs from /etc/resolv.conf is a good way to solve the problem.

I'm looking for a better way, such as disabling remote DNS or /etc/resolv.conf when running mpiexec.hydra, or explicitly telling it to use local name resolution (/etc/hosts).

 

SantoshY_Intel
Moderator

Hi,


We are working on your issue and we will get back to you soon.


Thanks,

Santosh


DrAmarpal_K_Intel

Hi Oleotiger,


This issue is outside the scope of Intel MPI Library. Please give /etc/hosts higher priority than DNS in your system's name resolution configuration, using the method appropriate for your OS distribution. Discussions like the following might help you:

https://unix.stackexchange.com/questions/499792/how-do-etc-hosts-and-dns-work-together-to-resolve-hostnames-to-ip-addresses
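
On glibc-based Linux systems, the lookup order is usually set by the "hosts:" line in /etc/nsswitch.conf, where "files" stands for /etc/hosts. As a rough sketch (the function name files_before_dns is made up for illustration, and other distributions or resolvers such as systemd-resolved may be configured differently), the order can be checked like this:

```shell
#!/usr/bin/env bash
# Sketch: succeed if "files" (/etc/hosts) is listed on the hosts: line
# and comes before "dns", or if "dns" is not listed at all.
# Assumption: a glibc-style nsswitch.conf; the path defaults to /etc/nsswitch.conf.
files_before_dns() {
    awk '
        $1 == "hosts:" {
            for (i = 2; i <= NF; i++) {
                if ($i == "files" && !f) f = i   # first position of "files"
                if ($i == "dns" && !d) d = i     # first position of "dns"
            }
            found = 1
            exit                                 # jump to END
        }
        END { exit !(found && f && (!d || f < d)) }
    ' "${1:-/etc/nsswitch.conf}"
}

# Example use:
#   files_before_dns && echo "/etc/hosts is consulted before DNS"
#   getent hosts 6248r-node128   # shows what the system resolver returns
```

If the check fails, reordering the entries so that "files" precedes "dns" (for example "hosts: files dns") is the usual way to make /etc/hosts win, without removing the nameservers from /etc/resolv.conf.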


Is there anything else I can help you with?


Best regards,

Amar


DrAmarpal_K_Intel

This issue has been resolved and we will no longer respond to this thread. If you require additional assistance from Intel, please start a new thread. Any further interaction in this thread will be considered community only.

 
