Intel® MPI Library

Bad file descriptor for 80+ ranks on single host

Pieter_V_1
Beginner

Hi,

I am unable to launch a simple MPI application with more than 80 processes on a single host, using Intel MPI 2018 Update 1 with PBS Pro as the job scheduler.

The job is submitted with a script containing:

#PBS -l select=81:ncpus=1
mpiexec.hydra -n 81 -ppn 1 ./a.out

In the call to MPI_Init, the following error is raised on rank 80:

[cli_80]: write_line error; fd=255 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
system msg for write_line failure : Bad file descriptor
[cli_80]: Unable to write to PMI_fd
[cli_80]: write_line error; fd=255 buf=:cmd=barrier_in
:
system msg for write_line failure : Bad file descriptor
[cli_80]: write_line error; fd=255 buf=:cmd=get_ranks2hosts
:
system msg for write_line failure : Bad file descriptor
[cli_80]: expecting cmd="put_ranks2hosts", got cmd=""
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1743)......: channel initialization failed
MPID_Init(2144)......: PMI_Init returned -1
[cli_80]: write_line error; fd=255 buf=:cmd=abort exitcode=68204815
:
system msg for write_line failure : Bad file descriptor

I looked more closely at the issue by running the application under strace. The output for rank 80 shows that the process tries to write to and read from file descriptor 255, which bash uses internally:

<snip>
uname({sys="Linux", node="uvtk", ...})  = 0
sched_getaffinity(0, 128,  { 0, 0, 0, 0, 80000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }) = 128
write(255, "cmd=init pmi_version=1 pmi_subve"..., 40) = -1 EBADF (Bad file descriptor)
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "write_line error; fd=255 buf=:cm"..., 72write_line error; fd=255 buf=:cmd=init pmi_version=1 pmi_subversion=1
:
) = 72
write(2, "system msg for write_line failur"..., 56system msg for write_line failure : Bad file descriptor
) = 56
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "Unable to write to PMI_fd\n", 26Unable to write to PMI_fd
) = 26
uname({sys="Linux", node="uvtk", ...})  = 0
write(255, "cmd=barrier_in\n", 15)      = -1 EBADF (Bad file descriptor)
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "write_line error; fd=255 buf=:cm"..., 47write_line error; fd=255 buf=:cmd=barrier_in
:
) = 47
write(2, "system msg for write_line failur"..., 56system msg for write_line failure : Bad file descriptor
) = 56
write(255, "cmd=get_ranks2hosts\n", 20) = -1 EBADF (Bad file descriptor)
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "write_line error; fd=255 buf=:cm"..., 52write_line error; fd=255 buf=:cmd=get_ranks2hosts
:
) = 52
write(2, "system msg for write_line failur"..., 56system msg for write_line failure : Bad file descriptor
) = 56
read(255, 0x7fffe00d6320, 1023)         = -1 EBADF (Bad file descriptor)
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "expecting cmd=\"put_ranks2hosts\","..., 44expecting cmd="put_ranks2hosts", got cmd=""
) = 44
write(2, "Fatal error in MPI_Init: Other M"..., 187Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1743)......: channel initialization failed
MPID_Init(2144)......: PMI_Init returned -1
) = 187
write(255, "cmd=abort exitcode=68204815\n", 28) = -1 EBADF (Bad file descriptor)
write(2, "[cli_80]: ", 10[cli_80]: )              = 10
write(2, "write_line error; fd=255 buf=:cm"..., 60write_line error; fd=255 buf=:cmd=abort exitcode=68204815
:
) = 60
write(2, "system msg for write_line failur"..., 56system msg for write_line failure : Bad file descriptor
) = 56
exit_group(68204815)                    = ?
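
The EBADF results in the trace mean that descriptor 255 is not open in the rank 80 process at the time of the call. A minimal Python sketch of the same check is shown here as a debugging aid; the helper name fd_is_open is hypothetical and is not part of Intel MPI or Hydra. It asks the kernel about a descriptor with fstat and treats EBADF as "not open".

```python
import errno
import os


def fd_is_open(fd):
    """Return True if fd is an open descriptor in the current process."""
    try:
        os.fstat(fd)
    except OSError as exc:
        if exc.errno == errno.EBADF:
            # Same error the trace shows for write(255, ...) and read(255, ...).
            return False
        raise
    return True


if __name__ == "__main__":
    # Demonstrate on a pipe: open first, then closed.
    read_end, write_end = os.pipe()
    print(fd_is_open(write_end))
    os.close(write_end)
    print(fd_is_open(write_end))
    os.close(read_end)
```

Calling fd_is_open(255) from a small wrapper started in place of ./a.out would show whether the descriptor is already invalid before MPI_Init runs; that placement is a suggestion, not something tested in this thread.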

All other ranks communicate with pmi_proxy via a valid file descriptor. For example:

<snip>
uname({sys="Linux", node="uvtk", ...})  = 0
sched_getaffinity(0, 128,  { 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }) = 128
write(16, "cmd=init pmi_version=1 pmi_subve"..., 40) = 40
read(16, "cmd=response_to_init pmi_version"..., 1023) = 57
write(16, "cmd=get_maxes\n", 14)        = 14
read(16, "cmd=maxes kvsname_max=256 keylen"..., 1023) = 56
uname({sys="Linux", node="uvtk", ...})  = 0
write(16, "cmd=barrier_in\n", 15)       = 15
read(16,  <unfinished ...>
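
The payloads visible in both traces are single lines of space-separated key=value tokens, for example cmd=init pmi_version=1 pmi_subversion=1. As a reading aid, the hypothetical helper below splits such a line into a dictionary. It is a sketch based only on the lines shown in the traces, not on the PMI specification, and it assumes that values contain no spaces.

```python
def parse_pmi_line(line):
    """Split a 'key=value key=value' line into a dict of strings.

    Assumption: tokens are separated by whitespace and values contain
    no spaces, as in the lines visible in the strace output.
    """
    fields = {}
    for token in line.strip().split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that have no '='
            fields[key] = value
    return fields


if __name__ == "__main__":
    print(parse_pmi_line("cmd=init pmi_version=1 pmi_subversion=1\n"))
    print(parse_pmi_line("cmd=barrier_in\n"))
```

Applied to the failing rank, the empty string read back after the failed read(255, ...) parses to an empty dictionary, which matches the 'got cmd=""' message.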

Is it possible to specify which file descriptors the MPI processes may use, or is there another way to work around this behavior?

Regards,
Pieter
 

3 Replies
Maksim_B_Intel
Employee

Hi, Pieter. Do you see similar behaviour with fewer processes?

Can you try the 2019 branch of Intel(R) MPI?

Pieter_V_1
Beginner

Hi Maksim,

The problem does not occur with 80 or fewer processes.

I can try 2019, but it will take a while because the problem occurs at a customer site.

Regards,
Pieter

McCalpinJohn
Honored Contributor III

Be sure to check the shell ulimit values for open file descriptors: "ulimit -a" or "ulimit -n".
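
The limit that "ulimit -n" reports can also be read from inside a process, which helps when the batch environment differs from the login shell. The sketch below uses Python's standard resource module (Unix only); the function name nofile_limits is hypothetical. Whether this limit is related to the failure in this thread has not been established here.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors."""
    # RLIMIT_NOFILE is the limit that 'ulimit -n' shows in the shell.
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    # A value equal to resource.RLIM_INFINITY means "unlimited".
    print("soft limit:", soft)
    print("hard limit:", hard)
```

Running this inside the PBS job rather than on the login node shows the limits the ranks would inherit there.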
