Norris__Raymond

IMPI w/ Slurm

I'm working at a site configured with Intel MPI (IMPI 2016.4.072) and Slurm (17.11.4). Slurm's MpiDefault is none.

When I run my MPICH2 code with the following command (defaulting to --mpi=none):

     srun -N 2 -n 4 -l -vv ...

I get the following output (duplicate error messages from other ranks trimmed):

0: PMII_singinit: execv failed: No such file or directory
0: [unset]:   This singleton init program attempted to access some feature
0: [unset]:   for which process manager support was required, e.g. spawn or universe_size.
0: [unset]:   But the necessary mpiexec is not in your path.
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P2-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P3-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: 2018-05-25 09:00:14  2: MPI startup(): Multi-threaded optimized library
0: 2018-05-25 09:00:14  2: DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14  2: MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14  2: MPI startup(): shm and dapl data transfer modes
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=foobar key=foobar
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: Fatal error in PMPI_Init_thread: Other MPI error, error stack:
0: MPIR_Init_thread(784).................:
0: MPID_Init(1332).......................: channel initialization failed
0: MPIDI_CH3_Init(141)...................:
0: dapl_rc_setup_all_connections_20(1388): generic failure with errno = 872614415
0: getConnInfoKVS(849)...................: PMI_KVS_Get failed
 
If I run the same code with
 
   srun --mpi=pmi2 ...
 
it works fine.
 
A couple of questions/comments:
1. In neither case do I set I_MPI_PMI_LIBRARY, which I thought was required. How else does IMPI find the Slurm PMI library? This might be why --mpi=none fails, but for the moment I can't set the variable because I can't find libpmi[1,2,x].so.
2. Since none is the default, I would expect it to work. Under what conditions would none fail but pmi2 work? Is it because IMPI supports pmi2?
3. If I do need to set I_MPI_PMI_LIBRARY, why does pmi2 work without it? Or is the variable unnecessary when using IMPI?
4. I'm still trying to understand the relationship between libpmi.so and mpi_*.so. libpmi.so is the Slurm PMI library, correct? And the mpi_* libraries are the Slurm plug-ins (e.g. mpi_none, mpi_pmi2). How do these libraries fit together?
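
For reference, here is a rough sketch of the checks these questions imply: listing the MPI plug-in types that srun offers, showing the configured MpiDefault, and searching for a Slurm PMI library to hand to I_MPI_PMI_LIBRARY. The helper name find_pmi_lib and the directories searched are assumptions for illustration, not paths known to exist at this site, and nothing here confirms that setting the variable makes --mpi=none work.

```shell
#!/bin/sh
# Sketch only: the search directories below are guesses; adjust them for the site.

# Hypothetical helper: print the first libpmi.so* or libpmi2.so* found
# under the given directories, or return non-zero if none is found.
find_pmi_lib() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        match=$(find "$dir" -maxdepth 3 \
            \( -name 'libpmi.so*' -o -name 'libpmi2.so*' \) 2>/dev/null \
            | sort | head -n 1)
        if [ -n "$match" ]; then
            printf '%s\n' "$match"
            return 0
        fi
    done
    return 1
}

# List the MPI plug-in types this Slurm installation offers, if srun is present.
if command -v srun >/dev/null 2>&1; then
    srun --mpi=list
fi

# Show the configured default plug-in, if scontrol is present.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show config | grep -i MpiDefault
fi

# Point Intel MPI at the Slurm PMI library if one turns up in the guessed locations.
if pmi_lib=$(find_pmi_lib /usr/lib64 /usr/lib /usr/local/lib /opt/slurm/lib); then
    export I_MPI_PMI_LIBRARY="$pmi_lib"
    echo "I_MPI_PMI_LIBRARY=$I_MPI_PMI_LIBRARY"
else
    echo "no libpmi.so or libpmi2.so found in the searched directories" >&2
fi
```

If a library is found, the same srun command can then be retried with the variable exported in the job environment to see whether the singleton-init errors go away.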
 
Thanks,
Raymond