John_D_6
New Contributor I

INTERNAL ERROR with SLURM and PMI2

I was pleasantly surprised to read that PMI2 under SLURM is supported by Intel MPI in the 2017 release. I tested it, but it fails immediately on my setup. I'm using Intel Parallel Studio 2017 Update 4 and SLURM 15.08.13. A simple MPI program doesn't work:

[donners@int1 pmi2]$ cat mpi.f90
program test
  use mpi
  implicit none
 
  integer ierr,nprocs,rank

  call mpi_init(ierr)
  call mpi_comm_size(MPI_COMM_WORLD,nprocs,ierr)
  call mpi_comm_rank(mpi_comm_world,rank,ierr)
  if (rank .eq. 0) then
    print *,'Number of processes: ',nprocs
  endif
  print*,'I am rank ',rank
  call mpi_finalize(ierr)

end
[donners@int1 pmi2]$ mpiifort mpi.f90
[donners@int1 pmi2]$ ldd ./a.out
    linux-vdso.so.1 =>  (0x00007ffcc0364000)
    libmpifort.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002ad7432a9000)
    libmpi.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/release_mt/libmpi.so.12 (0x00002ad743652000)
    libdl.so.2 => /lib64/libdl.so.2 (0x00002ad744397000)
    librt.so.1 => /lib64/librt.so.1 (0x00002ad74459c000)
    libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ad7447a4000)
    libm.so.6 => /lib64/libm.so.6 (0x00002ad7449c1000)
    libc.so.6 => /lib64/libc.so.6 (0x00002ad744c46000)
    libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ad744fda000)
    /lib64/ld-linux-x86-64.so.2 (0x00002ad743086000)
[donners@int1 pmi2]$ I_MPI_PMI2=yes srun -n 1 --mpi=pmi2 ./a.out

INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPID_Init:2104
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1716)......: channel initialization failed
MPID_Init(2104)......: fail failed
srun: error: tcn1467: task 0: Exited with exit code 15
srun: Terminating job step 3270641.0

[donners@int1 pmi2]$ srun --version
slurm 15.08.13-Bull.1.0

The same problem occurs on a system with SLURM 17.02.3 (at TACC). What might be the problem here?

With regards,

John

 

3 Replies
James_S
Employee

Hi John, could you please try to export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so to specify the full path to the libpmi.so library and then run it again? Thanks.
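In shell form, the suggestion above amounts to the following (the library path is the usual location on RHEL-family systems, but may differ on your cluster):

```shell
# Tell Intel MPI which PMI library to load (path is typical; verify on your system)
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export I_MPI_PMI2=yes

# Then launch through SLURM's PMI2 plugin:
#   srun -n 1 --mpi=pmi2 ./a.out
```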

James_S
Employee

Hi John, Intel MPI has a binary incompatibility with libpmi2.so in some configurations when running in PMI2 mode, and further binary-compatibility work is needed. Sorry for the inconvenience. Thanks.

John_D_6
New Contributor I

Thank you for the tip. I ran the test again with I_MPI_PMI_LIBRARY set to the locations of libpmi.so and libpmi2.so:

$ I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so I_MPI_PMI2=yes srun -n 1 --mpi=pmi2 ./a.out
srun: job 3324955 queued and waiting for resources
srun: job 3324955 has been allocated resources
INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPID_Init:2104
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1716)......: channel initialization failed
MPID_Init(2104)......: fail failed
srun: error: tcn866: task 0: Exited with exit code 15
srun: Terminating job step 3324955.0

$ I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so I_MPI_PMI2=yes srun -n 1 --mpi=pmi2 ./a.out
srun: job 3324966 queued and waiting for resources
srun: job 3324966 has been allocated resources
 Number of processes:            1
 I am rank            0

So it works fine when I point to libpmi2.so (which seems pretty logical now).

The reason I wanted to use PMI2 is its support for dynamically spawning MPI processes within a SLURM job. I used the following example to test dynamic spawning:

program mpispawn
  use mpi
  implicit none

  integer ierr,errcodes(1),intercomm,pcomm,mpisize,dumm,rank
  character(1000) cmd
  logical master

  call MPI_Init(ierr)
  call get_command_argument(0,cmd)
  print*,'cmd=',trim(cmd)
  call MPI_Comm_get_parent(pcomm,ierr)
  if (pcomm.eq.MPI_COMM_NULL) then
    print*,'I am the master. Clone myself!'
    master=.true.
    call MPI_Comm_spawn(cmd,MPI_ARGV_NULL,4,MPI_INFO_NULL,0,MPI_COMM_WORLD,pcomm,errcodes,ierr)
    call MPI_Comm_size(pcomm,mpisize,ierr)
    print*,'Processes in intercommunicator:',mpisize
    dumm=88
    call MPI_Bcast(dumm,1,MPI_INTEGER,MPI_ROOT,pcomm,ierr)
  else
    print*,'I am a clone. Use me'
    master=.false.
    call MPI_Bcast(dumm,1,MPI_INTEGER,0,pcomm,ierr)
  endif
  call MPI_Comm_rank(pcomm,rank,ierr)
  print*,'rank,master,dumm=',rank,master,dumm
  call sleep(10)
  call MPI_Barrier(pcomm,ierr)
  call MPI_Finalize(ierr)
end

I submitted this test program to SLURM with the following job script:

#!/bin/bash
#SBATCH -n 24
#SBATCH -p short
#SBATCH -c 1
#SBATCH --mem-per-cpu=1000
#SBATCH -t 5:00

# use PMI2
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi2.so

export PMI_DEBUG=1
export I_MPI_PMI2=yes
ldd  ./mpi.impi2017
srun -v --mpi=pmi2 -l -n 1 ./mpi.impi2017

The job output is:

        linux-vdso.so.1 =>  (0x00007ffeff7b9000)
        libmpifort.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/libmpifort.so.12 (0x00002ac75efff000)
        libmpi.so.12 => /opt/intel/parallel_studio_xe_2017_update4/compilers_and_libraries/linux/mpi/intel64/lib/debug_mt/libmpi.so.12 (0x00002ac75f3a8000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00000034bec00000)
        librt.so.1 => /lib64/librt.so.1 (0x00000034c0000000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00000034bf400000)
        libm.so.6 => /lib64/libm.so.6 (0x00000034bf800000)
        libc.so.6 => /lib64/libc.so.6 (0x00000034bf000000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000034c0400000)
        /lib64/ld-linux-x86-64.so.2 (0x00000034be800000)
srun: defined options for program `srun'
srun: --------------- ---------------------
srun: user           : `donners'
srun: uid            : 31046
srun: gid            : 31060
srun: cwd            : /nfs/home1/donners/Tests/mpispawn
srun: ntasks         : 1 (set)
srun: cpus_per_task  : 1
srun: nodes          : 1 (set)
srun: jobid          : 3325083 (default)
srun: partition      : default
srun: profile        : `NotSet'
srun: job name       : `job.intel'
srun: reservation    : `(null)'
srun: burst_buffer   : `(null)'
srun: wckey          : `(null)'
srun: cpu_freq_min   : 4294967294
srun: cpu_freq_max   : 4294967294
srun: cpu_freq_gov   : 4294967294
srun: switches       : -1
srun: wait-for-switches : -1
srun: distribution   : unknown
srun: cpu_bind       : default
srun: mem_bind       : default
srun: verbose        : 1
srun: slurmd_debug   : 0
srun: immediate      : false
srun: label output   : true
srun: unbuffered IO  : false
srun: overcommit     : false
srun: threads        : 60
srun: checkpoint_dir : /var/slurm/checkpoint
srun: wait           : 60
srun: nice           : -2
srun: account        : (null)
srun: comment        : (null)
srun: dependency     : (null)
srun: exclusive      : false
srun: bcast          : false
srun: qos            : (null)
srun: constraints    : mincpus-per-node=1 mem-per-cpu=1000M 
srun: geometry       : (null)
srun: reboot         : yes
srun: rotate         : no
srun: preserve_env   : false
srun: network        : (null)
srun: propagate      : NONE
srun: prolog         : /nfs/admin/scripts/admin/slurm_srunprolog
srun: epilog         : /nfs/admin/scripts/admin/slurm_srunepilog
srun: mail_type      : NONE
srun: mail_user      : (null)
srun: task_prolog    : (null)
srun: task_epilog    : (null)
srun: multi_prog     : no
srun: sockets-per-node  : -2
srun: cores-per-socket  : -2
srun: threads-per-core  : -2
srun: ntasks-per-node   : -2
srun: ntasks-per-socket : -2
srun: ntasks-per-core   : -2
srun: plane_size        : 4294967294
srun: core-spec         : NA
srun: power             : 
srun: sicp              : 0
srun: remote command    : `./mpi.impi2017'
srun: launching 3325083.0 on host tcn1514, 1 tasks: 0
srun: route default plugin loaded
srun: Node tcn1514, 1 tasks started
0:  cmd=/nfs/home1/donners/Tests/mpispawn/./mpi.impi2017
0:  I am the master. Clone myself!
0: Fatal error in MPI_Init: Other MPI error, error stack:
0: MPIR_Init_thread(805).......: fail failed
0: MPID_Init(1949).............: spawn process group was unable to obtain parent port name from the channel
0: MPIDI_CH3_GetParentPort(465):  PMI2 KVS_Get failed: PARENT_ROOT_PORT_NAME
1: Fatal error in MPI_Init: Other MPI error, error stack:
1: MPIR_Init_thread(805).......: fail failed
1: MPID_Init(1949).............: spawn process group was unable to obtain parent port name from the channel
1: MPIDI_CH3_GetParentPort(465):  PMI2 KVS_Get failed: PARENT_ROOT_PORT_NAME
2: Fatal error in MPI_Init: Other MPI error, error stack:
2: MPIR_Init_thread(805).......: fail failed
2: MPID_Init(1949).............: spawn process group was unable to obtain parent port name from the channel
2: MPIDI_CH3_GetParentPort(465):  PMI2 KVS_Get failed: PARENT_ROOT_PORT_NAME
3: Fatal error in MPI_Init: Other MPI error, error stack:
3: MPIR_Init_thread(805).......: fail failed
3: MPID_Init(1949).............: spawn process group was unable to obtain parent port name from the channel
3: MPIDI_CH3_GetParentPort(465):  PMI2 KVS_Get failed: PARENT_ROOT_PORT_NAME
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
0: slurmstepd: *** STEP 3325083.1 ON tcn1514 CANCELLED AT 2017-07-14T10:39:01 ***
srun: error: tcn1514: tasks 0-3: Exited with exit code 1
srun: Terminating job step 3325083.1

This error message looks similar to a bug report for MPICH: https://github.com/pmodels/mpich/issues/1814

Maybe the bug fix mentioned at the end of that report is also applicable to Intel MPI. I get the same error message with the latest Intel MPI 2018b1.

Cheers,
John

 
