Domingos_R_
Beginner

SLURM PMI with Intel MPI 4.0.0.028

Hi,
We have installed both the SLURM resource manager (v2.0.5) and Intel MPI 4.0.0.028 on our cluster.
I was experimenting with SLURM's implementation of the PMI interface, but so far I am getting strange results.
Here is the situation.
The code I am testing is quite simple:
#------------------------------------------------------simplest.f90-------------------------------------------------------------------------------------
program parsec_mpi
  implicit none
  include 'mpif.h'

  character(len=4)  :: idstring
  character(len=80) :: name
  integer :: mpinfo, iam, procs_num, namelen

  ! Initialise MPI, get the communicator size, my rank and the host name
  call MPI_INIT(mpinfo)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, procs_num, mpinfo)
  call MPI_COMM_RANK(MPI_COMM_WORLD, iam, mpinfo)
  call MPI_GET_PROCESSOR_NAME(name, namelen, mpinfo)

  print *, ' Process ', iam, '/', procs_num, ' (', trim(name), ') says "Hello, world!"'

  ! Each rank also writes a small report file out.NNNN
  write(idstring,'(I4.4)') iam
  open(16, file='out.'//idstring, form='formatted', status='unknown')
  write(16,*) 'Processor No ', iam, ' has started'
  write(16,*) 'Number of processors: ', procs_num
  write(16,*)
  write(16,*) 'Closing file on PE #', iam
  write(16,*)
  close(16)

  call MPI_FINALIZE(mpinfo)

end program parsec_mpi
#----------------------------------------------------------------------------------------------------------------------------------------------
I compile it with /opt/intel/impi/4.0.0.028/bin64/mpif90 -o simplest.x simplest.f90,
and submit the following batch job:
#-----------------------------------simplest.srm-------------------------------------
#!/bin/bash
#
# Copyright (C) 2011 Domingos Rodrigues
#
# Created: Sun Oct 2 22:44:13 2011 (BRT)
#
# $Id$
#
# 2 nodes with 8-cores each (Infiniband)
#

#SBATCH -o teste-%N-%j.out
#SBATCH -J teste
#SBATCH --ntasks=16
#SBATCH --nodes=2
#SBATCH --cpus-per-task=1

source /opt/intel/impi/4.0.0.028/intel64/bin/mpivars.sh
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export I_MPI_FABRICS=shm:dapl

srun ./simplest.x
#-----------------------------------simplest.srm-------------------------------------
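In case it is relevant, here is how the PMI setup can be sanity-checked before submitting (paths as in the script above; the output is cluster-specific, so none is shown):

```shell
# Cluster-specific sanity checks; paths taken from the job script above
srun --mpi=list                     # list the MPI/PMI plugins this SLURM build supports
ls -l /usr/lib64/libpmi.so          # confirm the library named in I_MPI_PMI_LIBRARY exists
ldd ./simplest.x | grep -i libmpi   # confirm the binary links against the Intel MPI runtime
```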
Well, the job runs successfully, but the output is quite unexpected:
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
Process 0 / 1 (veredas4) says "Hello, world!"
Process 0 / 1 (veredas3) says "Hello, world!"
It seems that the processes are not getting the right rank and size: every rank reports 0 / 1, as if each process were running in its own MPI_COMM_WORLD of size 1.
The submission goes well if I go through the old traditional way, with the sequence
of steps mpdboot + mpiexec + mpdallexit.
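For reference, the traditional MPD sequence that does work for me looks roughly like this (the host file name and counts below are placeholders, not my exact commands):

```shell
# Hypothetical MPD-based launch; mpd.hosts and the -n values are placeholders
mpdboot -n 2 -f mpd.hosts     # start an MPD ring on the two allocated nodes
mpiexec -n 16 ./simplest.x    # launch the 16 MPI ranks through the MPD ring
mpdallexit                    # tear the MPD ring down
```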
Could someone shed some light on this? Any help would be most appreciated!
Best regards,
Domingos
1 Reply
Dmitry_K_Intel2
Employee

Hi Domingos,

It seems to me that PMI virtualization does not work with the 4.0.0.028 library; it's quite old. Could you please download 4.0 Update 2 and give it a try?
Also, please read this article.

Regards!
Dmitry