Intel® MPI Library
Get help with building, analyzing, optimizing, and scaling high-performance computing (HPC) applications.

Intel MPI issue with the usage of Slurm

Tingyang_X_
Beginner

To whom it may concern,

Hello. We use Slurm to manage our cluster, and we have run into a new issue with Intel MPI under Slurm. After a node reboots, Intel MPI jobs fail on that node, but manually restarting the slurm daemon fixes it. I also tried adding "service slurm restart" to /etc/rc.local, which runs at the end of booting, but the issue is still there.
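For reference, the rc.local approach can be made more robust by waiting for the InfiniBand port to come up before restarting slurmd. This is only a sketch (it assumes ibv_devinfo is on root's PATH at boot and that the SysV service is named "slurm", as in the session below):

```shell
#!/bin/sh
# Sketch for the end of /etc/rc.local: wait (up to ~60 s) until the
# IB port reports PORT_ACTIVE, then restart slurmd so it starts with
# the fabric already available.
for i in $(seq 1 30); do
    if ibv_devinfo 2>/dev/null | grep -q PORT_ACTIVE; then
        break
    fi
    sleep 2
done
service slurm restart
```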

Moreover, I submitted this issue to the slurm-dev list, but they believed it was due to the InfiniBand + Intel MPI configuration. They suggested that I configure dat.conf and set some Intel MPI variables; however, I don't know how to set them.

Here is an example:

$ salloc -N1 -n12 -w cn117 #cn117 is the node just rebooted
salloc: Granted job allocation 1201
$ module list
Currently Loaded Modulefiles:
  1) modules                    2) null                       3) intelics/2013.1.039
$ export I_MPI_PMI_LIBRARY=/gpfs/slurm/lib/libpmi.so
$ export I_MPI_FABRICS=shm:ofa
$ srun ./hello
[3] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[4] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[5] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[6] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[7] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[8] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[10] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[11] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[9] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[2] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
srun: error: cn117: tasks 0-11: Exited with exit code 254
srun: Terminating job step 1201.0

After restarting the slurm daemon:

$ ssh root@cn117
cn117$  service slurm restart
stopping slurmd:                                           [  OK  ]
slurmd is stopped
starting slurmd:                                           [  OK  ]
$ exit
$ salloc -N1 -n12 -w cn117
salloc: Granted job allocation 1203
$ export I_MPI_PMI_LIBRARY=/gpfs/slurm/lib/libpmi.so
$ export I_MPI_FABRICS=shm:ofa
$ srun ./hello
This is Process  9 out of 12 running on host cn117
This is Process  3 out of 12 running on host cn117
This is Process  2 out of 12 running on host cn117
This is Process  7 out of 12 running on host cn117
This is Process  6 out of 12 running on host cn117
This is Process  0 out of 12 running on host cn117
This is Process  5 out of 12 running on host cn117
This is Process  1 out of 12 running on host cn117
This is Process  4 out of 12 running on host cn117
This is Process 10 out of 12 running on host cn117
This is Process  8 out of 12 running on host cn117
This is Process 11 out of 12 running on host cn117

Here is the default dat.conf we have:

# DAT v2.0, v1.2 configuration file
#
# Each entry should have the following fields:
#
# <ia_name> <api_version> <threadsafety> <default> <lib_path> \
#           <provider_version> <ia_params> <platform_params>
#
# For uDAPL cma provider, <ia_params> is one of the following:
#       network address, network hostname, or netdev name and 0 for port
#
# For uDAPL scm provider, <ia_params> is device name and port
# For uDAPL ucm provider, <ia_params> is device name and port
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL iWARP provider, <ia_params> is netdev device name and 0
# For uDAPL RoCE provider, <ia_params> is device name and 0
#
ofa-v2-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""
ofa-v2-ib1 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib1 0" ""
ofa-v2-mthca0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-ipath0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 1" ""
ofa-v2-ipath0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ipath0 2" ""
ofa-v2-ehca0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "ehca0 1" ""
ofa-v2-iwarp u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-mlx4_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mthca0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 1" ""
ofa-v2-mthca0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mthca0 2" ""
ofa-v2-cma-roe-eth2 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth2 0" ""
ofa-v2-cma-roe-eth3 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "eth3 0" ""
ofa-v2-scm-roe-mlx4_0-1 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-scm-roe-mlx4_0-2 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mcm-1 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mcm-2 u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-scif0 u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "scif0 1" ""
ofa-v2-scif0-u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "scif0 1" ""
ofa-v2-mic0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "mic0:ib 1" ""
ofa-v2-mlx4_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mlx4_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx4_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx4_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 1" ""
ofa-v2-mlx4_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_0 2" ""
ofa-v2-mlx4_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 1" ""
ofa-v2-mlx4_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx4_1 2" ""
ofa-v2-mlx5_0-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2s u2.0 nonthreadsafe default libdaploscm.so.2 dapl.2.0 "mlx5_1 2" ""
ofa-v2-mlx5_0-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2u u2.0 nonthreadsafe default libdaploucm.so.2 dapl.2.0 "mlx5_1 2" ""
ofa-v2-mlx5_0-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 1" ""
ofa-v2-mlx5_0-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_0 2" ""
ofa-v2-mlx5_1-1m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 1" ""
ofa-v2-mlx5_1-2m u2.0 nonthreadsafe default libdaplomcm.so.2 dapl.2.0 "mlx5_1 2" ""
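For what it's worth, dat.conf only matters if the job is switched from the OFA fabric to DAPL. A sketch of that setup, assuming dat.conf lives at /etc/dat.conf (adjust the path to your install) and using the provider entry above that matches the active mlx4_0 port 1:

```shell
# Environment sketch: select the DAPL fabric instead of OFA and pin
# the uDAPL provider to the mlx4_0 port 1 entry from dat.conf.
export DAT_OVERRIDE=/etc/dat.conf          # assumption: adjust to your dat.conf path
export I_MPI_FABRICS=shm:dapl
export I_MPI_DAPL_PROVIDER=ofa-v2-mlx4_0-1
srun ./hello
```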

Some system information here:

$ slurmd -V
slurm 14.03.0
 
$ mpirun -V
Intel(R) MPI Library for Linux* OS, Version 4.1 Update 1 Build 20130522
Copyright (C) 2003-2013, Intel Corporation. All rights reserved.
 
cn117$ ofed_info|head -n1
MLNX_OFED_LINUX-2.2-1.0.1 (OFED-2.2-1.0.0):
 
cn117$ ibv_devinfo
hca_id: mlx4_0
transport:   InfiniBand (0)
fw_ver:    2.11.550
node_guid:  
sys_image_guid:   ##########
vendor_id:   ##########
vendor_part_id:   ########
hw_ver:    0x0
board_id:   ########
phys_port_cnt:   2
  port: 1
   state:   PORT_ACTIVE (4)
   max_mtu:  4096 (5)
   active_mtu:  4096 (5)
   sm_lid:   1
   port_lid:  131
   port_lmc:  0x00
   link_layer:  InfiniBand
 
  port: 2
   state:   PORT_DOWN (1)
   max_mtu:  4096 (5)
   active_mtu:  4096 (5)
   sm_lid:   0
   port_lid:  0
   port_lmc:  0x00
   link_layer:  InfiniBand

 
cn117$ cat /etc/redhat-release 
Red Hat Enterprise Linux Workstation release 6.5 (Santiago)
cn117$ uname -r
2.6.32-431.23.3.el6.x86_64

I wonder if anyone has faced a similar issue before and could help us figure out a solution.

Thanks,

Tingyang Xu

James_T_Intel
Moderator

The error you are getting indicates that the OFA fabric is unavailable on the node until SLURM* is restarted.  I would check the SLURM* restart process to see whether it also restarts the OFED* driver.  Also, see if you can run a job with I_MPI_FABRICS=shm:tcp before restarting SLURM*.

When using OFA, dat.conf is not used, so that is not where you need to be looking.
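To see which fabric each rank actually selects, a debug run along these lines can help (I_MPI_DEBUG is a standard Intel MPI control; 2 is enough to print the chosen fabric at startup):

```shell
# Print fabric selection details for each rank at MPI startup,
# while testing the TCP fallback path suggested above.
export I_MPI_DEBUG=2
export I_MPI_FABRICS=shm:tcp
srun ./hello
```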

Tingyang_X_
Beginner

I do not think slurm restarts the OFA stack, and I did put the restart in rc.local, but the issue is still there. Here is the information from boot.log:

        Welcome to Red Hat Enterprise Linux Workstation
Starting udev:                                             [  OK  ]
Setting hostname cn133:                                    [  OK  ]
Setting up Logical Volume Management:   3 logical volume(s) in volume group "vg_cn133" now active
                                                           [  OK  ]
Checking filesystems
/dev/mapper/vg_cn133-lv_root: clean, 83124/3276800 files, 897418/13107200 blocks
/dev/sda1: clean, 45/128016 files, 79861/512000 blocks
/dev/mapper/vg_cn133-lv_home: clean, 11/14712832 files, 971293/58820608 blocks
                                                           [  OK  ]
Remounting root filesystem in read-write mode:             [  OK  ]
Mounting local filesystems:                                [  OK  ]
Enabling local filesystem quotas:                          [  OK  ]
Enabling /etc/fstab swaps:                                 [  OK  ]
Entering non-interactive startup
Calling the system activity data collector (sadc)... 
Starting monitoring for VG vg_cn133:   3 logical volume(s) in volume group "vg_cn133" monitored
                                                           [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
Bringing up interface ib0:                                 [  OK  ]
Setting up service network . . .                           [  done  ]
Bringing up loopback interface:                            [  OK  ]
Bringing up interface em1:  
Determining IP information for em1... done.
                                                           [  OK  ]
Bringing up interface ib0:  RTNETLINK answers: File exists
                                                           [  OK  ]
Starting MUNGE: munged                                     [  OK  ]
Starting postfix:                                          [  OK  ]
Starting abrt daemon:                                      [  OK  ]
Loading BLCR: FATAL: Module blcr_imports not found.
FATAL: Module blcr not found.
                                                           [  OK  ]
Starting crond:                                            [  OK  ]
starting slurmd:                                           [  OK  ]
Starting atd:                                              [  OK  ]
Starting Red Hat Network Daemon:                           [  OK  ]
Starting rhsmcertd...                                      [  OK  ]
Starting certmonger:                                       [  OK  ]
Stopping slurmd:                                           [  OK  ]
slurmd is stopped
Starting slurmd:                                           [  OK  ]

The setting I_MPI_FABRICS=shm:tcp works, but I would like to use InfiniBand instead.

$ srun ./hello
[18] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[19] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[0] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[1] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[3] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[4] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[5] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[2] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[6] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[7] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[8] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[9] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[10] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[11] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[12] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[13] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[14] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[15] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[16] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
[17] MPI startup(): ofa fabric is not available and fallback fabric is not enabled
srun: error: cn133: tasks 0-19: Exited with exit code 254
srun: Terminating job step 1675.0

$ export I_MPI_FABRICS=shm:tcp
$ srun ./hello
This is Process 11 out of 20 running on host cn133
This is Process  6 out of 20 running on host cn133
This is Process 12 out of 20 running on host cn133
This is Process  0 out of 20 running on host cn133
This is Process  3 out of 20 running on host cn133
This is Process  2 out of 20 running on host cn133
This is Process 16 out of 20 running on host cn133
This is Process 18 out of 20 running on host cn133
This is Process  5 out of 20 running on host cn133
This is Process 13 out of 20 running on host cn133
This is Process 17 out of 20 running on host cn133
This is Process  1 out of 20 running on host cn133
This is Process  8 out of 20 running on host cn133
This is Process 19 out of 20 running on host cn133
This is Process 10 out of 20 running on host cn133
This is Process 14 out of 20 running on host cn133
This is Process  9 out of 20 running on host cn133
This is Process 15 out of 20 running on host cn133
This is Process  7 out of 20 running on host cn133
This is Process  4 out of 20 running on host cn133

I appreciate your suggestion.

 

Best,

Tingyang Xu
