Hello,
I am currently testing our new InfiniBand cards and am running into some problems. The cluster is small: 8 compute nodes with 16 cores each, plus a master node with 4 cores. The nodes are connected via Ethernet (for PXE booting) and via InfiniBand through a QLogic 12300 switch. The IB cards are QLogic 7342, and the job scheduler is SLURM. The problem is that even though my memlock limit is unlimited, Intel MPI complains that it is too small. Below is a truncated sample of the warnings, since they are all the same. Any ideas on what could be causing this? Thanks in advance. For what it's worth, DAPL does not work either.
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=9717a0
[17] MPI startup(): RLIMIT_MEMLOCK too small
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): tcp data transfer mode
[jmgonza6@cluster] $> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62699
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[jmgonza6@n0001]$> ibv_devinfo
hca_id: qib0
    transport: InfiniBand (0)
    fw_ver: 0.0.0
    node_guid: 0011:7500:0070:b05e
    sys_image_guid: 0011:7500:0070:b05e
    vendor_id: 0x1175
    vendor_part_id: 29474
    hw_ver: 0x2
    board_id: InfiniPath_QLE7340
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 2048 (4)
            sm_lid: 1
            port_lid: 3
            port_lmc: 0x00
            link_layer: InfiniBand
[jmgonza6@cluster]$> cat slurm.sub
#!/bin/bash
#SBATCH -J LAMMPS-ETH32
#SBATCH -n 32
#SBATCH -t 20:00:00
#SBATCH --partition=intel_rack
#SBATCH --network=IB
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jmgonza6@mail.usf.edu
. /etc/profile.d/modules.sh
module purge
module load compilers/intel/2013sp1_cluster_xe
module load mpi/impi/4.1.3.048
export I_MPI_FABRICS=ofa
export I_MPI_FALLBACK=enable
#export I_MPI_PROCESS_MANAGER=mpd
export I_MPI_DEBUG=2
srun -n 32 ./hello.x 1> output 2> errs
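For reference, I_MPI_FABRICS=ofa with I_MPI_FALLBACK=enable means Intel MPI tries the OFA fabric first and drops to TCP when it fails, which matches the debug output above. A quick way to see the locked-memory limit a job step actually inherits (rather than the login shell) would be something like:
# check the memlock limit inside a job step, not on the login node
# (partition name taken from the submission script above)
srun --partition=intel_rack -n 1 bash -c 'ulimit -l'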
Check your SLURM configuration. SLURM can set other limits that are only applied within jobs. I'd recommend taking a look at their FAQ, specifically https://computing.llnl.gov/linux/slurm/faq.html#rlimit and https://computing.llnl.gov/linux/slurm/faq.html#memlock.
Please note that these sites are not controlled by Intel, and we are not responsible for the content on them.
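For illustration only (typical settings from that FAQ, not verified against your cluster), the memlock part usually comes down to raising the limit on the compute nodes and making sure slurmd itself starts with it:
# /etc/security/limits.conf on the compute nodes
* soft memlock unlimited
* hard memlock unlimited
# job steps inherit slurmd's own limit, so the slurmd init script or
# service environment should also raise the limit before it starts:
ulimit -l unlimited
# and in slurm.conf, stop the (possibly lower) submit-host limit from
# being propagated into jobs:
PropagateResourceLimitsExcept=MEMLOCK
slurmd needs to be restarted on the compute nodes after these changes.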