joseph_g_1
Beginner

RLIMIT_MEMLOCK too small, although locked memory = unlimited

Hello,

I am currently testing our new InfiniBand cards and am running into some problems. The cluster is small: 8 compute nodes with 16 cores each, plus a master node with 4 cores. The nodes are connected via Ethernet (for PXE booting) and via InfiniBand through a QLogic 12300 switch. The IB cards are QLogic 7342, and the job scheduler is SLURM. The problem is that even though my locked-memory limit (memlock) is unlimited, MPI startup complains that it is too small. Below is a truncated excerpt of the warnings, since they are all the same. Any ideas on what could be causing the issue? Thanks in advance. For what it's worth, DAPL does not work either.

[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=9717a0
[17] MPI startup(): RLIMIT_MEMLOCK too small
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): tcp data transfer mode
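Because every rank prints the same messages, it can help to list which ranks reported each one, for example to confirm that the ranks hitting the RLIMIT_MEMLOCK check are the ones falling back to TCP. A minimal sketch, assuming the I_MPI_DEBUG messages end up in the `output` and/or `errs` files written by the job script and keep the `[rank] MPI startup(): ...` format shown above; the helper name `ranks_with_message` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print the sorted, de-duplicated rank numbers of all
# lines that contain PATTERN (matched as a fixed string) in the given files.
# Usage: ranks_with_message PATTERN FILE...
ranks_with_message() {
    local pattern=$1
    shift
    # -h: no file names, -s: ignore missing files, -F: fixed-string match.
    # Lines look like "[17] MPI startup(): RLIMIT_MEMLOCK too small";
    # sed keeps only the bracketed rank (which may be -1).
    grep -h -s -F -- "$pattern" "$@" \
        | sed -n 's/^\[\(-\{0,1\}[0-9]\{1,\}\)].*/\1/p' \
        | sort -n -u
}
```

For example, `ranks_with_message 'RLIMIT_MEMLOCK too small' output errs` would print one rank number per line.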

 

[jmgonza6@cluster] $> ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 62699
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
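The `ulimit -a` listing above was taken in a login shell on the head node (prompt `cluster`), and bash reports soft limits by default. The MPI ranks are started through SLURM on the compute nodes and may see different limits, so those are the ones the MPI startup check actually sees. A minimal sketch of a helper that prints both the soft and hard RLIMIT_MEMLOCK of whatever process runs it; the function name `report_memlock` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print this process's locked-memory limits
# (RLIMIT_MEMLOCK). bash's "ulimit -l" reports KiB, or "unlimited".
# -S = soft limit (what is enforced), -H = hard limit (the ceiling the
# soft limit cannot be raised above).
report_memlock() {
    printf '%s: memlock soft=%s hard=%s\n' \
        "${HOSTNAME:-unknown-host}" "$(ulimit -S -l)" "$(ulimit -H -l)"
}

report_memlock
```

To see what the ranks inherit inside a job, the same two queries can be run through the launcher, e.g. `srun -n 32 bash -c 'echo "$HOSTNAME soft=$(ulimit -S -l) hard=$(ulimit -H -l)"'`; if those values differ from the login shell, the limit is being set on the SLURM side rather than by the user's shell.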

 

[jmgonza6@n0001]$> ibv_devinfo 
hca_id:    qib0
    transport:            InfiniBand (0)
    fw_ver:                0.0.0
    node_guid:            0011:7500:0070:b05e
    sys_image_guid:            0011:7500:0070:b05e
    vendor_id:            0x1175
    vendor_part_id:            29474
    hw_ver:                0x2
    board_id:            InfiniPath_QLE7340
    phys_port_cnt:            1
        port:    1
            state:            PORT_ACTIVE (4)
            max_mtu:        4096 (5)
            active_mtu:        2048 (4)
            sm_lid:            1
            port_lid:        3
            port_lmc:        0x00
            link_layer:        InfiniBand

 

[jmgonza6@cluster]$> cat slurm.sub
#!/bin/bash
#SBATCH -J LAMMPS-ETH32
#SBATCH -n 32
#SBATCH -t 20:00:00
#SBATCH --partition=intel_rack
#SBATCH --network=IB
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jmgonza6@mail.usf.edu

. /etc/profile.d/modules.sh
module purge
module load compilers/intel/2013sp1_cluster_xe
module load mpi/impi/4.1.3.048

export I_MPI_FABRICS=ofa
export I_MPI_FALLBACK=enable
#export I_MPI_PROCESS_MANAGER=mpd
export I_MPI_DEBUG=2

srun -n 32 ./hello.x 1> output 2> errs

James_T_Intel
Moderator

Check your SLURM configuration. SLURM can apply resource limits inside a job that differ from the ones you see in your login shell. I'd recommend taking a look at the SLURM FAQ, specifically https://computing.llnl.gov/linux/slurm/faq.html#rlimit and https://computing.llnl.gov/linux/slurm/faq.html#memlock.

Please note that these sites are not controlled by Intel, and we are not responsible for the content on them.
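A sketch of the kind of change those FAQ entries describe, under assumptions worth checking against the FAQ for your SLURM version: slurmd is typically started at boot by an init script, so it does not pass through PAM and does not pick up /etc/security/limits.conf. A process cannot raise its soft limit above its hard limit, so tasks launched by slurmd cannot get more locked memory than the daemon's own hard limit allows, regardless of the user's login settings. The file locations below are assumptions about a typical install, not confirmed for this cluster:

```shell
# --- slurm.conf (hypothetical excerpt) ---
# Do not copy the submitting shell's MEMLOCK limit into tasks;
# tasks then inherit slurmd's own locked-memory limit instead.
PropagateResourceLimitsExcept=MEMLOCK

# --- slurmd start-up environment (assumed: /etc/sysconfig/slurm, which the
# --- init script sources on some RPM-based installs) ---
# Raise the daemon's locked-memory limit before slurmd starts.
ulimit -l unlimited
```

After restarting slurmd on every compute node, rerunning the job with I_MPI_DEBUG=2 should show whether the RLIMIT_MEMLOCK warning and the fallback to tcp are gone.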
