Hello,
I am currently testing our new InfiniBand cards and am running into some problems. The cluster is small: 8 compute nodes with 16 cores each, plus a master node with 4 cores. The nodes are connected via Ethernet (for PXE booting) and via InfiniBand through a QLogic 12300 switch. The IB cards are QLogic 7342, and the job scheduler is SLURM. The problem is that even though my memlock limit is unlimited, Intel MPI complains that it is too small. Below is a truncated sample of the warnings, since they are all the same. Any ideas on what could be causing this? Thanks in advance. For what it's worth, DAPL does not work either.
[-1] MPI startup(): Imported environment partly inaccesible. Map=0 Info=9717a0
[17] MPI startup(): RLIMIT_MEMLOCK too small
[1] MPI startup(): fabric ofa failed: will try use tcp fabric
[1] MPI startup(): tcp data transfer mode
[jmgonza6@cluster] $> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 62699
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
[jmgonza6@n0001]$> ibv_devinfo
hca_id: qib0
    transport: InfiniBand (0)
    fw_ver: 0.0.0
    node_guid: 0011:7500:0070:b05e
    sys_image_guid: 0011:7500:0070:b05e
    vendor_id: 0x1175
    vendor_part_id: 29474
    hw_ver: 0x2
    board_id: InfiniPath_QLE7340
    phys_port_cnt: 1
        port: 1
            state: PORT_ACTIVE (4)
            max_mtu: 4096 (5)
            active_mtu: 2048 (4)
            sm_lid: 1
            port_lid: 3
            port_lmc: 0x00
            link_layer: InfiniBand
[jmgonza6@cluster]$> cat slurm.sub
#!/bin/bash
#SBATCH -J LAMMPS-ETH32
#SBATCH -n 32
#SBATCH -t 20:00:00
#SBATCH --partition=intel_rack
#SBATCH --network=IB
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jmgonza6@mail.usf.edu
. /etc/profile.d/modules.sh
module purge
module load compilers/intel/2013sp1_cluster_xe
module load mpi/impi/4.1.3.048
export I_MPI_FABRICS=ofa
export I_MPI_FALLBACK=enable
#export I_MPI_PROCESS_MANAGER=mpd
export I_MPI_DEBUG=2
srun -n 32 ./hello.x 1> output 2> errs
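For reference, I_MPI_FABRICS=ofa with I_MPI_FALLBACK=enable means Intel MPI tries the OFA fabric first and drops to TCP when it fails, which matches the debug output above. A quick way to see the locked-memory limit a job step actually inherits (rather than the login shell) would be something like:
# check the memlock limit inside a job step, not on the login node
# (partition name taken from the submission script above)
srun --partition=intel_rack -n 1 bash -c 'ulimit -l'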
Check your SLURM configuration. SLURM can set other limits that are only applied within jobs. I'd recommend taking a look at their FAQ, specifically https://computing.llnl.gov/linux/slurm/faq.html#rlimit and https://computing.llnl.gov/linux/slurm/faq.html#memlock.
Please note that these sites are not controlled by Intel, and we are not responsible for the content on them.
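For illustration only (typical settings from that FAQ, not verified against your cluster), the memlock part usually comes down to raising the limit on the compute nodes and making sure slurmd itself starts with it:
# /etc/security/limits.conf on the compute nodes
* soft memlock unlimited
* hard memlock unlimited
# job steps inherit slurmd's own limit, so the slurmd init script or
# service environment should also raise the limit before it starts:
ulimit -l unlimited
# and in slurm.conf, stop the (possibly lower) submit-host limit from
# being propagated into jobs:
PropagateResourceLimitsExcept=MEMLOCK
slurmd needs to be restarted on the compute nodes after these changes.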