seongyun_k_
Beginner
201 Views

mmap() + MPI one-sided communication fails when DAPL UD enabled

Hi!

I used a trick to read a page located on a remote machine's disk: each machine calls mmap() over the whole file and then creates an MPI one-sided communication window on the mapped region.
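For readers without the attachment, the mapping step of this trick might look roughly like the sketch below. It is an illustration under assumptions, not the attached code: `map_whole_file` is a hypothetical helper name, and the read-only `MAP_SHARED` mapping is a guess at how the file is mapped.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not taken from the attached code): map an entire
 * file read-only and return its base address, or NULL on failure.
 * On success, *len_out holds the file size in bytes. */
void *map_whole_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    /* Assumption: a shared, read-only mapping of the whole file. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len_out = (size_t)st.st_size;
    return base;
}
```

The returned base address and length would then be passed as the first two arguments of `MPI_Win_create` on each rank, which is the point at which the MPI library may try to register the region with the interconnect.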

It works fine when DAPL UD is disabled, but it prints the following error messages when I enable DAPL UD by setting 'I_MPI_DAPL_UD=1'.

XXX001:UCM:1d1a:84d2ab40: 271380 us(271380 us):  DAPL ERR reg_mr Cannot allocate memory
[0:XXX001] rtc_register failed 196608 [0] error(0x30000):  unknown error
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_send_ud.c at line 1468: 0
internal ABORT - process 0
XXX002:UCM:31e2:27bacb40: 263683 us(263683 us):  DAPL ERR reg_mr Cannot allocate memory
[1:XXX002] rtc_register failed 196608 [1] error(0x30000):  unknown error
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_send_ud.c at line 1468: 0
 
Please refer to the attached file for the code I used.

I ran the program with the following environment variables set:
 
export I_MPI_FABRICS=dapl
export I_MPI_DAPL_UD=1
 
command: mpiexec.hydra  -genvall -machinefile ~/machines -n 2 -ppn 1 ${PWD}/test2
 
 
 
Here are my general questions:
 
(1) When a window is created over an mmap()ed region, does the IB driver try to pin the whole memory region to prevent page faults?

(2) Does the IB driver register the memory region differently depending on whether DAPL UD is enabled or disabled?
 
 
 
 
Experimental Environment:
 
Hardware and software:
OS: CentOS 6.4 Final
CPU: 2 x Intel® Xeon® CPU E5-2450 @ 2.10GHz (8 physical cores)
RAM: 32 GB per machine
Interconnect: InfiniBand, Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
Mellanox InfiniBand driver: MLNX_OFED_LINUX-3.1-1.1.0.1 (OFED-3.1-1.1.0): 3.19.0
 
thanks,
Artem_R_Intel1
Employee
201 Views

Hi,

Could you please provide your system limits ('ulimit -a') for root/user?

Pay attention to the 'max locked memory' limit: if you have not done so already, try setting it to 'unlimited' (root permissions and a host reboot may be required).
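Since the limit shown by 'ulimit -a' in an interactive shell may differ from the one inherited by processes started remotely through the launcher, it can help to query it from inside the program itself. The following is a minimal sketch; `get_memlock_limit` and `print_limit` are hypothetical helper names introduced here for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the locked-memory limits of the calling
 * process. Returns 0 on success, -1 on failure. */
int get_memlock_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, showing RLIM_INFINITY as "unlimited". */
void print_limit(const char *label, rlim_t value)
{
    if (value == RLIM_INFINITY)
        printf("%s: unlimited\n", label);
    else
        printf("%s: %llu bytes\n", label, (unsigned long long)value);
}
```

Calling `get_memlock_limit` on every rank right after `MPI_Init` and printing both values with `print_limit` would show whether each MPI process really runs with an unlimited locked-memory limit.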

lee_k_
Beginner
201 Views

Hi,

I am a member of seongyun k's team.

Our system limits are:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 127374
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is there anything wrong with these settings?
