Hi,
We have a small cluster of 8 nodes (12 cores each), split across 2 blades with 4 nodes per blade. All nodes are connected via InfiniBand.
Intel MPI is installed and configured with shm:ofa.
I'm starting the following test on all the cores of the cluster:
mpirun -np 96 IMB-MPI1
It generates "normal" results for all the sub-tests. But there is a problem with:
#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 96
#----------------------------------------------------------------
it gives:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.11 0.15 0.12
1 1000 42.13 42.15 42.14
2 1000 43.61 43.62 43.62
4 1000 52.55 52.57 52.56
8 1000 62.75 62.78 62.77
16 1000 68.49 68.52 68.50
32 1000 80.11 80.13 80.12
64 1000 111.07 111.10 111.09
128 1000 181.19 181.25 181.23
256 1000 368.36 368.52 368.44
512 1000 328.78 328.83 328.80
1024 1000 602.03 603.65 602.17
2048 1000 5873.23 5873.65 5873.45
4096 1000 6000.28 6000.59 6000.43
8192 1000 6965.62 6965.84 6965.75
16384 943 10429.38 10429.66 10429.52
32768 400 25244.62 25245.83 25245.13
65536 223 44969.48 44972.04 44970.70
131072 118 84991.07 84997.68 84994.67
262144 60 167439.02 167466.40 167451.96
524288 31 330707.68 330769.06 330739.70
1048576 16 658785.06 659147.81 658966.23
2097152 8 1314571.62 1315755.52 1315313.50
n08:3914: reg_mr Cannot allocate memory
n08:3914: reg_mr Cannot allocate memory
n08:3915: reg_mr Cannot allocate memory
...
I see these "reg_mr Cannot allocate memory" messages on all the nodes.
What exactly is this problem, and how can I solve it?
Thx a lot!
Best regards
4 Replies
Hi Guillaume,
You are probably using Mellanox HCAs. This message usually means that there is not enough memory for buffers; whether you hit it depends on how much memory each node has. Alltoall requires a lot of memory for internal buffers, so you just need to limit the maximum message size for IMB.
You can also try the following trick: add the following line to /etc/modprobe.conf:
options mlx4_core log_mtts_per_seg=5
It should reduce memory consumed by communication functions.
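To get a feel for what this option changes, here is a minimal sketch based on the commonly cited estimate for mlx4 hardware: registerable memory is roughly 2^log_num_mtt * 2^log_mtts_per_seg * page size. The helper name max_reg_mem_bytes is hypothetical, the sysfs path assumes the mlx4_core module is loaded and exposes both parameters, and the fallback values (20, 3, 4096) are examples only, not measurements from this cluster.

```shell
#!/bin/sh
# Estimate of the memory that can be registered with an mlx4 HCA:
#   2^log_num_mtt * 2^log_mtts_per_seg * page_size   (bytes)
# max_reg_mem_bytes is a hypothetical helper name used for illustration.
max_reg_mem_bytes() {
    log_num_mtt=$1
    log_mtts_per_seg=$2
    page_size=$3
    echo $(( (1 << log_num_mtt) * (1 << log_mtts_per_seg) * page_size ))
}

# Assumption: the loaded module exposes its parameters under this path.
params=/sys/module/mlx4_core/parameters
if [ -r "$params/log_num_mtt" ] && [ -r "$params/log_mtts_per_seg" ]; then
    # Note: a value of 0 may mean "driver default", in which case the
    # estimate printed here is not meaningful.
    max_reg_mem_bytes "$(cat "$params/log_num_mtt")" \
                      "$(cat "$params/log_mtts_per_seg")" \
                      "$(getconf PAGE_SIZE)"
else
    echo "mlx4_core parameters not found; using example values" >&2
    max_reg_mem_bytes 20 3 4096
fi
```

Compare the result with the physical memory of a node. The module option only takes effect after the driver is reloaded (or the node is rebooted), and on newer distributions the line typically goes into a file under /etc/modprobe.d/ rather than /etc/modprobe.conf.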
Regards!
Dmitry
Hi!
Thx for your useful answer. I will try your ideas! But where can I find out how to limit the message size for IMB? I had the same idea, but I couldn't find out how... I'm too stupid to google correctly...
Best regards!
Guillaume
You need to provide a file with the explicit list of message lengths to include. I think the default behavior is to include all of them if no file is provided.
$ ./IMB-MPI1 -h
...
-msglen
the argument after -msglen is a lengths_file, an ASCII file, containing any set of nonnegative message lengths, 1 per line
...
For instance, Intel Cluster Checker uses the following list of msglen values to get a quick but still representative sample of results.
$ cat IMB_msglen
0
1
2
4
4194304
Note that you usually get best latency with a zero payload, and the best bandwidth with a really big payload.
As usual, I would recommend some experimentation to optimize those values.
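As a starting point for that experimentation, here is a minimal sketch that generates a lengths file with 0 plus the powers of two up to a chosen cap. The helper name write_msglen_file and the 1 MiB cap are assumptions for illustration; the cap is simply a conservative value below the sizes at which the reg_mr errors showed up in the log above.

```shell
#!/bin/sh
# Write message lengths 0, 1, 2, 4, ... up to max_bytes, one per line,
# in the format expected by -msglen.
# write_msglen_file is a hypothetical helper name used for illustration.
write_msglen_file() {
    max_bytes=$1
    out_file=$2
    echo 0 > "$out_file"
    len=1
    while [ "$len" -le "$max_bytes" ]; do
        echo "$len" >> "$out_file"
        len=$((len * 2))
    done
}

# Assumed cap of 1 MiB; adjust after experimenting on your own cluster.
write_msglen_file 1048576 IMB_msglen
```

The file can then be passed to the benchmark, restricted to the test in question, for example: mpirun -np 96 IMB-MPI1 -msglen IMB_msglen Alltoall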
Hi!
Great! thx a lot!