Ricardo_Frantz
Beginner

Help with Assertion failed in file ../../dapl_conn_rc.c

I am trying to run a Fortran MPI-based code on a cluster. The code runs fine locally (i7) and on a single node (dual Xeon), even with aggressive optimization options such as -fast. On our cluster, however, I can only get it to work with gcc + Open MPI; the Intel build does not work.

For example, with gcc 4.6.3, compiling with mpif90 -O3 -funroll-loops -ftree-vectorize -cpp -march=native -g -fbacktrace -ffast-math and launching with mpirun -machinefile nodefile -np 96 ./incompact3d works, but is very slow.

Some info about the installation:
$mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Copyright (C) 2003-2012, Intel Corporation. All rights reserved.

$mpiifort --version
ifort (IFORT) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

$ofed_info > ofed_info (attached file)

$ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 192379
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 32768
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 192379
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

$ mpirun -genvall -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 96 ./incompact3d > log (attached file)

unexpected disconnect completion event from [27:cerrado02n]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 28
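
When many ranks abort at once, it can help to count which hosts are named in the disconnect messages. The sketch below assumes the bracketed field (as in `[27:cerrado02n]`) is `rank:hostname` and that the output was redirected to a file such as `log`; the function name `failed_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count how often each host is named in
# "unexpected disconnect completion event" lines of a log file.
# Assumes the bracketed field has the form [rank:hostname].
failed_hosts() {
  grep 'unexpected disconnect completion event from' "$1" \
    | sed 's/.*\[\([0-9]*\):\([^]]*\)\].*/\2/' \
    | sort | uniq -c
}

# Usage (not executed here):
#   failed_hosts log
```

If one host dominates the counts, that node's InfiniBand setup would be a reasonable first place to look.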

Any suggestions on how to make it work?
Thanks in advance.

1 Reply
Michael_S
Employee

Hi Ricardo,

It seems that the Intel MPI version you are using (4.1.0) is quite outdated. Please update to the most recent version (5.1.3) and check if the problem still exists.

Also, since the crash is coming from the DAPL layer, please update the DAPL library running underneath Intel MPI. The most recent version (2.1.9) can be found on the OpenFabrics distribution site (http://downloads.openfabrics.org/downloads/dapl/).

Best regards,

Michael
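
Before or alongside updating, one way to check whether the failure is specific to the DAPL layer is to rerun the same job over another fabric. The sketch below is not part of the reply above: it reuses the original command line but passes `shm:tcp` (another value accepted by `I_MPI_FABRICS`) instead of `shm:dapl`. The function name `run_with_fabric` and the `DRY_RUN` switch are hypothetical, added only so the command can be inspected before it is launched.

```shell
#!/bin/sh
# Sketch: rerun the job from the question with a chosen fabric.
# With DRY_RUN=1 the command is only printed, not executed.
run_with_fabric() {
  fabric="$1"
  # Rebuild the original command line, swapping only the fabric value.
  set -- mpirun -genvall -genv I_MPI_DEBUG 5 \
         -genv I_MPI_FABRICS "$fabric" \
         -machinefile ./nodes -n 96 ./incompact3d
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage (not executed here):
#   DRY_RUN=1; run_with_fabric shm:tcp     # print the command
#   DRY_RUN=0; run_with_fabric shm:tcp     # run over TCP instead of DAPL
```

If the job completes over `shm:tcp` but still aborts over `shm:dapl`, that would point at the DAPL/OFED stack rather than at the application, though TCP is expected to be slower and is meant here only as a diagnostic.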
