I am trying to run a Fortran MPI-based code on a cluster. The code runs fine locally (on an i7) or on a single node (dual Xeon), even with aggressive optimization options like -fast. On our cluster, however, I can only make it work with gcc + OpenMPI; the Intel toolchain does not work.
For example, with gcc 4.6.3, compiling with mpif90 -O3 -funroll-loops -ftree-vectorize -cpp -march=native -g -fbacktrace -ffast-math and launching with mpirun -machinefile nodefile -np 96 ./incompact3d works, but it is very slow. A minimal MPI test that exercises the interconnect independently of incompact3d is sketched below.
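For reference, this is roughly the kind of smoke test I mean (an illustrative ring exchange, not part of incompact3d), which forces inter-node traffic rather than just shared-memory communication:

program mpi_smoke_test
  ! Minimal sketch: pass a token around a ring so that runs spanning
  ! several nodes actually exercise the fabric, not only shm.
  use mpi
  implicit none
  integer :: ierr, rank, nprocs, left, right, token_in, token_out

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
  right = mod(rank + 1, nprocs)
  left  = mod(rank - 1 + nprocs, nprocs)
  token_out = rank
  call MPI_Sendrecv(token_out, 1, MPI_INTEGER, right, 0, &
                    token_in,  1, MPI_INTEGER, left,  0, &
                    MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
  write(*,'(A,I0,A,I0)') 'rank ', rank, ' got token from rank ', token_in
  call MPI_Finalize(ierr)
end program mpi_smoke_test

It can be built and launched the same way as the application, e.g. mpiifort smoke.f90 -o smoke and mpirun -machinefile ./nodes -n 96 ./smoke.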
Some info about the installation:
$mpirun --version
Intel(R) MPI Library for Linux* OS, Version 4.1.0 Build 20120831
Copyright (C) 2003-2012, Intel Corporation. All rights reserved.
$mpiifort --version
ifort (IFORT) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation. All rights reserved.
$ofed_info > ofed_info (attached file)
$ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 192379
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 32768
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 192379
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
$ mpirun -genvall -genv I_MPI_DEBUG 5 -genv I_MPI_HYDRA_DEBUG 1 -genv I_MPI_FABRICS=shm:dapl -machinefile ./nodes -n 96 ./incompact3d > log (attached file)
The run aborts with:
unexpected disconnect completion event from [27:cerrado02n]
Assertion failed in file ../../dapl_conn_rc.c at line 1128: 0
internal ABORT - process 28
Any suggestions on how to make this work?
Thanks in advance.
Hi Ricardo,
It seems that the Intel MPI version you are using (4.1.0) is quite outdated. Please update to the most recent version (5.1.3) and check if the problem still exists.
Also, since the crash is coming from the DAPL layer, please update the DAPL library running underneath Intel MPI. The most recent version (2.1.9) can be found on the OpenFabrics distribution site (http://downloads.openfabrics.org/downloads/dapl/).
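As a quick way to confirm that DAPL is the culprit (and as a temporary workaround), you can also force Intel MPI onto TCP, which bypasses DAPL entirely. It will be slower, but if a run like

$ mpirun -genv I_MPI_DEBUG 5 -genv I_MPI_FABRICS shm:tcp -machinefile ./nodes -n 96 ./incompact3d

completes cleanly, the problem is in the DAPL/OFED stack rather than in your application.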
Best regards,
Michael
