Intel® MPI Library

DAPL errors with Intel MPI (impi) 4.1.2

Alex10

 

Dear Experts,

 

After an update of our cluster I started receiving DAPL errors. I compile my Fortran code with Intel MPI (impi) 4.1.2. The errors occur when I run complicated models with large memory demands, and they appear after about 75% of the job has completed. This is strange because each node has 64 GB of memory, which should be more than enough for even the most complicated problems. Moreover, the same inputs ran without any problems before the update.

I suspected a stack limit problem, so the first fix I tried was "ulimit -l unlimited", which failed because I have no root privileges. I then found a C function on the Intel forum that explicitly sets the stack limit. I call it as an "external" subroutine at the very beginning of my Fortran code.

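#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Called from Fortran as an external subroutine; the trailing underscore
   matches the symbol name a Fortran compiler on Linux typically generates. */
void stacksize_(void)
{
    int res;
    struct rlimit rlim;

    /* Report the limits in effect at start-up. The (int) cast truncates
       rlim_t, so RLIM_INFINITY (unlimited) is printed as -1. */
    getrlimit(RLIMIT_STACK, &rlim);
    printf("Before: cur=%d,hard=%d\n", (int)rlim.rlim_cur, (int)rlim.rlim_max);

    /* Request an unlimited stack. Raising the hard limit fails without
       privileges if it is currently finite. */
    rlim.rlim_cur = RLIM_INFINITY;
    rlim.rlim_max = RLIM_INFINITY;
    res = setrlimit(RLIMIT_STACK, &rlim);
    if (res == -1) {
        perror("setrlimit error");
    }

    /* Read the limits back to see what is actually in effect. */
    getrlimit(RLIMIT_STACK, &rlim);
    printf("After: res=%d,cur=%d,hard=%d\n", res, (int)rlim.rlim_cur, (int)rlim.rlim_max);
}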
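For a user without root privileges, the soft limit can still be raised as far as the current hard limit. The following is a minimal sketch of that idea, not a confirmed fix for the DAPL errors. The names raise_soft_to_hard and raise_stack_limit_ are hypothetical and do not come from the original forum code.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Raise the soft limit of `resource` to its current hard limit.
   An unprivileged process may do this; only raising the hard limit
   itself requires extra privileges.
   Returns 0 on success, -1 on failure. */
int raise_soft_to_hard(int resource)
{
    struct rlimit rlim;

    if (getrlimit(resource, &rlim) == -1) {
        perror("getrlimit");
        return -1;
    }

    rlim.rlim_cur = rlim.rlim_max;   /* leave the hard limit untouched */

    if (setrlimit(resource, &rlim) == -1) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}

/* Hypothetical Fortran-callable wrapper (trailing underscore). */
void raise_stack_limit_(void)
{
    raise_soft_to_hard(RLIMIT_STACK);
}
```

From Fortran this would be declared "external" and called once before any large work arrays are allocated, in the same way as the subroutine above.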

Unfortunately, this did not solve the problem either. Judging by the output of the subroutine above, the stack limits were not actually changed. The output from one of the nodes is listed below:

Before: cur=-1,hard=-1
After: res=0,cur=-1,hard=-1
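One way to read this output: on Linux, RLIM_INFINITY cast to int prints as -1, so cur=-1,hard=-1 would mean the stack limit was already unlimited before the call. Note also that "ulimit -l" controls the locked-memory limit (RLIMIT_MEMLOCK), not the stack limit (RLIMIT_STACK). The sketch below prints both limits without truncation. The names format_limit, report_limit and report_limits_ are hypothetical helpers introduced for illustration.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Write a limit into buf as a byte count, or "unlimited" when it
   equals RLIM_INFINITY. Returns buf for convenience. */
const char *format_limit(rlim_t value, char *buf, size_t len)
{
    if (value == RLIM_INFINITY)
        snprintf(buf, len, "unlimited");
    else
        snprintf(buf, len, "%llu", (unsigned long long)value);
    return buf;
}

/* Print the soft and hard values of one resource, in bytes.
   Returns 0 on success, -1 if getrlimit fails. */
int report_limit(const char *name, int resource)
{
    struct rlimit rlim;
    char cur[32], max[32];

    if (getrlimit(resource, &rlim) == -1) {
        perror("getrlimit");
        return -1;
    }
    printf("%s: soft=%s hard=%s\n", name,
           format_limit(rlim.rlim_cur, cur, sizeof cur),
           format_limit(rlim.rlim_max, max, sizeof max));
    return 0;
}

/* Hypothetical Fortran-callable entry point (trailing underscore). */
void report_limits_(void)
{
    report_limit("stack (ulimit -s)", RLIMIT_STACK);
    report_limit("locked memory (ulimit -l)", RLIMIT_MEMLOCK);
}
```

Calling report_limits_ from every MPI rank would show whether the limits seen by the launched processes differ from those of an interactive shell on the same node.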

The complete error log file is attached (357188).

I have no experience with this type of MPI error. I suspect the machine was misconfigured during the update, but the administrators have not been able to suggest a solution.
