
ELAN memory exception

prsncat
I am using the NCAR ES40 cluster "prospect" -- a 9-node cluster with
4 EV6 Alpha chips per node. I am running an f90 GCM (about 50000 lines)
under MPI with 12 tasks. Our code runs for about 2 hours of wallclock time,
then stops with the following message:

0 ELAN_EXCEPTION @ 0: 5 (Memory exhausted)
0 newTxDesc: Elan memory exhausted: port 3ff20009880
0 sh: 3230925 Killed
0 prun: /sbin/sh (pid 3230921) killed by signal 137
0 prun: no core file
prun.orig: starting 12 processes on 12 cpus default memlimit timelimit 21600 secs
logout

This is from the master task (task 0). I ran "setenv MP_STACK_SIZE 10000000"
before execution. Am I blowing the stack, or is the heap corrupted?
How can I get more information about where the memory demand is
accumulating? Unfortunately, we do not have the Enterprise tool on
our system. Thanks for any suggestions,

--Ben
foster@ucar.edu
Steven_L_Intel1
Employee
I'm not personally familiar with all this stuff, but I asked around, and was told that you're using an MPI library, which is not OpenMP, and thus MP_STACK_SIZE is irrelevant. It sounds as if the MPI library either has a memory leak or your application is causing it to use too much memory. This isn't a Fortran product issue, so I'm not going to be able to help further. I suggest you talk to some people familiar with MPI. Do you have a Compaq support or sales support person assigned to your project?

Steve
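
One way to narrow down where memory demand accumulates, if the cause turns out to be nonblocking MPI operations that are posted but never completed, is to keep a per-call-site count of open requests and print it periodically. That cause is only an assumption here; nothing in the thread confirms it. The sketch below shows just the bookkeeping, in Python for brevity. It does not call MPI, and the class name RequestLedger and the call-site labels are hypothetical. In the real code the same counters would sit in thin Fortran wrappers around the nonblocking calls (for example MPI_Isend) and their completions (MPI_Wait).

```python
from collections import Counter


class RequestLedger:
    """Tracks requests that were posted but not yet completed, per call site."""

    def __init__(self):
        self._open = {}      # request id -> call-site label
        self._next_id = 0

    def post(self, site):
        """Record a posted request (where the model would call MPI_Isend or MPI_Irecv)."""
        rid = self._next_id
        self._next_id += 1
        self._open[rid] = site
        return rid

    def complete(self, rid):
        """Record a completion (where the model would call MPI_Wait)."""
        self._open.pop(rid, None)

    def outstanding(self):
        """Return a Counter of still-open requests keyed by call site."""
        return Counter(self._open.values())


def demo(steps=100):
    """Simulate a time loop in which one hypothetical call site never completes its sends."""
    ledger = RequestLedger()
    for _ in range(steps):
        rid = ledger.post("halo_exchange")    # hypothetical site: posted and completed
        ledger.complete(rid)
        ledger.post("diagnostics_send")       # hypothetical site: never completed
    return ledger.outstanding()


if __name__ == "__main__":
    # A count that grows with the number of time steps points at the leaking call site.
    print(demo().most_common())
```

Printing the ledger every few hundred time steps on each task would show whether any call site's open-request count grows steadily over the two-hour run.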