ELAN memory exception

prsncat
I am using the NCAR ES40 cluster "prospect" -- a 9 node cluster with
4 EV6 Alpha chips per node. I am running an f90 GCM (about 50000 lines)
under MPI with 12 tasks. Our code runs for about 2 hours wallclock time
then stops with the following message:

0 ELAN_EXCEPTION @ 0: 5 (Memory exhausted)
0 newTxDesc: Elan memory exhausted: port 3ff20009880
0 sh: 3230925 Killed
0 prun: /sbin/sh (pid 3230921) killed by signal 137
0 prun: no core file
prun.orig: starting 12 processes on 12 cpus default memlimit timelimit 21600 secs
logout

This is from the master task (task 0). I ran setenv MP_STACK_SIZE 10000000
before execution. Am I blowing the stack, or is the heap corrupted?
How can I find out where the memory demand is accumulating?
Unfortunately, we do not have the Enterprise tool on
our system. Thanks for any suggestions,

--Ben
foster@ucar.edu
1 Reply
Steven_L_Intel1
I'm not personally familiar with this setup, but I asked around. I was told that you're using an MPI library, not OpenMP, so MP_STACK_SIZE is irrelevant. It sounds as if either the MPI library has a memory leak or your application is causing it to use too much memory. This isn't a Fortran product issue, so I won't be able to help further. I suggest you talk to people familiar with MPI. Do you have a Compaq support or sales support person assigned to your project?

Steve
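
On the question of where memory demand accumulates: one low-tech option, when no profiling tool is available, is to log the process's peak resident memory at a few checkpoints in each time step and watch which phase makes it climb. The following is a minimal sketch in C, not something taken from the thread. The helper names peak_rss and log_peak_rss are hypothetical, and it assumes the target system's getrusage() fills in ru_maxrss (the units vary by platform, so check the local man page). It reports host memory only, so it can help rule a heap leak in the application in or out, but it would not show memory held on the Elan adapter itself.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process, as reported by getrusage().
 * Units are platform-dependent (kilobytes on Linux, bytes on macOS).
 * Returns -1 if getrusage() fails. */
long peak_rss(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Hypothetical helper: write one line per checkpoint to stderr so that
 * growth can be traced to a phase of the time step.
 * 'rank' is whatever task number the caller obtained from MPI.
 * Returns the value that was logged. */
long log_peak_rss(int rank, long step, const char *label)
{
    assert(label != NULL);
    long rss = peak_rss();
    fprintf(stderr, "rank %d step %ld %s: peak RSS %ld\n",
            rank, step, label, rss);
    return rss;
}
```

From a Fortran code, a routine like this would need a small C wrapper with the name mangling the compiler expects. Calling it before and after the communication phase of each step, for example log_peak_rss(rank, step, "before exchange") and log_peak_rss(rank, step, "after exchange"), gives a per-task trace. If the peak stays flat for the whole two hours, the growth is probably not in the application's heap, which would point back toward the MPI library or the interconnect layer.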