Intel® Fortran Compiler

Problems with stack management for large runs on a Beowulf cluster with Intel Fortran Compiler 8.1.024

antonio

Hi,

I am trying to compile a large code on a Beowulf cluster using the Intel Fortran compiler 8.1.024, and I am running into a stack size problem: when I start the code, it immediately crashes with a segfault, even though my system has 2 GB of RAM.
The compiler options in my makefile are the following:

OPT = -I. -O3 -unroll16 -scalar_rep -nowarn -O3 -tpp7 -axN -xN \
      -ftz -prec_div -rcd -us -g -debug -check all

LDFLG = -L/opt/intel/lib -Vaxlib -lm -static -static-libcxa

vpp5exe: addmods \
         $(DATAOBJ) $(PUBLICUTILS) $(UTILOBJ) $(EXTOBJ) $(SRCOBJ)
	echo linking parallel program
	( $(LDPAR) $(LDFLG) -o lmparbin $(DATAOBJ) $(UTILOBJ) $(EXTOBJ) \
	  $(PUBLICUTILS) $(SRCOBJ) $(LIB) )

From the Intel release notes I learned that one of the known limitations of the Fortran compiler is that the Intel Fortran Compilers 8.x allocate more temporaries on the stack than previous Intel Fortran compilers. If a program has inadequate stack space at runtime, it will terminate with a Segmentation fault or Signal 11.
I have tried different compiler options, but I cannot find out why the executable does not work; none of the debugging switches help. I cannot tell whether it is a compiler problem or a system problem.
I only know that the program runs without problems at smaller problem sizes.
For bigger problem sizes I tried ulimit -s -S unlimited, but it no longer helps.
I hope that someone can help me.
Thank you
Antonio
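
A quick way to see what the shell was actually granted is to print the soft and hard stack limits side by side. This is only a diagnostic sketch, assuming a POSIX-style shell such as bash; the variable names soft and hard are made up for the illustration.

```shell
# Print the soft and hard stack limits of the current shell.
# Values are reported in kB, or as the word "unlimited".
soft=$(ulimit -S -s)
hard=$(ulimit -H -s)
echo "soft stack limit: $soft"
echo "hard stack limit: $hard"

# An unprivileged user cannot raise the soft limit above the hard limit,
# so a request for "unlimited" is refused when the hard limit is finite.
if [ "$hard" != "unlimited" ]; then
    echo "note: 'ulimit -s unlimited' cannot succeed here; the ceiling is $hard kB"
fi
```

If the hard limit turns out to be finite, it has to be raised by the administrator before a request for an unlimited stack can take effect.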
1 Reply
Keith_R_
You don't give many details of your parallel system, so it's
not possible to diagnose the cause.

However, I will warn you about one serious pitfall that might
cause similar problems if you use the LAM or MPICH MPI systems.

If you set "ulimit -s unlimited" before issuing the mpirun
command, then this will change the stack limit only for the
shell on the local machine and NOT the other parallel nodes.
In other words, you may not have changed the stack limit at
all for the actual run processes.

You should put this command into the ".bashrc" (or equivalent
startup file) and ensure it is executed on all of the other
nodes.

Keith Refson
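
The startup-file advice above could look roughly like the fragment below. It is a sketch under assumptions, not a tested recipe for any particular cluster: it assumes bash (or another POSIX-style shell) is the login shell on every node, and the function name raise_stack_limit is invented for the example. It prints nothing, since output from a startup file can confuse remote-shell tools.

```shell
# Hypothetical fragment for ~/.bashrc on every compute node.
# Raise the soft stack limit as far as the hard limit allows, and never
# let a refused request abort the shell start-up.
raise_stack_limit() {
    # Try "unlimited" first; if that is refused, fall back to the
    # current hard limit, which is the highest value permitted.
    ulimit -S -s unlimited 2>/dev/null \
        || ulimit -S -s "$(ulimit -H -s)" 2>/dev/null
    return 0
}
raise_stack_limit
```

To check that the setting really reaches the remote processes, one option is to run something like ssh <node> 'ulimit -s' (or the rsh equivalent) for each node, where <node> is a placeholder for a real host name, and compare the result with the value on the head node.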