I am working with a numerical model application that at some point in the past few months started to segfault when built with ifort on Linux. The application normally runs under MPI and links against several libraries (HDF5, netCDF, ParMETIS), although the behavior I'm describing here occurs on a single processor and doesn't require mpiexec to reproduce.
When I use -traceback, the segmentation fault is attributed to more or less the same subroutine every time. The line number varies, though, with the compile options and the system I am on (our own cluster and SDSC Comet, for instance). On my system I am working with ifort 14.0.1, a fairly close match to the version on Comet. I always set ulimit -s unlimited, and my base compile options for getting a trace are: -O2 -debug extended -traceback. Many of the arrays are allocated manually on the heap with ALLOCATE.
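For reference, a minimal sketch of the traceback build (file and executable names here are placeholders; the real build goes through our makefiles and links HDF5/netCDF/ParMETIS):

```shell
# Build for a usable traceback -- names are hypothetical stand-ins
# for the actual source files and link line.
ulimit -s unlimited
ifort -O2 -debug extended -traceback -c model.f90
ifort -O2 -debug extended -traceback model.o -o model
./model
```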
As to what I have tried:
-check bounds: produces no warning, but makes the segfault go away on both our cluster and Comet (same for -check uninit)
-mcmodel medium: eliminates the segfault on the head node of our cluster but not on Comet. I applied it only to my code, not to the libraries -- that might not be kosher?
random print statements: often eliminate the segfault
-heap-arrays: no effect
-O3: eliminates the segfault on our cluster
None of the diagnostics I've tried has produced a complaint I could really act on. Can anyone think of a way to get more information? Constructing a minimal example is onerous -- it is a big production code, and every time I change one line it seems to affect reproducibility. I've tried Intel Inspector XE 2013, but it just hangs at startup, which I suppose is material for a different post. Is valgrind appropriate? We could upgrade the compiler, of course, but there is a lot of startup cost to that, and a lot of this work is destined for Comet.
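In case it matters, the valgrind invocation I would try looks roughly like this (a sketch; the binary name is a placeholder, and I'd expect some noise from the Intel runtime at -O2):

```shell
# Memcheck on the serial reproducer -- no mpiexec needed, since the
# crash reproduces on one processor. --track-origins helps trace
# uninitialized reads back to their source.
valgrind --track-origins=yes --num-callers=30 ./model
```

I assume a -O0 -g build would make the reports easier to map back to source lines, at the cost of possibly changing the reproducibility again.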
What is "Comet"?
From the symptoms, and the fact that you are using a three-year-old compiler, I suspect that your code is running into an optimizer bug. Or there may be some uninitialized variables causing the problem. You could try using a lower optimization level for just the subroutine in which the segfault occurs, while keeping the higher optimization level for the other subroutines.
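A sketch of that per-file approach, assuming the suspect routine lives in its own source file (file names here are hypothetical):

```shell
# Compile only the suspect file without optimization; keep -O2 elsewhere.
# Mixing optimization levels across object files is fine at link time.
ifort -O0 -debug extended -traceback -c suspect_sub.f90
ifort -O2 -debug extended -traceback -c rest_of_model.f90
ifort suspect_sub.o rest_of_model.o -o model
```

If the crash disappears with only that one file at -O0, that points strongly at either an optimizer bug in that routine or an uninitialized/out-of-bounds access that the optimizer happens to expose.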
Since you have chosen some degree of optimization, the mapping between line numbers and addresses is imprecise and dependent on the compiler options chosen.
-check bounds inserts much instrumentation, which may limit possible optimizations.
A print statement is effectively a function call, so that too can limit optimization.
So Ian's suggestion makes sense, replacing -O2 by -O1 or -O0, though it's surprising the problem goes away with -O3. Compiling with -O2 -no-vec or with -O2 -fno-inline -no-ip are other reasonable combinations to test. Is reducing optimization on just the subroutine indicated by the traceback sufficient to make everything work?
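The combinations above, written out as compile lines (a sketch; substitute your real source list for the placeholder):

```shell
# Bisect the optimizer: each line is a separate full rebuild.
ifort -O1 -debug extended -traceback src.f90 -o model              # lower general optimization
ifort -O2 -no-vec -debug extended -traceback src.f90 -o model      # -O2 without vectorization
ifort -O2 -fno-inline -no-ip -debug extended -traceback src.f90 -o model  # -O2 without inlining/IPO
```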
If you have static (or automatic) arrays that occupy more than 2GB, you might need -mcmodel medium. In this case, any libraries should also be built with -mcmodel medium, or at least -fpic. But if ALLOCATE is used for large arrays, as you indicate, this should not be necessary.
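If the large-static-data case did apply, a consistent build would look roughly like this (a sketch; -shared-intel is typically required alongside -mcmodel=medium, and the library names shown are the usual ones for these packages, rebuilt the same way or with -fpic):

```shell
ifort -mcmodel=medium -shared-intel -c model.f90
ifort -mcmodel=medium -shared-intel model.o -o model \
      -lhdf5 -lnetcdff -lparmetis   # these must be -mcmodel medium or -fpic builds
```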