Intel® Fortran Compiler

Backtrace contains no useful info

Jacob_S_
Beginner

Hi all,

I'm running a large MPI job of the WRF model (720 cores), built with Intel 2015u1 (15.0.1) and the MVAPICH2 MPI library.

When compiling in debug mode, I use the following switches:

-g -O0 -fno-inline -no-ip -traceback -fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u

The run continues until it aborts with a floating-point exception: error (72): floating overflow

However, the traceback in the output file from the failing core contains no useful information.

Any ideas on how to locate the specific line of code that causes the problem?

Thank you,

Jacob

Martyn_C_Intel
Employee

Hi,

    Do you see a stack trace in the program output with just the program counter (PC) addresses, or no stack trace at all? The latter would suggest that something had been overwritten. You should at least be seeing something like

Image              PC                Routine    Line       Source
wrf.exe            0000000003AB0000  Unknown    Unknown    Unknown
wrf.exe            0000000003ABCDEF  Unknown    Unknown    Unknown
...

I would start by not using so many debug options at once, and focus on the floating-point overflow. Some of those options have side effects that could interfere with each other. So try:

-O0 -fno-inline -no-ip -traceback -fpe0 

-g doesn't hurt, but you don't need it if you just want to get a traceback without doing interactive debugging.

I take it you are not using OpenMP. If you were building at -O2, I might suggest -fno-omit-frame-pointer, but this isn't necessary at -O0.

Looking at the output file, how far has WRF progressed? Has it got beyond initialization? Are you able to get to the same point by running with a single MPI rank? (I realize that might be slow unless the error occurs early on.) If so, you could probably try interactive debugging.

You could also try inserting CALL TRACEBACKQQ('location',-1) at one or two strategic locations in the code to see whether you can get a normal-looking traceback from there. You'll want USE IFCORE to get access to the interface.
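A minimal sketch of the TRACEBACKQQ suggestion, assuming the Intel Fortran compiler (IFCORE and TRACEBACKQQ are Intel extensions, so this will not build with other compilers). The module and routine names below are hypothetical stand-ins for the new physics code. Passing USER_EXIT_CODE=-1 makes TRACEBACKQQ print a traceback and then return to the caller instead of stopping the run, so the program carries on to the next checkpoint.

```fortran
! Hypothetical stand-in for the new physics code. Requires the Intel
! Fortran compiler: IFCORE and TRACEBACKQQ are Intel extensions.
module my_new_physics_mod
  use ifcore, only: tracebackqq
  implicit none
contains
  subroutine my_new_physics(t)
    real, intent(inout) :: t(:)

    ! Print a traceback tagged with this label, then return to the caller
    ! (USER_EXIT_CODE = -1) rather than terminating the program.
    call tracebackqq(string='my_new_physics: entry', user_exit_code=-1)

    t = 2.0 * t   ! placeholder for the real computation

    call tracebackqq(string='my_new_physics: after update', user_exit_code=-1)
  end subroutine my_new_physics
end module my_new_physics_mod
```

If the code is compiled with -traceback, these tracebacks should show Fortran routine names and line numbers. If they do, the unwinding itself works and the problem is specific to the point where the overflow is trapped.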

-traceback works for compiled Fortran user code. The call stack can't be unwound through C functions unless these have also been compiled with -traceback. This doesn't give line numbers and names for the C functions, but it should allow access to Fortran function names and line numbers further up the stack.

If you want to detect uninitialized variables, I recommend -init snan,arrays instead of -ftrapuv and/or -check uninit. This works for some types of floating-point variables in the 15.0 compiler, and for a much wider range in 16.0, if you have access to a 16.0 (or 17.0) compiler.

If none of this gets you anywhere, I'd remove the -fpe0 and -ftrapuv options and see whether you learn anything from the -check options, especially the bounds checking.

Steve_Lionel
Honored Contributor III

Duplicate of https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/711126

Jacob_S_
Beginner

@Martyn Corden

Thanks for your suggestions. (I'm not sure how the question got duplicated; maybe the IDZ guys can merge the two threads.)

I do get a stack trace in the program output, but it contains only the program counter (PC) addresses.

The WRF run gets well past initialization (it fails deep within the physics part of the program). I know it aborts as a result of my changes, but my new code is fairly large (~20k lines), so I need the stack trace information. I can try interactive debugging, but the job originally ran on 720 cores. To have enough memory for the domain I would still need around 100 cores, which may be cumbersome to debug interactively (a single MPI rank is certainly out of the question).

It's definitely pure Fortran code. I will try a different compiler and change the switches.

Thanks,

Jacob.

NOTE: I succeeded with the good old method of adding print statements at strategic points in my new code to narrow down the problematic block.
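The print-based approach can be made a little more systematic. The module below is a hypothetical helper, not part of WRF or the Intel runtime, written in standard Fortran 2003 with the intrinsic IEEE_ARITHMETIC module. It reports how many NaN or Infinity values an array holds and flushes output immediately, so the last line in each rank's output file marks the last checkpoint reached before the abort. It assumes default-kind REAL fields passed as 1-D arrays. It is most useful with -fpe0 removed, since an overflow then produces an Infinity instead of aborting at once.

```fortran
! Hypothetical helper (not part of WRF): print-based checkpoints that
! report non-finite values, using only standard Fortran 2003 features.
module checkpoint_mod
  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  use, intrinsic :: iso_fortran_env, only: output_unit
  implicit none
  private
  public :: count_nonfinite, checkpoint

contains

  ! Number of elements of A that are NaN or +/-Infinity.
  integer function count_nonfinite(a) result(n)
    real, intent(in) :: a(:)
    n = count(.not. ieee_is_finite(a))
  end function count_nonfinite

  ! Print LABEL and the non-finite count, then flush so the line reaches
  ! the output file even if the run aborts right afterwards.
  subroutine checkpoint(label, a)
    character(len=*), intent(in) :: label
    real, intent(in) :: a(:)

    write(output_unit, '(2a, i0)') label, ': non-finite values = ', count_nonfinite(a)
    flush(output_unit)
  end subroutine checkpoint

end module checkpoint_mod
```

For a 3-D field (say a hypothetical t3d), call checkpoint('after block A', reshape(t3d, [size(t3d)])) before and after each suspect block. The first checkpoint that reports a nonzero count brackets the code that produced the overflow.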