Beginner

backtrace is empty of useful info

Hi all,

I'm running a large MPI job of the WRF model application (720 cores), compiled using Intel 2015u1 (15.0.1) and the MVAPICH2 MPI library.

When compiling in debug mode I use the following switches:

-g -O0 -fno-inline -no-ip -traceback -fpe0 -check noarg_temp_created,bounds,format,output_conversion,pointers,uninit -ftrapuv -unroll0 -u

The run proceeds until it hits a floating-point overflow exception: error (72): floating overflow

However, the traceback in the output file from the core that failed contains no useful information.

Any ideas on how to locate the specific line of code that causes the problem?

 

Thank you,

Jacob

6 Replies
Black Belt

Please present the traceback information that you dismissed as being "empty with no useful info". (If it is really empty, we can't know whether the absent information is useful or not, can we?)

Beginner

@mecej4

You're right. Here is the traceback, which is not useful from my perspective:

forrtl: error (72): floating overflow
Image              PC                Routine            Line        Source             
wrf_FAST_43bins_V  0000000011072C41  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000011071397  Unknown               Unknown  Unknown
libnetcdff.so.6    00002AAAAAB9FD12  Unknown               Unknown  Unknown
libnetcdff.so.6    00002AAAAAB9FB66  Unknown               Unknown  Unknown
libnetcdff.so.6    00002AAAAAB865BC  Unknown               Unknown  Unknown
libnetcdff.so.6    00002AAAAAB8AE12  Unknown               Unknown  Unknown
libpthread.so.0    0000003BB080F7E0  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  000000000E9BA787  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000006547964  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  000000000639E156  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  00000000043C54DE  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000003A75634  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000003496833  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  000000000050C3D5  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000000408572  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000000407AC5  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000000407A7E  Unknown               Unknown  Unknown
libc.so.6          0000003BB001ED5D  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  0000000000407989  Unknown               Unknown  Unknown

Black Belt

From the PC (program counter) values in the traceback and compiler listings one could find the line number where the fault occurred. That, however, is a bit time-consuming and cumbersome, and is necessary only if the fault occurs in optimized code. 

Did you build the object/library containing wrf_FAST_43bins_V yourself? Did you specify -traceback when compiling that, as well as when compiling libnetcdff.so.6? If not, you could recompile at least the source containing wrf_FAST_43bins_V with -traceback (and relink with -traceback) and run again to obtain a traceback with line numbers displayed.
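If rebuilding is not immediately possible, the PC values can at least be pulled out of the traceback for a manual lookup. The following is only a minimal sketch: it assumes the column layout shown in the traceback above (image name followed by a 16-digit hexadecimal PC), and the function name parse_traceback is made up for illustration.

```python
import re

# Assumed row layout, as in the traceback posted above:
#   wrf_FAST_43bins_V  0000000011072C41  Unknown  Unknown  Unknown
ROW = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+")

def parse_traceback(text):
    """Return a list of (image, pc) tuples in frame order; pc is an int."""
    frames = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:  # header and error lines do not match and are skipped
            frames.append((m.group(1), int(m.group(2), 16)))
    return frames

sample = """\
forrtl: error (72): floating overflow
Image              PC                Routine            Line        Source
wrf_FAST_43bins_V  0000000011072C41  Unknown               Unknown  Unknown
libnetcdff.so.6    00002AAAAAB9FD12  Unknown               Unknown  Unknown
wrf_FAST_43bins_V  000000000E9BA787  Unknown               Unknown  Unknown
"""

frames = parse_traceback(sample)
# Keep only the frames that belong to the executable itself.
exe_pcs = [hex(pc) for image, pc in frames if image == "wrf_FAST_43bins_V"]
print(exe_pcs)
```

Filtering by image name separates the frames in the executable from those in shared libraries such as libnetcdff.so.6, whose addresses depend on where the library was loaded.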

Beginner

@mecej4

Thanks for your time taken answering my question.

The WRF model *.exe file, linking the *FAST_43bins* module as well as the NetCDF library, is being generated using a special make file. In that file I used the debug mode as mentioned above.

I have attached the make file "configure.wrf_IDZ" that I used for compilation. You will find a variant of the above compiler switches under "FCDEBUG" (I tried some changes since then, no luck).

Do you see anything in it that I should change, anything that might keep the -traceback switch from showing full details, including line numbers?

Thank you,

Jacob.

Black Belt

I'm afraid that I cannot help you directly with WRF, since I have never used it or attempted to build it. In fact, as of now the WRF site is not working, and the sign-up page is off-line.

However, you can find out whether a particular object file, e.g., wrf_FAST_43bins_V.o, contains line-number information by using the command:

readelf --debug-dump=decodedline wrf_FAST_43bins_V.o

If the file contains line-number debugging information, you should see output resembling:

Decoded dump of debug contents of section .debug_line:

CU: wrifes.f:
File name                            Line number    Starting address
wrifes.f                                       1                   0

wrifes.f                                       9                0x1c

wrifes.f                                      10                0x26
wrifes.f                                      11                0x84

wrifes.f                                      13                0xa0
wrifes.f                                      14                0xaa
wrifes.f                                      15               0x11b

I used a test file, wrifes.f, just to give you concrete information; this file has nothing to do with WRF. Had I compiled the file without specifying -traceback or -g, the table that I just showed would have been empty.

Note that you now have a table of line numbers and offsets relative to the routine entry points. You can obtain the entry-point addresses by running nm -g on the executable that produced the traceback, or you can tell the linker to produce a map the next time you build your a.out. For each address in the traceback, subtract the entry-point address of the enclosing routine to obtain the relative address, then look that relative address up in the table produced by readelf to find the corresponding line number in the pertinent source code.
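The subtraction and table lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a tested tool: the function resolve, the symbol names, and the entry addresses are hypothetical, and only the line table is taken from the wrifes.f dump shown earlier. It also assumes the table addresses are relative to the routine entry, which holds when the routine starts at offset 0 of its object file.

```python
from bisect import bisect_right

def resolve(pc, symbols, line_tables):
    """Map an absolute PC to (routine, relative_address, line).

    symbols:     list of (entry_address, routine_name), e.g. copied from nm output
    line_tables: dict of routine_name -> list of (relative_address, line_number),
                 e.g. transcribed from readelf --debug-dump=decodedline
    """
    symbols = sorted(symbols)
    entries = [addr for addr, _ in symbols]
    i = bisect_right(entries, pc) - 1      # last entry point at or below pc
    if i < 0:
        return None                        # pc lies below every known symbol
    entry, name = symbols[i]
    rel = pc - entry                       # address relative to the entry point
    table = sorted(line_tables.get(name, []))
    starts = [addr for addr, _ in table]
    j = bisect_right(starts, rel) - 1      # last table row at or below rel
    line = table[j][1] if j >= 0 else None
    return name, rel, line

# Hypothetical entry points; real values would come from nm or a linker map.
symbols = [(0x400A00, "main"), (0x400B50, "wrifes_")]

# Line table from the wrifes.f example above.
line_tables = {
    "wrifes_": [(0x0, 1), (0x1C, 9), (0x26, 10), (0x84, 11),
                (0xA0, 13), (0xAA, 14), (0x11B, 15)],
}

result = resolve(0x400BE0, symbols, line_tables)
print(result)  # ('wrifes_', 144, 11)
```

Each row of the line table marks where the code for a source line starts, so the lookup takes the last row whose starting address does not exceed the relative address. For frames other than the innermost one, the PC is a return address, so the line found may be the one just after the call.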

Beginner

@mecej4

Thanks for suggesting this method.

 
