I am running a large Fortran code compiled with Intel Fortran 184.108.40.206 and linked to Intel MPI 2019 update 7. I include the -traceback option in the compilation, along with -static-intel -m64 -O2 -ipo -mt_mpi. The job I run uses only one OpenMP thread and 12 MPI processes. After several days of running, the execution freezes because one MPI process hits a seg fault and the other processes hang. I am trying to figure out where the seg fault is occurring. The traceback is:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line     Source
fds_impi_intel_li  00000000085B11CA  Unknown            Unknown  Unknown
libpthread-2.17.s  00002BA8B8773630  Unknown            Unknown  Unknown
fds_impi_intel_li  00000000089D4647  Unknown            Unknown  Unknown
fds_impi_intel_li  00000000089C70DE  Unknown            Unknown  Unknown
fds_impi_intel_li  0000000008689E88  Unknown            Unknown  Unknown
fds_impi_intel_li  000000000040A122  Unknown            Unknown  Unknown
libc-2.17.so       00002BA8BA5F2555  __libc_start_main  Unknown  Unknown
fds_impi_intel_li  000000000040A029  Unknown            Unknown  Unknown
srun: error: burn001: task 4: Exited with exit code 174
What do I need to do to get a line number?
For any Fortran traceback you need -g along with -traceback.
And yes, you can use -g -O2 together with -traceback.
You also need to link with ifort or mpiifort, again passing -g -traceback.
That should usually get you the Fortran portion of the stack. There are cases where the stack gets corrupted, and in those cases you won't get line numbers in the traceback.
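As a rough sketch of the advice above, the build could look something like this. The file names demo.f90, demo.o and demo_tb are hypothetical placeholders, not names from the original build. The flags other than -g are the ones quoted in the question. The script only prints the commands unless mpiifort and the source file are actually present.

```shell
#!/bin/sh
# Hypothetical file names; substitute the real sources and executable.
SRC=demo.f90
OBJ=demo.o
EXE=demo_tb

# -g adds symbols and line numbers for -traceback to report.
# -O2 is stated explicitly because -g on its own turns optimization off.
FFLAGS="-g -traceback -m64 -O2 -ipo -mt_mpi"

# Link through the compiler driver (mpiifort) with the same debug flags.
LDFLAGS="-g -traceback -static-intel -m64 -O2 -ipo -mt_mpi"

echo "compile: mpiifort $FFLAGS -c $SRC -o $OBJ"
echo "link:    mpiifort $LDFLAGS $OBJ -o $EXE"

# Run the commands only when the Intel MPI wrapper and the source exist.
if command -v mpiifort >/dev/null 2>&1 && [ -f "$SRC" ]; then
    mpiifort $FFLAGS -c "$SRC" -o "$OBJ" && mpiifort $LDFLAGS "$OBJ" -o "$EXE"
fi
```

If the stack is intact when the fault occurs, the Routine, Line and Source columns of the traceback should then be filled in for frames that come from the Fortran sources.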
Thanks for the info. But why isn't the -g option mentioned in the description of -traceback in the User's Guide?
I'll try it on my case in any event.
Well, the -traceback option exists to print traceback information when the program crashes, and in your case it did print it.
The additional debug information (symbols and line numbers) is generated by the compiler with -g.
Note that you need to set the optimization level explicitly when using -g (for example, -g -O2); otherwise -g disables optimization.