Intel® Fortran Compiler

"segmentation fault" issue resolved by adding print statements

psing51
New Contributor I

Hi,
I was trying to debug a Fortran code. The executable was compiled with the Intel compiler v17 on openSUSE on an Intel Xeon E5-2690. The code segfaults inside a function (getgb2). Although there are 11 .f90 files, I am sharing only the compilation flags for mod_grib2io.f90.

...

ifort -free -O3 -msse2 -convert big_endian -DLINUX -fp-model precise -assume byterecl  -I../../libs/src/ofs_mods -I../../../hwrf-utilities/libs/mods/g2 -c mod_grib2io.f90

...

ftn -Wl,-noinhibit-exec -o ../../exec/hwrf_gfs2ofs2 flush.o constants.o horiz_interp.o mod_hytime.o mod_flags.o mod_hycomio1.o mod_dump.o mod_grib2io.o mod_geom.o intp.o cd.o -L../../libs -lofs_mods -L../../../hwrf-utilities/libs/ -lg2 -lw3nco_i4r4 -lw3_i4r4 -lbacio -L/usr/lib64 -ljasper -lpng -lz

On running hwrf_gfs2ofs2, I got:

--- Changing MRF mask
           2           2  exhycom2d size
           2           2  eyhycom2d size
 ismus: mask for the ismus correction is ismus_msk1440x720.dat
 ismus:  MRF mask is corrected for i,j=           2           1

 ---------- output from horiz_interp ----------
  input:  min=     0.000000000  max=     1.000000000  avg=     0.287844061851501
          number of missing points =      0
 output:  min=     0.000000000  max=     1.000000000  avg=     0.477386802434921
          number of missing points =      0
 +++++ # of interations for land/sea mask extrapolation is nextrap=           2
 MRF fluxes: i,min,max=           6  -26.95802       24.18198
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
hwrf_gfs2ofs2      00000000004720C4  Unknown               Unknown  Unknown
libpthread-2.22.s  00002B99B1F77B10  Unknown               Unknown  Unknown
libc-2.22.so       00002B99B21FF9B4  cfree                 Unknown  Unknown
hwrf_gfs2ofs2      00000000004B38A8  Unknown               Unknown  Unknown
hwrf_gfs2ofs2      000000000045732A  Unknown               Unknown  Unknown
hwrf_gfs2ofs2      0000000000456DB7  Unknown               Unknown  Unknown
hwrf_gfs2ofs2      0000000000414755  mod_grib2io_mp_rd         102  mod_grib2io.f90
hwrf_gfs2ofs2      00000000004337A2  MAIN__                    464  intp.f90
hwrf_gfs2ofs2      000000000040391E  Unknown               Unknown  Unknown
libc-2.22.so       00002B99B21A46E5  __libc_start_main     Unknown  Unknown
hwrf_gfs2ofs2      0000000000403829  Unknown               Unknown  Unknown

 

I tried gdb, but I felt I needed to dig into the getgb2 function (available in libg2.a, which was compiled without the -g flag). Before digging in, I tried checking the values of some of the variables being passed to getgb2. To my surprise, the print statements fixed the issue I was getting. This solution seems very patchy. Here is the "new" stdout (from the point where the original code was crashing):

.........
 331    241    240   0.16934   0.10809
 --- Changing MRF mask
           2           2  exhycom2d size
           2           2  eyhycom2d size
 COMMENT|mod_grib2io.f90:before getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:           0 ,jdisc:
           2
 COMMENT|mod_grib2io.f90:after getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:          21 ,jdisc:
           2
 ismus: mask for the ismus correction is ismus_msk1440x720.dat
 ismus:  MRF mask is corrected for i,j=           2           1

 ---------- output from horiz_interp ----------
  input:  min=     0.000000000  max=     1.000000000  avg=     0.287844061851501
          number of missing points =      0
 output:  min=     0.000000000  max=     1.000000000  avg=     0.477386802434921
          number of missing points =      0
 +++++ # of interations for land/sea mask extrapolation is nextrap=           2
 COMMENT|mod_grib2io.f90:before getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:           0 ,jdisc:
           0
 COMMENT|mod_grib2io.f90:after getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:           7 ,jdisc:
           0
 MRF fluxes: i,min,max=           6  -26.95802       24.18198
 COMMENT|mod_grib2io.f90:before getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:           0 ,jdisc:
           0
 COMMENT|mod_grib2io.f90:after getgb2###############
 COMMENT|lugb:          82 ,lugi:          83 ,jskp:           8 ,jdisc:
...

 

Here is the code section with modifications:

      print *,'COMMENT|mod_grib2io.f90:before getgb2###############'
      print *,'COMMENT|lugb:',lugb,',lugi:',lugi,',jskp:',jskp,',jdisc:',jdisc
!      print *,',jids:',jids,',jpdtn:',jpdtn,',jpdt:',jpdt,',jgdtn:',jgdtn
!      print *,',jgdt:',jgdt,',jskp:',jskp
!,',gfld:',gfld
!      print *,',iret:',iret
!      print *,'mod_grib2io.f90:############################'

      call getgb2(lugb,lugi,jskp,jdisc,jids,jpdtn,jpdt,jgdtn,jgdt, &
                unpack,jskp,gfld,iret)
      print *,'COMMENT|mod_grib2io.f90:after getgb2###############'
      print *,'COMMENT|lugb:',lugb,',lugi:',lugi,',jskp:',jskp,',jdisc:',jdisc
!      print *,',jids:',jids,',jpdtn:',jpdtn,',jpdt:',jpdt,',jgdtn:',jgdtn
!      print *,',jgdt:',jgdt,',jskp',jskp
!,',gfld',gfld
!      print *,',iret:',iret
!      print *,'mod_grib2io.f90:############################'
 
Also, if I enable the other commented-out print statements (e.g. print *,',jids:',jids), the segfault shows up again. I am relatively new to Fortran; could you please advise on possible causes for this behavior? (Meanwhile, I am trying to dig into the getgb2 function.) I am attaching the source code in its presently working form.
 
Eagerly awaiting your replies/suggestions.
2 Replies
Juergen_R_R
Valued Contributor I

There is definitely not enough information to say anything definite about the problem. This looks like quite old code that has been partially adapted to the Fortran 90/95 standard, as it is a minimal module wrapper around something that looks very much like fixed-form F77 code. With one exception, there are no intents on the dummy arguments of the subroutines. Reasonable advice would be to switch on all bounds checking, checking for uninitialised variables, etc. Also, maybe switch off the optimisation flags such as -O3 and -msse2 and see if the problem goes away: compile your program with debug flags (-g -O0). What I can see is that you are opening files, inquiring on units, etc. So the segmentation fault could be related to an illegal operation on one of those units, particularly given that you are apparently tracking the "open" status with a global logical array.
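As a rough sketch of the debug build suggested above, the compile line for mod_grib2io.f90 from the original post could be changed along these lines. The optimisation flags (-O3 -msse2) are replaced by -g -O0, and -traceback and -check bounds,uninit are added. The particular set of runtime checks is an assumption for illustration, not something specified in the thread; the remaining flags and include paths are copied unchanged from the original command.

```shell
# Debug build of mod_grib2io.f90: same source form, conversion options and
# include paths as the original command, but with optimisation switched off
# and runtime checks enabled (the choice of checks is illustrative).
ifort -free -g -O0 -traceback -check bounds,uninit \
      -convert big_endian -DLINUX -fp-model precise -assume byterecl \
      -I../../libs/src/ofs_mods -I../../../hwrf-utilities/libs/mods/g2 \
      -c mod_grib2io.f90
```

Note that the runtime checks only cover code compiled with them. The frames above mod_grib2io_mp_rd in the traceback show "Unknown" source, and libg2.a was built without -g, so the other .f90 files and probably the library itself would need to be rebuilt the same way before the checks can report anything inside getgb2.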

 

mecej4
Honored Contributor III

Sorry, the addition of a PRINT statement is as effective a remedy for segfaults as a placebo is a remedy for headaches.

That the addition of a benign PRINT statement seemed to fix the segfault is an indication that the code has a bug that is likely to hide and could be quite hard to locate and fix. It may happen that the bug will emerge from hiding and give you a bad bite when you simply rerun the program with slightly different data or on a different PC.
