Hi,
I'm trying to debug a segmentation fault using advice from this forum.
When I compile my ocean model with ifort (IFORT) 16.0.1 20151021 and the following options:
-u -O2 -fltconsistency -shared-intel -mcmodel=medium -heap-arrays
then, even with ulimit -s unlimited set, I get the following error at run time:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libintlc.so.5 00002B508BED39B5 Unknown Unknown Unknown
libintlc.so.5 00002B508BED1777 Unknown Unknown Unknown
libifcore.so.5 00002B508A873872 Unknown Unknown Unknown
libifcore.so.5 00002B508A8736C6 Unknown Unknown Unknown
libifcore.so.5 00002B508A7CC795 Unknown Unknown Unknown
libifcore.so.5 00002B508A7DE5DD Unknown Unknown Unknown
libpthread.so.0 0000003A3220F4A0 Unknown Unknown Unknown
pe_PB_sar25in_tid 000000000043EED8 Unknown Unknown Unknown
pe_PB_sar25in_tid 00000000004047D1 Unknown Unknown Unknown
pe_PB_sar25in_tid 0000000000403070 Unknown Unknown Unknown
pe_PB_sar25in_tid 000000000040212E Unknown Unknown Unknown
libc.so.6 0000003A31A1ECDD Unknown Unknown Unknown
pe_PB_sar25in_tid 0000000000402039 Unknown Unknown Unknown
However, if I add debugging and traceback options to isolate the error:
-u -O2 -fltconsistency -shared-intel -mcmodel=medium -heap-arrays -g -traceback -check all -fp-stack-check
then the code runs without a segmentation fault.
What else can I try to isolate this segmentation fault?
Thanks
Try adding just -traceback.
Hi Steve,
That helped; the traceback now gives me a line number:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libintlc.so.5 00002AD0140BE9B5 Unknown Unknown Unknown
libintlc.so.5 00002AD0140BC777 Unknown Unknown Unknown
libifcore.so.5 00002AD012A5E872 Unknown Unknown Unknown
libifcore.so.5 00002AD012A5E6C6 Unknown Unknown Unknown
libifcore.so.5 00002AD0129B7795 Unknown Unknown Unknown
libifcore.so.5 00002AD0129C95DD Unknown Unknown Unknown
libpthread.so.0 000000326580F4A0 Unknown Unknown Unknown
pe_PB_sar25in_tid 000000000043F158 diag_ 1921 diag.f
pe_PB_sar25in_tid 00000000004047DF step_ 596 step.f
pe_PB_sar25in_tid 0000000000403070 MAIN__ 837 ocean.f
pe_PB_sar25in_tid 000000000040212E Unknown Unknown Unknown
libc.so.6 000000326501ECDD Unknown Unknown Unknown
pe_PB_sar25in_tid 0000000000402039 Unknown Unknown Unknown
Two strange things remain. First, the reported line number is the top line of a do loop:
do 110 k=0,km
which is not where I would normally expect a segmentation fault. All of these variables exist and are declared. I then added some write statements to check that the values made sense:
      do 120 ll=1,mterms
        engext(ll)=c0
        do 100 i=1,imt
          zuseng(i,ll)=c0
          zvseng(i,ll)=c0
 100    continue
        write (6,*) 'll=',ll,' km=',km
        call flush (6)
        do 110 k=0,km
          write (6,*) 'k=',k
          call flush (6)
          engint(k,ll)=c0
          termbm(k,ll,1)=c0
          termbm(k,ll,2)=c0
 110    continue
 120  continue
Second, adding these write statements also makes the segmentation fault disappear. I'm not sure what to do next. Any thoughts?
With symptoms like this I often find the cause is memory corruption - writing outside the declared space for a variable. Unfortunately, the bad write might have occurred much earlier in the program. Anything that disturbs the compiler's choice of memory layout can make such errors appear or disappear. Adding write statements can also disable some optimizations, which in turn can change the behavior.
Probably the first thing I would do is build with "-warn interfaces" to see if you have any errors in routine calls. Then see if removing options such as -fltconsistency (that's a very old option, superseded by -fp-model) or -heap-arrays changes the behavior. Also try dropping the optimization level to -O1 or -O0 and see what happens.
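The build experiments above could be laid out roughly as follows. This is only a dry-run sketch: it prints candidate ifort command lines rather than running them. The source list (taken from the file names in the traceback), the `variant` helper, and the `model_*` output names are assumptions for illustration; the real model almost certainly has more source files and its own build system.

```shell
#!/bin/sh
# Dry-run sketch: print one ifort command line per experiment.
# SRCS is a stand-in (names from the traceback); substitute the real list.
SRCS="ocean.f step.f diag.f"
# Options kept the same in every variant; -traceback gives routine/line info.
BASE="-u -shared-intel -mcmodel=medium -traceback"

variant() {
    # $1 = label for the executable, remaining args = options under test
    label=$1; shift
    echo "ifort $BASE $* -o model_$label $SRCS"
}

variant warn   -O2 -fltconsistency -heap-arrays -warn interfaces  # check call interfaces
variant nofltc -O2 -heap-arrays                                   # drop -fltconsistency
variant noheap -O2 -fltconsistency                                # drop -heap-arrays
variant O1     -O1 -fltconsistency -heap-arrays                   # lower optimization
variant O0     -O0 -fltconsistency -heap-arrays                   # no optimization
```

Changing one thing per build makes it easier to tell which option the fault depends on; if the fault moves or vanishes with an unrelated option, that is consistent with the memory-corruption explanation rather than a problem in the loop itself.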
Next I would run the program under gdb and determine which instruction was getting the segfault. Then it's a bit of a slog to figure out where its input addresses came from - being able to read assembly code helps.
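A minimal sketch of that gdb step, assuming the executable is the `pe_PB_sar25in_tid` image named in the traceback and that it needs no command-line arguments (add them to the `run` line if it does). The command file name `segv.gdb` is made up for this example.

```shell
#!/bin/sh
# Write a small gdb command file: run to the fault, then inspect it.
cat > segv.gdb <<'EOF'
run
bt
x/8i $pc
info registers
EOF

# Executable name assumed from the traceback; adjust the path as needed.
EXE=./pe_PB_sar25in_tid
if command -v gdb >/dev/null 2>&1 && [ -x "$EXE" ]; then
    # -batch exits when the commands finish; -x reads commands from a file.
    gdb -batch -x segv.gdb "$EXE"
else
    echo "would run: gdb -batch -x segv.gdb $EXE"
fi
```

gdb stops at the SIGSEGV before the Fortran runtime's handler prints its own traceback, so `x/8i $pc` lists the faulting instruction first, and `info registers` shows the values it used to form the address.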
You could also check whether compiling all the other sources with -O0 and just this one with -O2 preserves the error. Then look for the (hopefully small) combination of sources that needs -O2 for the error to still show up. Sometimes this approach can help you identify the real culprit in another source file.
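One way to organize that search is sketched below. Again it is a dry run that only prints the commands. The `plan` helper, the three-file source list, and the `model_mixed` name are hypothetical; the idea is simply that the files named in the argument keep -O2 and everything else drops to -O0.

```shell
#!/bin/sh
# Dry-run sketch: print per-file compile lines with mixed optimization levels.
SRCS="ocean.f step.f diag.f"   # stand-in list; use the model's real sources
COMMON="-u -fltconsistency -shared-intel -mcmodel=medium -heap-arrays -traceback"

plan() {
    # $1 = space-separated files to keep at -O2; all others are built at -O0
    for f in $SRCS; do
        case " $1 " in
            *" $f "*) opt=-O2 ;;
            *)        opt=-O0 ;;
        esac
        echo "ifort $COMMON $opt -c $f"
    done
    # Link step: object files are combined into one test executable.
    echo "ifort -shared-intel -mcmodel=medium -traceback *.o -o model_mixed"
}

# First experiment: only diag.f (where the fault is reported) stays at -O2.
plan "diag.f"
```

If the fault survives with only `diag.f` at -O2, the list is already minimal; if not, add files back to the -O2 set (for example `plan "diag.f step.f"`) until it reappears.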