Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Segmentation fault without traceback

Sourish_B_
Beginner
875 Views

Hello,

I am trying to diagnose a particularly obnoxious segmentation fault in some code compiled with ifort 15.0.0. When I run the code, it produces a "Segmentation fault (core dumped)" message. In the past, I've been able to diagnose those by adding a "-g -traceback" switch to the compiler. However, this time, despite the rather extensive compiler switch list

-O2 -g -traceback -check bounds -check format -check pointers -check stack -check uninit -fpe0 -ftrapuv -fp-stack-check -gen-interfaces -warn interfaces

the segmentation fault did not produce any traceback. At all. There was just this message, "Segmentation fault (core dumped)". How do I diagnose where this is coming from?

Thanks,

Sourish

6 Replies
Sourish_B_
Beginner

Update: I found the statement that was causing the segfault by using gdb to create a traceback from the core file. I'm still surprised that -traceback didn't produce a traceback, though.
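
For readers hitting the same problem, here is a minimal sketch of the gdb approach described in this update. The executable name (model.x) and the core file name (core) are hypothetical; core files are often named core or core.<pid> depending on system configuration. It assumes the program was built with -g, as in the original post, so that gdb can show source lines.

```shell
# Hypothetical names: substitute your own executable and core file.
exe="./model.x"
core="core"

# Allow core files to be written by programs started from this shell.
# This only affects later runs and may be refused by a lower hard limit.
ulimit -c unlimited 2>/dev/null || echo "could not raise core file size limit"

if command -v gdb >/dev/null 2>&1 && [ -f "$exe" ] && [ -f "$core" ]; then
    # Batch mode: load the core file, print a backtrace of the
    # current thread, then exit without an interactive prompt.
    gdb --batch -ex "bt" "$exe" "$core"
    gdb_status="ran"
else
    echo "gdb, $exe, or $core not found; skipping backtrace"
    gdb_status="skipped"
fi
```

In a multithreaded run it may also be worth replacing "bt" with "thread apply all bt" to see every thread's stack.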

Kevin_D_Intel
Employee

Any possibility we might be able to get our hands on a reproducer? Also, I note your version is 15.0.0; could you check whether a newer 15.0 Update or our latest 16.0 produces a traceback?

Sourish_B_
Beginner

Hi Kevin,

I did try 15.0.3, but that had the same problem. Our cluster doesn't have version 16 of the compilers, so I can't tell you if 16 gives a valid traceback. The statement which was generating the segfault was an incorrect attempt at parallelizing an array operation by another coder:

!$OMP PARALLEL 
!$OMP  default (none) &
!$OMP  shared  (alfa1, alfa2, md)
md%data = alfa1 * md%data1 + alfa2 * md%data2
!$OMP END PARALLEL

where md was an instance of a derived type; data, data1 and data2 were all double precision arrays of rank 3, and alfa1 and alfa2 were double precision scalars. The segfault only occurred if the rank-3 arrays were large; e.g., 120x90x25 did not produce a segfault, but 360x180x60 did. I tried producing a minimum working example by compiling just this code segment in a separate file, but that did not segfault. I can certainly point you to the actual code, but it's in one of 168 files which comprise an atmospheric transport model, so I'm not sure if it's worth your time to try and compile that codebase for this one error.

I solved it by simply removing the openmp directives.

Cheers,

Sourish

jimdempseyatthecove
Honored Contributor III

Sourish,

In the code snippet of #4, the parallel region as written has all threads performing the same calculation. The difference in behavior (segfault vs. no segfault) may be due to the compiler's optimization rules differing between inside and outside a parallel region.

The reason I bring this up is that if your code in general follows the pattern of the snippet above, then you have parallelization errors in your code. IOW, parallel regions that redundantly perform the same operations. This should not cause a crash, but it would be inefficient and would not do what is intended.

Jim Dempsey

Sourish_B_
Beginner

Hi Jim,

Yes, I realize that that code snippet is subject to parallelization errors, at best causing redundant ops, and at worst slowing things down due to thread locks. As I said, this was a snippet from another coder who I guess had the wrong idea as to how to parallelize large-array operations. I've corrected this snippet as well as some other places where there was inefficient parallelization.

Having said that, I tried parallelizing the above code snippet as follows:

!$OMP PARALLEL 
!$OMP  WORKSHARE
md%data = alfa1 * md%data1 + alfa2 * md%data2
!$OMP END WORKSHARE
!$OMP END PARALLEL

According to https://software.intel.com/en-us/articles/openmp-workshare-constructs-now-parallelize-with-intel-fortran-compiler-150 this sort of whole-array operation should parallelize with ifort 15.0+ (my compiler is 15.0.3). However, even this 'workshare' construct produced a segfault at this point. Is that expected?

-Sourish

Kevin_D_Intel
Employee

Your mention of how increasing the array dimensions in post #4 led to a seg-fault had me wondering about the possible cause being exhausting stack space. I don’t know whether it might be shell stack or stack space allocated for each thread or neither.
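
A rough size check fits the stack hypothesis. The sketch below assumes the compiler creates a temporary the size of one full rank-3 array on a thread's stack; the thread does not confirm that, so treat it as an assumption. It compares that size with the 4 MB per-thread default quoted in this reply, using the array dimensions reported in post #4 and 8 bytes per double precision element.

```shell
# Size in bytes of one double precision rank-3 array (8 bytes per element).
bytes_per_real=8
small=$((120 * 90 * 25 * bytes_per_real))    # case that did not segfault
large=$((360 * 180 * 60 * bytes_per_real))   # case that did segfault

# 4 MB default thread stack quoted in this thread, taken here as 4 * 1024 * 1024 bytes.
default_stack=$((4 * 1024 * 1024))

echo "small array:   $small bytes"           # 2160000
echo "large array:   $large bytes"           # 31104000
echo "default stack: $default_stack bytes"   # 4194304

if [ "$small" -le "$default_stack" ] && [ "$large" -gt "$default_stack" ]; then
    echo "only the large array exceeds the default thread stack"
fi
```

Under that assumption, the small case (about 2 MB) fits in a 4 MB stack while the large case (about 30 MB) does not, which matches the reported behavior.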

Had you considered increasing the shell stack limit? (Many bump it with: ulimit -s unlimited)
Or maybe increasing KMP_STACKSIZE (or OMP_STACKSIZE)? (From our Fortran User Guide, the default is 4 MB on Intel 64.)
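
The two suggestions above could be tried in a shell session along these lines. This is a sketch only: the 64M value is an illustrative guess chosen to exceed the size of one 360x180x60 double precision array, not a recommendation from the thread, and model.x is a hypothetical executable name.

```shell
# Raise the stack limit for this shell and its children. The main
# (initial) thread uses this stack. The call can fail if the hard
# limit is lower, so report that instead of aborting.
ulimit -s unlimited 2>/dev/null || echo "could not raise shell stack limit"

# Stack size for OpenMP worker threads. OMP_STACKSIZE is the standard
# OpenMP variable; KMP_STACKSIZE is the Intel-specific one named in
# the thread. 64M is an illustrative value only.
export OMP_STACKSIZE=64M

echo "shell stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE:     $OMP_STACKSIZE"

# Then rerun the program from this same shell, for example:
# ./model.x    # hypothetical executable name
```

If the segfault disappears with a larger stack, that points to stack exhaustion rather than a logic error in the array expression.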
