Hello,
I am trying to diagnose a particularly obnoxious segmentation fault in some code compiled with ifort 15.0.0. When I run the code, it produces a "Segmentation fault (core dumped)" message. In the past, I've been able to diagnose those by adding a "-g -traceback" switch to the compiler. However, this time, despite the rather extensive compiler switch list
-O2 -g -traceback -check bounds -check format -check pointers -check stack -check uninit -fpe0 -ftrapuv -fp-stack-check -gen-interfaces -warn interfaces
the segmentation fault did not produce any traceback. At all. There was just this message, "Segmentation fault (core dumped)". How do I diagnose where this is coming from?
Thanks,
Sourish
Update: I found the statement that was causing the segfault by using gdb to create a traceback from the core file. I'm still surprised that -traceback didn't produce a traceback, though.
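For readers who land here with the same symptom, the core-file approach described above might look roughly like the following. This is a minimal sketch, not the exact commands used in this thread: `./model` and `core` are hypothetical names for the executable and the core file, `core_backtrace` is a helper invented for illustration, and the name and location of the core file depend on how the system is configured.

```shell
# Sketch only: ./model and core are hypothetical names.
# Allow core files to be written from this shell (a hard limit may refuse this).
ulimit -c unlimited 2>/dev/null || echo "could not raise the core file size limit" >&2

# Print a backtrace from a core file without entering interactive gdb.
# Usage: core_backtrace <executable> <core-file>
core_backtrace() {
    gdb -batch -ex "bt" "$1" "$2"
}

# Example, after the program has crashed and left a core file:
# core_backtrace ./model core
```

Building with -g, as in the switch list above, is what lets gdb map stack frames back to source lines.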
Is there any possibility we could get our hands on a reproducer? Also, I note your version is 15.0.0; could you check whether a newer 15.0 update or our latest 16.0 produces a traceback?
Hi Kevin,
I did try 15.0.3, but that had the same problem. Our cluster doesn't have version 16 of the compilers, so I can't tell you if 16 gives a valid traceback. The statement which was generating the segfault was an incorrect attempt at parallelizing an array operation by another coder:
!$OMP PARALLEL
!$OMP default (none) &
!$OMP shared (alfa1, alfa2, md)
md%data = alfa1 * md%data1 + alfa2 * md%data2
!$OMP END PARALLEL
where md was an instance of a derived type; data, data1 and data2 were all double precision arrays of rank 3, and alfa1 and alfa2 were double precision scalars. The segfault only occurred if the rank-3 arrays were large; e.g., 120x90x25 did not produce a segfault, but 360x180x60 did. I tried producing a minimum working example by compiling just this code segment in a separate file, but that did not segfault. I can certainly point you to the actual code, but it's in one of 168 files which comprise an atmospheric transport model, so I'm not sure if it's worth your time to try and compile that codebase for this one error.
I solved it by simply removing the openmp directives.
Cheers,
Sourish
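To put the two array shapes mentioned above on a common scale, the storage for a single rank-3 array can be worked out directly. This is a small sketch that assumes 8 bytes per double precision element; `array_bytes` is a helper name invented for illustration.

```shell
# Assumption: each double precision element occupies 8 bytes.
bytes_per_element=8

# Size in bytes of one rank-3 array with the given extents.
array_bytes() {
    echo $(( $1 * $2 * $3 * bytes_per_element ))
}

small=$(array_bytes 120 90 25)    # shape that ran without a segfault
large=$(array_bytes 360 180 60)   # shape that segfaulted

echo "120x90x25:  $small bytes"   # prints 2160000 (about 2.2 MB)
echo "360x180x60: $large bytes"   # prints 31104000 (about 31 MB)
```

The larger shape needs roughly 14 times as much storage per array, which may be worth comparing against whatever memory limits apply to the run.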
Sourish,
In the code snippet of #4, the parallel region, as compiled, has all threads performing the same calculations. The difference in behavior (segfault vs. no segfault) may be due to the compiler's optimization rules differing inside and outside a parallel region.
The reason I bring this up is that, if your code in general follows the pattern of the above snippet, then you have parallelization errors in your code; in other words, parallel regions that redundantly perform the same operations. This should not cause a crash, but it would be inefficient and would not do what is intended.
Jim Dempsey
Hi Jim,
Yes, I realize that that code snippet is subject to parallelization errors, at best causing redundant ops, and at worst slowing things down due to thread locks. As I said, this was a snippet from another coder who I guess had the wrong idea as to how to parallelize large-array operations. I've corrected this snippet as well as some other places where there was inefficient parallelization.
Having said that, I tried parallelizing the above code snippet as follows:
!$OMP PARALLEL
!$OMP WORKSHARE
md%data = alfa1 * md%data1 + alfa2 * md%data2
!$OMP END WORKSHARE
!$OMP END PARALLEL
According to https://software.intel.com/en-us/articles/openmp-workshare-constructs-now-parallelize-with-intel-fortran-compiler-150 this sort of whole-array operation should parallelize with ifort 15.0+ (my compiler is 15.0.3). However, even this 'workshare' construct produced a segfault at this point. Is that expected?
-Sourish
Your mention of how increasing the array dimensions in post #4 led to a seg-fault had me wondering about the possible cause being exhausting stack space. I don’t know whether it might be shell stack or stack space allocated for each thread or neither.
Had you considered increasing the shell stack limit? (Many bump it with: ulimit -s unlimited)
Or maybe increasing KMP_STACKSIZE (or OMP_STACKSIZE)? (From our Fortran User Guide, the default is 4 MB on Intel 64.)
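The two suggestions above could be tried from the shell that launches the program, along the following lines. This is a sketch under stated assumptions: the value 512M is purely illustrative and not a recommendation from this thread, and whether either setting removes the segfault is untested here.

```shell
# Sketch only: 512M is an illustrative value, not a recommendation.
# Raise the stack limit of the current shell; this fails if the hard limit is lower.
ulimit -s unlimited 2>/dev/null || echo "could not raise the shell stack limit" >&2

# Stack size for OpenMP worker threads. OMP_STACKSIZE is the standard OpenMP
# environment variable; KMP_STACKSIZE is the Intel-specific counterpart.
export OMP_STACKSIZE=512M

# Show the settings that programs launched from this shell will inherit.
echo "shell stack limit: $(ulimit -s)"
echo "OMP_STACKSIZE: $OMP_STACKSIZE"
```

The shell limit governs the initial thread's stack, while OMP_STACKSIZE applies to the threads the OpenMP runtime creates, so it can be informative to change them one at a time.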
