Intel® Fortran Compiler

SIGSEGV error when running a program: how to generate a traceback report

Tetsuro_Kikuchi
Beginner
2,347 Views
Hello.

I intend to run an atmospheric modeling program on my Linux system. The configuration of the system is as follows:

Hardware: HP Z800 Workstation (dual-core)
Linux OS: Ubuntu 11.04
Compiler: Intel Fortran Composer XE 2011 for Linux (Update 4)
The netCDF and Input/Output Application Programming Interface (I/O API) are used for controlling file formats and controlling internal and external communications, respectively.

When I ran the core program of the model, a SIGSEGV error occurred and the program stopped partway through execution. I am now trying to solve this problem by following the instructions in the article "Determining Root Cause of SIGSEGV or SIGBUS errors" on the Intel Software Network website.

First, I reran the program after removing the stack size limit for OpenMP (Cause #2). However, it failed again with a similar SIGSEGV error.

As the next step, I would like to isolate where in the code the fault occurred by generating an execution 'traceback', but I do not know how to do this. (I am new to computer programming.) Could anyone explain how to generate a traceback report?

I have also attached the error log for the run made after removing the stack size limit. It would be very helpful if anyone could identify the cause of the SIGSEGV error, and a solution for it, from the error log.
8 Replies
mecej4
Honored Contributor III
Your sources need to be compiled and linked with the -traceback option. If, in addition, you want to see source line numbers, use the -g option also. Please look up these options in the Fortran User Guide.
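As a minimal sketch of what this looks like on the command line, the following builds the compile and link commands with both options applied at each step. The file names model.f90, model.o and model.exe are hypothetical placeholders, not files from this thread; the commands are only executed if ifort and the source file are actually present.

```shell
# Hypothetical file names (model.f90, model.exe); substitute your own sources.
FC=ifort
DEBUG_FLAGS="-g -traceback"   # -traceback: report call stack on a fatal error; -g: add line numbers

# Apply the same flags at the compile step and at the link step.
COMPILE_CMD="$FC $DEBUG_FLAGS -c model.f90"
LINK_CMD="$FC $DEBUG_FLAGS -o model.exe model.o"

echo "$COMPILE_CMD"
echo "$LINK_CMD"

# Run the commands only when the compiler and the source file exist.
if command -v "$FC" >/dev/null 2>&1 && [ -f model.f90 ]; then
    $COMPILE_CMD && $LINK_CMD
fi
```

In a build script, the same idea usually means adding the options to both the compiler-flag and the link-flag variables, then rebuilding from scratch.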
jimdempseyatthecove
Honored Contributor III
In your file open subroutine, after you report "Maximum current record number", insert a diagnostic print statement that prints the LOC of a scalar variable in the subroutine. The purpose is to see whether your code has a recursion problem. If your file opens are multi-threaded, then also print the OpenMP thread number (or the thread ID if nested parallelism is enabled).

If nothing unusual shows up, then you may need to rework your READ statement(s) for the input data so that large temporaries are not used. Note that the -heap-arrays option should reduce stack pressure; use it wherever appropriate.

Jim Dempsey
Tetsuro_Kikuchi
Beginner
Thank you, all.

I recompiled the executable of the program (called "CCTM") with the -g and -traceback options. I added -g to both the compiler and link flags in the build script, while -traceback was added only to the compiler flags. Please see the attached build script ("bldit.cctm") ("User Input Section" - "#> Intel Fortran 10.1 Compiler Flags"). After that, I reran the program, but it failed again with the same SIGSEGV error, and no traceback report was created.

For reference, I attached the run script ("run.cctm").
Tim_Gallagher
New Contributor II
Generally if you get a segfault and no traceback is generated, you need to increase your stack size and it should go away. See:

http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/

Check out Cause #2.

To find out if this is the issue, type:

ulimit -s unlimited

in your terminal before you run your code.
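One detail worth checking is that the limit applies only to the shell in which it is set and to the programs started from that shell. The sketch below (variable names are illustrative) prints the limit, raises it inside a subshell, and shows that the outer shell is unaffected, which is why the command has to be issued in the same terminal session, or inside the run script, that launches the program.

```shell
# Current soft stack limit of this shell (in kB, or "unlimited").
stack_before=$(ulimit -s)
echo "stack limit before:          $stack_before"

# Raise the limit inside a subshell only; fall through if the hard limit forbids it.
stack_inside=$(ulimit -s unlimited 2>/dev/null || true; ulimit -s)
echo "stack limit inside subshell: $stack_inside"

# The outer shell keeps its original limit.
stack_after=$(ulimit -s)
echo "stack limit after:           $stack_after"
```

If the run script is written in csh or tcsh rather than sh or bash (an assumption; check the first line of the script), the equivalent command inside it is `limit stacksize unlimited`.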

Tim
Tetsuro_Kikuchi
Beginner
Thank you, Tim.

I reran the program after setting the stack size to unlimited as you suggested. However, it failed again with a similar SIGSEGV error.
Tim_Gallagher
New Contributor II
Have you tried the other solutions in the link posted? Particularly heap-arrays and checking for the large temporary arrays.
mecej4
Honored Contributor III
The attachments show the added compiler flags, but there is no evidence that the program was rebuilt. In particular, there are no "ifort ..." commands to be seen.

You may need to do a "make clean", if there is a provision for a "clean" target in the makefile, or whatever needs to be done to force a rebuild to take place by deleting .o files (possibly some user-built .a and/or .so files) and running Make again.

If my guess is correct, you changed the options in the make/build files but "make" sees no need to rebuild. One way of handling the impasse is to add the makefile itself as a dependency of the main target, but this has some undesirable side effects.
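Where no "clean" target exists, the stale build products can be removed by hand. The following is a rough sketch of that step, run against a temporary stand-in directory with dummy files so that it is safe to try; the directory and file names are hypothetical, and a real build directory may also contain .a or .so files that need the same treatment.

```shell
# Stand-in for the real build directory (hypothetical); populated with dummy files.
BLD=$(mktemp -d)
touch "$BLD/a.o" "$BLD/b.o" "$BLD/m.mod" "$BLD/keep.F"

# Delete object and module files so the next build must recompile everything.
find "$BLD" -maxdepth 1 -type f \( -name '*.o' -o -name '*.mod' \) -delete

# Count what is left; source files such as keep.F are untouched.
remaining=$(find "$BLD" -maxdepth 1 -type f \( -name '*.o' -o -name '*.mod' \) | wc -l | tr -d ' ')
echo "stale build products remaining: $remaining"
```

After a clean like this, the build output should show the "ifort ..." commands again, which confirms that the new flags were actually used.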
Tetsuro_Kikuchi
Beginner
Thank you all for your suggestions.

Later, I found that I had compiled the Input/Output Application Programming Interface (I/O API) library using a different version of the Makeinclude files from the one needed for the target program. After recompiling the I/O API library and then the full set of source code for the program, I was able to complete the run successfully.

Thank you again for your consideration.