Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
29253 Discussions

A SIGSEGV error in a run program - how to generate a trace back report

Tetsuro_Kikuchi
Beginner
2,376 Views
Hello.

I intend to run an atmospheric modeling program on my Linux system. The configuration of the system is as follows:

Hardware: HP Z800 Workstation (dual-core)
Linux OS: Ubuntu 11.04
Compiler: Intel Fortran Composer XE 2011 for Linux (Update 4)
The netCDF library and the Input/Output Application Programming Interface (I/O API) are used to control file formats and internal and external communications, respectively.

When I ran the core program of the model, a SIGSEGV error occurred and the program stopped partway through execution. I am now trying to solve this problem by following the instructions in the article "Determining Root Cause of SIGSEGV or SIGBUS errors" on the Intel Software Network website.

First, I reran the program after unlimiting the stack size for OpenMP (Cause #2). However, it failed again with a similar SIGSEGV error.

As the next step, I would like to isolate where in the code the fault occurred by generating an execution 'traceback'. However, I do not know how to do this. (I am new to computer programming.) Could anyone explain how to generate a traceback report?

I have also attached the error log for the run after unlimiting the stack size. It would be very helpful if anyone could identify the cause of the SIGSEGV error, and a solution for it, from the error log.
8 Replies
mecej4
Honored Contributor III
Your sources need to be compiled and linked with the -traceback option. If, in addition, you want to see source line numbers, use the -g option also. Please look up these options in the Fortran User Guide.
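As a minimal sketch of what that looks like on the command line: the file names example.f90 and example are hypothetical, and a real build script will have its own variables for compile and link flags. The point is that both the compile step and the link step get the same options. The script only prints the commands if ifort or the source file is not present.

```shell
#!/bin/sh
# Sketch: compile AND link with -g -traceback (the options named above).
# "example.f90", "example.o" and "example" are hypothetical names.
FFLAGS="-g -traceback"
SRC="example.f90"
EXE="example"

COMPILE_CMD="ifort $FFLAGS -c $SRC -o example.o"
LINK_CMD="ifort $FFLAGS example.o -o $EXE"

if command -v ifort >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Run the compile step, then the link step, with identical flags.
    $COMPILE_CMD && $LINK_CMD
else
    # Dry run: show the commands when the compiler or source is missing.
    echo "$COMPILE_CMD"
    echo "$LINK_CMD"
fi
```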
jimdempseyatthecove
Honored Contributor III
In your file-open subroutine, after you report "Maximum current record number", insert a diagnostic printout that prints the LOC (address) of a scalar variable local to the subroutine. In other words, check whether your code has a recursion problem. If your file opens are multi-threaded, also print the OpenMP thread number (or the thread ID if nested parallelism is enabled).

If nothing unusual shows up, you may need to rework your READ statement(s) for the input data so that large temporaries are not used. Note that the -heap-arrays option should reduce stack pressure; use it where appropriate.

Jim Dempsey
Tetsuro_Kikuchi
Beginner
Thank you, all.

I recompiled the executable of the program (called "CCTM") with the -g and -traceback options. I added -g as both a compiler flag and a link flag in the build script, while -traceback was added only as a compiler flag. Please see the attached build script ("bldit.cctm") ("User Input Section" - "#> Intel Fortran 10.1 Compiler Flags"). After that, I reran the program, but it failed again with the same SIGSEGV error, and no traceback report was created either.

For reference, I attached the run script ("run.cctm").
Tim_Gallagher
New Contributor II
Generally if you get a segfault and no traceback is generated, you need to increase your stack size and it should go away. See:

http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors/

Check out Cause #2.

To find out if this is the issue, type:

ulimit -s unlimited

in your terminal before you run your code.
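A small sketch of how to check that the limit actually changed. It assumes bash and falls back to the hard limit if "unlimited" is not permitted on the system. The limit only applies to programs started from the same shell, so the model has to be launched from that shell (or the ulimit line has to go into the run script).

```shell
#!/bin/bash
# Show the current soft and hard stack limits (in kB, or "unlimited").
SOFT=$(ulimit -S -s)
HARD=$(ulimit -H -s)
echo "soft stack limit: $SOFT"
echo "hard stack limit: $HARD"

# Try to raise the soft limit; this fails if the hard limit is lower.
if ulimit -S -s unlimited 2>/dev/null; then
    echo "stack limit is now: $(ulimit -S -s)"
else
    echo "could not set unlimited; using the hard limit ($HARD) instead"
    ulimit -S -s "$HARD"
fi

# Start the program from this same shell so the new limit applies, e.g.:
# ./your_program    (hypothetical name)
```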

Tim
Tetsuro_Kikuchi
Beginner
Thank you, Tim.

I reran the program after setting the stack size to unlimited as you suggested. However, it failed again with a similar SIGSEGV error.
Tim_Gallagher
New Contributor II
Have you tried the other solutions in the linked article? In particular, try heap-arrays and check for large temporary arrays.
mecej4
Honored Contributor III
The attachments show the added compiler flags, but there is no evidence that the program was rebuilt. In particular, there are no "ifort ..." commands to be seen.

You may need to do a "make clean", if there is a provision for a "clean" target in the makefile, or whatever needs to be done to force a rebuild to take place by deleting .o files (possibly some user-built .a and/or .so files) and running Make again.
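A rough sketch of such a forced clean, assuming a POSIX shell. The function name force_clean is made up for illustration, and it only removes .o files; any user-built .a or .so files would need the same treatment.

```shell
#!/bin/sh
# force_clean DIR: remove object files under DIR so that the next
# build has to recompile everything. (Hypothetical helper.)
force_clean() {
    dir="$1"
    if [ -f "$dir/Makefile" ] && grep -q '^clean:' "$dir/Makefile"; then
        # Prefer the makefile's own "clean" target when it has one.
        (cd "$dir" && make clean)
    else
        # Otherwise list and delete the stale object files by hand.
        find "$dir" -type f -name '*.o' -print -exec rm -f {} +
    fi
}
```

Call it as force_clean /path/to/build/dir, then rerun the build and check that "ifort ..." commands now appear in the build output.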

If my guess is correct, you changed the options in the make/build files but "make" sees no need to rebuild. One way of handling the impasse is to add the makefile itself as a dependency of the main target, but this has some undesirable side effects.
Tetsuro_Kikuchi
Beginner
Thank you all for your suggestions.

Later, I found that I had compiled the Input/Output Application Programming Interface (I/O API) library using a different version of the Makeinclude files from the one needed for the target program. After recompiling the I/O API library and then the full set of source files for the program, I was able to complete the run successfully.

Thank you again for your consideration.