Program received signal SIGSEGV: Segmentation fault - invalid memory reference

negar · ‎03-11-2021

Hello,

I am running a model that I have shared at this link (https://drive.google.com/file/d/1QLXutSW0mCOFMH1hfrhQzpOEgD8vamQy/view). When I run it (./ibis), I get the following error:

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x7fc13c55c32a
#1 0x7fc13c55b503
#2 0x7fc13bbd8fcf
#3 0x56089be188ff
#4 0x56089be2471d
#5 0x56089be25b53
#6 0x7fc13bbbbb96
#7 0x56089bdd3859
#8 0xffffffffffffffff
Segmentation fault (core dumped)

Could anybody help me to fix this problem?

I would appreciate your help.

Arjen_Markus · ‎03-11-2021

That is a very general question you are posting. Invalid memory references can occur for a wide variety of reasons and it is not easy to find the root cause. However, a few things come to mind:

Use all the compiler flags regarding compile-time and run-time checks to build the program, notably -warn all (might be -warn:all), -check all (might be -check:all), -stand, so that the compiler is allowed to insert all manner of checks into the final program. They may slow it down, but a non-working program has the worst possible performance.
Use a different compiler, gfortran and NAG Fortran come to mind, as compilers differ in the way they check the code and build up the program.
Use valgrind - you seem to be using Linux as Windows would give very different error messages. valgrind will report memory accesses outside the regions you have specified (think of getting the 100th element of an array that has only 10 elements), uninitialised variables, memory leaks and much more.

These techniques are simple to apply. What is much harder is to analyse the results and come up with the corrections your code needs.
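
As a minimal sketch of what the run-time checks buy you, the toy program below reads the 100th element of a 10-element array, mirroring the example above. It is hypothetical and not taken from IBIS; the file and variable names are made up for illustration. Built without checks it will typically print a garbage value or crash somewhere unrelated. Built with the bounds checks named in the comments, it should stop at the offending statement and name the array and the index.

```fortran
! bounds_demo.f90 -- hypothetical toy example, not part of IBIS.
! Build with checks:
!   gfortran -g -Wall -fcheck=all -fbacktrace bounds_demo.f90 -o bounds_demo
!   ifort    -g -warn all -check all -traceback bounds_demo.f90 -o bounds_demo
program bounds_demo
  implicit none
  integer, parameter :: n = 10
  real    :: values(n)
  integer :: i, idx

  do i = 1, n
     values(i) = real(i)
  end do

  ! The subscript is held in a variable, so the mistake is easy to miss
  ! when reading the code; a run-time bounds check reports it.
  idx = 100

  ! DELIBERATE BUG: values has only 10 elements, so this read is invalid.
  print *, 'values(idx) = ', values(idx)
end program bounds_demo
```

The same build flags can be applied to the real program; the checked build will be slower, so it is best kept for debugging runs.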

negar · ‎03-12-2021

Dear Arjen Markus,

First, I changed the run time from 3600 seconds to 86400, and I also used gfortran on Linux, but I still have the same problem. I have installed Valgrind, but I do not know how to use it to solve my problem.

Best regards,

N.T

mecej4 · ‎03-12-2021

Negar, I attempted to download the file from the link that you gave, and found that the file was over a gigabyte in size! At that point, I gave up because if I started debugging a program with source files of that size, I would probably need a few more lifetimes than I have to complete the task.

Arjen gave you some good suggestions. Unfortunately, when you compile a program with checks and debug output enabled, it will run more slowly than otherwise. If the access violation is caused by an array subscript bounds error, uninitialized subscript value, unallocated array, etc., even if you succeed in tracking down the error, fixing it will require intimate knowledge of the program being debugged.

You should think of contacting the authors of IBIS or an IBIS users' forum.

Arjen_Markus · ‎03-12-2021

I do not think a change in runtime will help you per se, but then I do not know the application.

Basic use of valgrind is rather simple. If your program is invoked as "./ibis" on the command-line, then type instead:

valgrind ./ibis

It will print its report on the console (on standard error), so you will probably want to send it to a file for careful inspection, for example with valgrind's --log-file=<file> option.

negar · ‎03-12-2021

I got the following report, and I do not understand what the problem is.

==2168== Memcheck, a memory error detector
==2168== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2168== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==2168== Command: ./ibis
==2168==
==2168== Warning: set address range perms: large range [0x392000, 0x30d9a000) (defined)

****************************************
* IBIS: Integrated BIosphere Simulator *
* Version 2.6b3 *
* March 2002 *
****************************************

length of this simulation (years) : 19
year to begin using anomalies : 9999

model lon, lat resolution (degrees) : 0.02 0.02
model domain (nlon x nlat) : 617 by 534

number of iterations per day : 24
last year run in this sequence : 2019

RD_PARAM: All data read in from parameter files successfully.

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0 0x31e6632a
#1 0x31e65503
#2 0x3281efcf
#3 0x14e8ff
#4 0x15a71d
#5 0x15bb53
#6 0x32801b96
#7 0x109859
#8 0xffffffffffffffff
==2168==
==2168== Process terminating with default action of signal 11 (SIGSEGV)
==2168== at 0x3281EF25: raise (raise.c:46)
==2168== by 0x3281EFCF: ??? (in /lib/x86_64-linux-gnu/libc-2.27.so)
==2168== by 0x14E8FE: readit_ (in /home/nazanin/Desktop/RUN/Run2/ibis)
==2168== by 0x15A71D: MAIN__ (in /home/nazanin/Desktop/RUN/Run2/ibis)
==2168== by 0x15BB53: main (in /home/nazanin/Desktop/RUN/Run2/ibis)
==2168==
==2168== HEAP SUMMARY:
==2168== in use at exit: 343,609 bytes in 3,378 blocks
==2168== total heap usage: 4,643 allocs, 1,265 frees, 1,101,979 bytes allocated
==2168==
==2168== LEAK SUMMARY:
==2168== definitely lost: 0 bytes in 0 blocks
==2168== indirectly lost: 0 bytes in 0 blocks
==2168== possibly lost: 0 bytes in 0 blocks
==2168== still reachable: 343,609 bytes in 3,378 blocks
==2168== suppressed: 0 bytes in 0 blocks
==2168== Rerun with --leak-check=full to see details of leaked memory
==2168==
==2168== For counts of detected and suppressed errors, rerun with: -v
==2168== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)

Arjen_Markus · ‎03-12-2021

Well, the segmentation fault occurs in the routine readit_ (see the stack trace). Unfortunately, you do not get an indication of the location within that routine. You ought to get it when you build the program with debugging enabled (compile option -g).

negar · ‎03-12-2021

I know that the subroutine readit reads the input data for the model. How can I use the stack trace and the -g option when I compile my model?

Arjen_Markus · ‎03-12-2021

If you use the debug option (add -g to the compile command), the compiler will insert a lot of extra information into the program. That information makes it possible for valgrind to report where things go awry, that is, which statement in the source code.

negar · ‎03-12-2021

Thank you for your help

mecej4 · ‎03-18-2021

[NOTE: Negar contacted me by Private Message for help with this. She provided a link to a compressed archive, 690 MB long, containing the source and data files. Readers, please note that this thread is about troubleshooting input data and array size mismatches with a large Fortran program (IBIS) and that she is using Gfortran, not Intel Fortran.]

Negar, here is some advice for your consideration, before I show you what went wrong.

If you are involved in modifying a large program (about 900 kB) with complex input data, you will need to learn enough Fortran+OS+IBIS+NetCDF+Debugging skills to do so. There is no way around that; some of us may show you the way, but you will have to do the "lifting and walking". History has it that Ptolemy personally sponsored the great mathematician Euclid. He found Euclid's seminal work, the Elements, too difficult to study, so he asked if there were an easier way to master it. According to Proclus, Euclid famously quipped: "Sire, there is no Royal Road to geometry."
When asking for help with a problem, try to reduce the size of the source and data to a manageable size. I find that the attached 1.2 MB Zip file contains all that is needed to reproduce and analyze the problem.
It is rarely enough to report the last few error messages such as "access violation". Doing so conveys the impression that you say "There is some problem and I don't know what it is. Please fix it for me." Such a request can be considered rude or inconsiderate.

I built the program from your sources on Windows 10/Cygwin 64 using Gfortran 10.2 with the command

gfortran -Iinc -fallow-argument-mismatch -ffixed-line-length-0 -g -C src/*.f -lnetcdff -lnetcdf -o gibis

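Note the two options that I used for debugging: -g, which puts debug information into the EXE/a.out, and -C, which in Intel Fortran adds code to check subscripts. (Gfortran treats -C as a preprocessor option; its subscript-checking option is -fcheck=bounds.)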

When I ran the resulting program, gibis.exe, the output was:

S:\Negar>gibis

 ****************************************
 * IBIS: Integrated BIosphere Simulator *
 * Version 2.6b3                        *
 * March 2002                           *
 ****************************************


 length of this simulation (years)   :       19
 year to begin using anomalies       :     9999

 model lon, lat resolution (degrees) :     0.02    0.02
 model domain (nlon x nlat)          : 617 by 534

 number of iterations per day        :       24
 last year run in this sequence      :     2019

 RD_PARAM: All data read in from parameter files successfully.
 ERROR in subroutine readit
 number of land points in input/surta.nc
 does not match number of land points in compar.h
 in surta =       18946  in compar.h =           3
STOP 1

Here is how the access violation occurred: Line 4063 of IO.F contains

            garea(nlpoints) = yres * 111400.0 * xres * 111400.0 * cos(xlat)

The array GAREA is declared GAREA(3), yet the data is such that NLPOINTS can reach values in the thousands.

Here is what you have to do: in file COMPAR.H, there is a parameter, NPOI, currently given the value 3. You have to replace that '3' with a value that is large enough. How much? That is something that you will have to work out yourself, using the IBIS documentation + the source and include files, or with the help of an experienced IBIS user. I cannot tell you what to use because I neither use IBIS, nor do I know how to read the voluminous input data files.

Note that the IBIS source code handles this matter of array sizing in an outmoded way. The modern way is to read enough of the input data to ascertain/calculate the required array sizes, and then ALLOCATE those arrays to the exact size required -- like buying shoes that fit, rather than the biggest size that any human would ever require.
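
A minimal sketch of that two-pass idea follows. It is not IBIS code: the toy 4 x 3 land mask, the 1 = land convention and the placeholder area value are assumptions made for illustration, and only the name garea is borrowed from the thread. The first pass counts the land points, the array is then allocated to exactly that size, and the second pass fills it.

```fortran
! allocate_demo.f90 -- hypothetical sketch, not IBIS code.
program allocate_demo
  implicit none
  integer, parameter :: nlon = 4, nlat = 3   ! toy grid (assumed)
  integer :: landmask(nlon, nlat)            ! 1 = land, 0 = ocean (assumed)
  real, allocatable :: garea(:)
  integer :: npoi, i, j, k, ierr

  ! Stand-in for reading the land mask from an input file.
  landmask = reshape([1, 0, 0, 1,  &
                      1, 1, 0, 0,  &
                      0, 1, 1, 0], shape(landmask))

  ! Pass 1: find out how many land points the data actually contain.
  npoi = count(landmask == 1)

  ! Size the array to fit, and check that the allocation succeeded.
  allocate(garea(npoi), stat=ierr)
  if (ierr /= 0) error stop 'could not allocate garea'

  ! Pass 2: fill the array; k cannot exceed npoi by construction.
  k = 0
  do j = 1, nlat
     do i = 1, nlon
        if (landmask(i, j) == 1) then
           k = k + 1
           garea(k) = 1.0        ! placeholder for the real cell-area formula
        end if
     end do
  end do

  print *, 'land points:', npoi, ' array size:', size(garea)
  deallocate(garea)
end program allocate_demo
```

With the toy mask above, both numbers printed are 6. Converting a fixed-size program to this style means touching every declaration that depends on the size parameter, so it is a larger change than editing one include file.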

The part of the IBIS code that did the checking is fine, but the code may well have crashed with an access violation or other error earlier, because the comparison of the array upper bound to the number of points is performed after all the points have been read in. An array violation occurs after only 3 points have been read in, but whether this gets detected depends quite a bit on the compiler and compiler options used.
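
One way to make the failure independent of the compiler is to test the counter before each store rather than after the loop. The sketch below is a hypothetical stand-alone illustration, not the actual readit code: the toy mask and the placeholder area value are assumptions, while npoi = 3 mirrors the value reported in the thread.

```fortran
! guard_demo.f90 -- hypothetical sketch of checking the bound before each store.
program guard_demo
  implicit none
  integer, parameter :: npoi = 3        ! deliberately too small, as in the thread
  integer, parameter :: ncells = 8      ! toy number of input cells (assumed)
  real    :: garea(npoi)
  integer :: landmask(ncells)
  integer :: i, nlpoints

  landmask = [1, 1, 0, 1, 1, 0, 1, 1]   ! stand-in for the input data: 6 land cells

  nlpoints = 0
  do i = 1, ncells
     if (landmask(i) /= 1) cycle
     nlpoints = nlpoints + 1
     ! Test BEFORE the store, so the array is never written out of bounds.
     if (nlpoints > npoi) then
        print *, 'ERROR: more land points in the input than npoi =', npoi
        print *, 'first overflow at input cell', i
        stop 1
     end if
     garea(nlpoints) = 1.0              ! placeholder for the real area formula
  end do

  print *, 'all', nlpoints, 'land points stored'
end program guard_demo
```

Here the program stops cleanly at the fourth land point (input cell 5) instead of overwriting whatever lies behind the array.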

JohnNichols · ‎03-19-2021

Dear @negar

I am glad that @mecej4 showed you how to solve the problem. You certainly hit the correct forum.

@mecej4 suggested the following:

The modern way is to read enough of the input data to ascertain/calculate the required array sizes, and then ALLOCATE those arrays to the exact size required -- like buying shoes that fit, rather than the biggest size that any human would ever require.

An alternative that I use, because I get lots of huge input files from all over the world, is to create a simple text file and store it with the program. In it you can place the sizes of the arrays you need each time and then allocate them when the program starts. It helps if you have a lot of samples to run and you want to be as quick as possible.

Just a thought - a simple sample:

1003
Gouda Quays 
1002
2
7
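
A minimal sketch of reading such a size file and allocating from it might look as follows. The one-number layout is an assumption made for illustration and is not the format of the sample above; a scratch file stands in for the real text file so that the example runs on its own. The value 18946 is the land-point count reported earlier in the thread.

```fortran
! sizes_demo.f90 -- hypothetical sketch; the one-number file layout is assumed.
program sizes_demo
  implicit none
  real, allocatable :: garea(:)
  integer :: npoi, unit, ios

  ! A scratch file stands in for a small text file kept next to the program.
  open(newunit=unit, status='scratch', action='readwrite', form='formatted')
  write(unit, '(i0)') 18946
  rewind(unit)

  ! Read the size, and refuse to continue if it is missing or nonsensical.
  read(unit, *, iostat=ios) npoi
  close(unit)
  if (ios /= 0 .or. npoi <= 0) error stop 'bad or missing array size'

  allocate(garea(npoi))
  garea = 0.0
  print *, 'allocated garea with', size(garea), 'elements'
end program sizes_demo
```

In real use the open statement would name the file instead, for example file='sizes.txt' with status='old' and action='read' (a hypothetical file name).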


negar · ‎04-12-2021

Hello again,

I have run my model, and I now understand what the earlier problem was. However, when I run the model over several years, some variables in the annual diagnostic fields are NaN for every single year, as shown below. The annual diagnostic fields show the sum of each variable over my study area for each year. First, I checked the yearly output of, for example, aet: there is no problem with this output data, and it has values for almost every cell. Second, when I run the model for fewer points (a smaller study area), the annual diagnostic fields do not show NaN for any variable. I think it could be related to a core dump, but I am not sure. Could anybody help me to handle this problem and guide me? I would appreciate it.

I have shared my model at the following link. Note that the model takes 35-40 minutes to run.

https://drive.google.com/file/d/19z8cxBMbRZbsd6lwpAzncgfDGyQlUSXG/view?usp=sharing

* * * annual diagnostic fields * * *

total nee of the domain (gt-c/yr) : NaN

total npp of the domain (gt-c/yr) : NaN

total gpp of the domain (gt-c/yr) : NaN

total biomass of the domain (gt-c) : 1.069

aboveground litter of the domain (gt-c) : 0.155

belowground litter of the domain (gt-c) : 0.040

total soil carbon of the domain (gt-c) : 1.724

total soil co2 flux of the domain (gt-c) : NaN

aboveground litter n of the domain (gt-c) : 0.001

belowground litter n of the domain (gt-c) : 0.000

total soil nitrogen of the domain (gt-c) : 0.167

average precipitation of the domain (mm/yr) : 804.867

average aet of the domain (mm/yr) : NaN

average transpiration of the domain (mm/yr) : 482.458

average runoff of the domain (mm/yr) : 69.600

average surf runoff of the domain (mm/yr) : 33.980

average drainage of the domain (mm/yr) : 35.622

average moisture recharge of the domain (mm/yr) : NaN

total aet / precipitation : NaN

total runoff / precipitation : 0.086

transpiration / total aet : NaN

surface runoff / total runoff : 0.488

jimdempseyatthecove · ‎04-13-2021

Build in Debug mode with all compile time .AND. runtime checks enabled.

Correct all compile time errors and warnings.

Then run to see if runtime errors occur. Note, not all runtime errors can be detected.

If this exposes fixable errors, then fix them, compile as stated above and run again.

If you still have unresolved NaNs after that, post back here.

Because the NaN errors appear as you grow your data, this may be a case of accessing array(s) out of bounds. The runtime checks should catch most of these (but not all instances of these).

To help locate the point of error, start at the point of the NaN printout and then work backwards in your code, inserting tests for NaN on the offending variable(s). At some point you will discover where the NaN is introduced. Then look around there for the cause.
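
A minimal sketch of such a test, using the standard ieee_arithmetic module, is shown below. The routine name check_nan, the labels and the toy aet values are made up for illustration. Note that a single NaN in a field is enough to make the sum over the whole field NaN, which is why a domain total can be NaN even when almost every cell looks fine.

```fortran
! nan_check_demo.f90 -- hypothetical sketch, not IBIS code.
! Avoid options such as -ffast-math here; they may optimise NaN tests away.
program nan_check_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  real :: aet(5)

  aet = [1.2, 0.8, 1.1, 0.9, 1.0]
  call check_nan('after initialisation', aet)

  ! Simulate one bad cell; in a real model it would come from the computation.
  aet(3) = ieee_value(aet(3), ieee_quiet_nan)
  call check_nan('after update step', aet)

contains

  ! Report the first NaN in field, tagged with where the check was placed.
  subroutine check_nan(label, field)
    character(len=*), intent(in) :: label
    real,             intent(in) :: field(:)
    integer :: i
    do i = 1, size(field)
       if (ieee_is_nan(field(i))) then
          print *, 'NaN found ', label, ' at index', i
          return
       end if
    end do
    print *, 'no NaN ', label
  end subroutine check_nan

end program nan_check_demo
```

Calls like these can be moved earlier and earlier in the code until the first check that still passes is found; the NaN is introduced between that check and the next one.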

Jim Dempsey

negar · ‎04-14-2021

What do you mean when you said: "Build in Debug mode with all compile time"?

I do not have any errors or warnings when I run my model. Also, in my output data, such as aet, there is just one huge number, 11711477, whereas the other cells have values of order 1. Could this large number be the reason that the domain average shows NaN?

jimdempseyatthecove · ‎04-14-2021

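Sorry, I meant compile-time diagnostics and warnings, plus run-time checks. With Intel Fortran on Windows these are the options below (on Linux, -warn all and -check all; with gfortran, the closest equivalents are -Wall and -fcheck=all):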

/warn:all
/check:all

Also, it is very helpful to ensure that IMPLICIT NONE is used in all subroutines and functions, so that typographical errors (and mis-typings) are detected at compile time.
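
As a small illustration of why this matters, the hypothetical program below omits IMPLICIT NONE on purpose. The misspelled name totl is silently accepted as a new variable, so the program compiles, runs, and prints zero instead of the intended 6.0. Adding IMPLICIT NONE turns the typo into a compile-time error.

```fortran
! implicit_demo.f90 -- hypothetical sketch of a typo hidden by implicit typing.
program implicit_demo
  ! No IMPLICIT NONE here on purpose: undeclared names are silently accepted.
  real :: total, x(3)
  integer :: i

  x = [1.0, 2.0, 3.0]
  total = 0.0
  do i = 1, 3
     totl = total + x(i)      ! typo: "totl" becomes a new implicit REAL variable
  end do

  print *, 'total = ', total  ! total was never updated, so this prints zero
end program implicit_demo
```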

Jim Dempsey

jimdempseyatthecove · ‎04-14-2021

A large number where a small one is expected is an indication that something is wrong.

Follow the same advice as for the NaN...

Starting at the point where you know a value is incorrect, go backwards in your code, inserting tests to locate the point where the number deviates from the expected result. Then determine the cause of the deviation. Hint: I find it easy to add a helper subroutine such as:

subroutine DebugThis()
print *,"bug" ! Break here
end subroutine DebugThis

Then in your code

IF(YourNumberThatBlowsUp > 10) call DebugThis()

Then you insert that further back into your code to locate the point of deviation.

Of course you would have to modify the test condition as you may need to test the constituents.

Note, when you are at the break point, use the Call Stack window to set the focus at the higher stack levels (such that you can examine the variables in the scope of the caller). You may need to go up more than one call level to locate the next source(s) of generation of the error.

Also, you could use the Fortran Preprocessor and then pass in as arguments __LINE__, __FILE__

And print that out (and then STOP)

Note, in the debugger, you can (at STOP) use the Debugger "Set Next Statement" to position at the END SUBROUTINE statement such that you can return if need be for testing.
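
A minimal sketch of the preprocessor variant might look like this. The program name, the test value and the threshold are assumptions for illustration; __FILE__ and __LINE__ are the standard preprocessor macros. If the expanded file path is very long, the line may exceed the free-form line-length limit, in which case the call can be split across lines.

```fortran
! debug_line_demo.F90 -- hypothetical sketch. With gfortran (and ifort on
! Linux) the capital .F90 suffix runs the preprocessor automatically;
! otherwise pass -cpp (gfortran) or -fpp (ifort).
program debug_line_demo
  implicit none
  real :: aet

  aet = 11711477.0                 ! the suspicious value from the thread
  if (aet > 10.0) call DebugThis(__FILE__, __LINE__)
  print *, 'not reached when the test fires'

contains

  ! Report where the check fired, then stop.
  subroutine DebugThis(file, line)
    character(len=*), intent(in) :: file
    integer,          intent(in) :: line
    print *, 'check failed in ', file, ' at line ', line   ! Break here
    stop 1
  end subroutine DebugThis

end program debug_line_demo
```

Because each call site passes its own file name and line number, the same helper can be sprinkled through many routines and the printout still says exactly which check fired.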

Jim Dempsey
