Hello,
I am running a model, which I have uploaded at this link (https://drive.google.com/file/d/1QLXutSW0mCOFMH1hfrhQzpOEgD8vamQy/view). When I run the model (./ibis), I get the following error:
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7fc13c55c32a
#1 0x7fc13c55b503
#2 0x7fc13bbd8fcf
#3 0x56089be188ff
#4 0x56089be2471d
#5 0x56089be25b53
#6 0x7fc13bbbbb96
#7 0x56089bdd3859
#8 0xffffffffffffffff
Segmentation fault (core dumped)
Could anybody help me to fix this problem?
I would appreciate your help.
That is a very general question you are posting. Invalid memory references can occur for a wide variety of reasons and it is not easy to find the root cause. However, a few things come to mind:
- Use all the compiler flags regarding compile-time and run-time checks to build the program, notably -warn all (might be -warn:all), -check all (might be -check:all), -stand, so that the compiler is allowed to insert all manner of checks into the final program. They may slow it down, but a non-working program has the worst possible performance.
- Use a different compiler, gfortran and NAG Fortran come to mind, as compilers differ in the way they check the code and build up the program.
- Use valgrind - you seem to be using Linux as Windows would give very different error messages. valgrind will report memory accesses outside the regions you have specified (think of getting the 100th element of an array that has only 10 elements), uninitialised variables, memory leaks and much more.
These techniques are simple to apply. What is much harder is to analyse the results and come up with the corrections your code needs.
Dear Arjen Markus,
First, I changed the run time from 3600 seconds to 86400, and I also used gfortran on Linux, but I still have the same problem. I installed Valgrind, but I do not know how to use it to solve my problem.
Best regards,
N.T
Negar, I attempted to download the file from the link that you gave, and found that the file was over a gigabyte in size! At that point, I gave up because if I started debugging a program with source files of that size, I would probably need a few more lifetimes than I have to complete the task.
Arjen gave you some good suggestions. Unfortunately, when you compile a program with checks and debug output enabled, it will run more slowly than otherwise. If the access violation is caused by an array subscript bounds error, uninitialized subscript value, unallocated array, etc., even if you succeed in tracking down the error, fixing it will require intimate knowledge of the program being debugged.
You should think of contacting the authors of IBIS or an IBIS users' forum.
I do not think a change in runtime will help per se, but then I do not know the application.
Basic use of valgrind is rather simple. If your program is invoked as "./ibis" on the command-line, then type instead:
valgrind ./ibis
It prints its report on the console (on standard error), so you will probably want to redirect it to a file for careful inspection, for example: valgrind ./ibis > valgrind.log 2>&1
I got the following report, but I do not understand what the problem is.
==2168== Memcheck, a memory error detector
==2168== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2168== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==2168== Command: ./ibis
==2168==
==2168== Warning: set address range perms: large range [0x392000, 0x30d9a000) (defined)
*****************************
* IBIS: Integrated BIosphere Simulator *
* Version 2.6b3 *
* March 2002 *
*****************************
length of this simulation (years) : 19
year to begin using anomalies : 9999
model lon, lat resolution (degrees) : 0.02 0.02
model domain (nlon x nlat) : 617 by 534
number of iterations per day : 24
last year run in this sequence : 2019
RD_PARAM: All data read in from parameter files successfully.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x31e6632a
#1 0x31e65503
#2 0x3281efcf
#3 0x14e8ff
#4 0x15a71d
#5 0x15bb53
#6 0x32801b96
#7 0x109859
#8 0xffffffffffffffff
==2168==
==2168== Process terminating with default action of signal 11 (SIGSEGV)
==2168== at 0x3281EF25: raise (raise.c:46)
==2168== by 0x3281EFCF: ??? (in /lib/x86_64-linux-gnu/libc-2.
==2168== by 0x14E8FE: readit_ (in /home/nazanin/Desktop/RUN/
==2168== by 0x15A71D: MAIN__ (in /home/nazanin/Desktop/RUN/
==2168== by 0x15BB53: main (in /home/nazanin/Desktop/RUN/
==2168==
==2168== HEAP SUMMARY:
==2168== in use at exit: 343,609 bytes in 3,378 blocks
==2168== total heap usage: 4,643 allocs, 1,265 frees, 1,101,979 bytes allocated
==2168==
==2168== LEAK SUMMARY:
==2168== definitely lost: 0 bytes in 0 blocks
==2168== indirectly lost: 0 bytes in 0 blocks
==2168== possibly lost: 0 bytes in 0 blocks
==2168== still reachable: 343,609 bytes in 3,378 blocks
==2168== suppressed: 0 bytes in 0 blocks
==2168== Rerun with --leak-check=full to see details of leaked memory
==2168==
==2168== For counts of detected and suppressed errors, rerun with: -v
==2168== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
Well, the segmentation fault occurs in the routine readit_ (see the stack trace). Unfortunately, you do not get an indication of the location within that routine. You ought to get it when you build the program with debugging enabled (compile option -g).
I know that the subroutine readit reads the input data for the model. How can I use the stack trace and the -g option when I compile my model?
If you use the debug option the compiler will insert a lot of extra information in the program. That information makes it possible for valgrind to report where things go awry - that is: which statement in the source code.
[NOTE: Negar contacted me by Private Message for help with this. She provided a link to a compressed archive, 690 MB long, containing the source and data files. Readers, please note that this thread is about troubleshooting input data and array size mismatches with a large Fortran program (IBIS) and that she is using Gfortran, not Intel Fortran.]
Negar, here is some advice for your consideration, before I show you what went wrong.
- If you are involved in modifying a large program (about 900 kB) with complex input data, you will need to learn enough Fortran+OS+IBIS+NetCDF+Debugging skills to do so. There is no way around that; some of us may show you the way, but you will have to do the "lifting and walking". History has it that Ptolemy personally sponsored the great mathematician Euclid. He found Euclid's seminal work, the Elements, too difficult to study, so he asked if there were an easier way to master it. According to Proclus, Euclid famously quipped: "Sire, there is no Royal Road to geometry."
- When asking for help with a problem, try to reduce the size of the source and data to a manageable size. I find that the attached 1.2 MB Zip file contains all that is needed to reproduce and analyze the problem.
- It is rarely enough to report the last few error messages such as "access violation". Doing so conveys the impression that you say "There is some problem and I don't know what it is. Please fix it for me." Such a request can be considered rude or inconsiderate.
I built the program from your sources on Windows 10/Cygwin 64 using Gfortran 10.2 with the command
gfortran -Iinc -fallow-argument-mismatch -ffixed-line-length-0 -g -C src/*.f -lnetcdff -lnetcdf -o gibis
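Note the two options that I used for debugging: -g, which puts debug information into the EXE/a.out, and -C, which is meant to add code to check subscripts. (-C is the Intel Fortran spelling; with Gfortran the subscript-checking option is -fcheck=bounds.)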
When I ran the resulting program, gibis.exe, the output was:
S:\Negar>gibis
****************************************
* IBIS: Integrated BIosphere Simulator *
* Version 2.6b3 *
* March 2002 *
****************************************
length of this simulation (years) : 19
year to begin using anomalies : 9999
model lon, lat resolution (degrees) : 0.02 0.02
model domain (nlon x nlat) : 617 by 534
number of iterations per day : 24
last year run in this sequence : 2019
RD_PARAM: All data read in from parameter files successfully.
ERROR in subroutine readit
number of land points in input/surta.nc
does not match number of land points in compar.h
in surta = 18946 in compar.h = 3
STOP 1
Here is how the access violation occurred: Line 4063 of IO.F contains
garea(nlpoints) = yres * 111400.0 * xres * 111400.0 * cos(xlat)
The array GAREA is declared GAREA(3), yet the data is such that NLPOINTS can reach values in the thousands.
Here is what you have to do: in file COMPAR.H, there is a parameter, NPOI, currently given the value 3. You have to replace that '3' with a value that is large enough. How much? That is something that you will have to work out yourself, using the IBIS documentation + the source and include files, or with the help of an experienced IBIS user. I cannot tell you what to use because I neither use IBIS, nor do I know how to read the voluminous input data files.
Note that the IBIS source code handles this matter of array sizing in an outmoded way. The modern way is to read enough of the input data to ascertain/calculate the required array sizes, and then ALLOCATE those arrays to the exact size required -- like buying shoes that fit, rather than the biggest size that any human would ever require.
The part of the IBIS code that did the checking is fine, but the code may well have crashed with an access violation or other error earlier, because the comparison of the array upper bound to the number of points is performed after all the points have been read in. An array violation occurs after only 3 points have been read in, but whether this gets detected depends quite a bit on the compiler and compiler options used.
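As a rough illustration of the "size first, then ALLOCATE" approach described above, here is a minimal sketch. It is not IBIS code: the small hard-coded mask is a hypothetical stand-in for the land mask that the model reads from its input files, and the value stored in garea is a placeholder rather than the area formula from IO.F.

```fortran
program size_then_allocate
  implicit none
  ! Hypothetical stand-in for a land mask read from an input file;
  ! a tiny hard-coded grid keeps the sketch self-contained.
  integer, parameter :: nlon = 4, nlat = 3
  integer :: mask(nlon, nlat)
  real, allocatable :: garea(:)
  integer :: i, j, n, npoi

  mask = reshape([1,0,1,1, 0,0,1,0, 1,1,0,1], [nlon, nlat])

  ! Pass 1: count the land points before any array is sized.
  npoi = count(mask == 1)

  ! Allocate exactly as many elements as the data requires.
  allocate(garea(npoi))

  ! Pass 2: fill the array; n can never exceed npoi here.
  n = 0
  do j = 1, nlat
    do i = 1, nlon
      if (mask(i, j) == 1) then
        n = n + 1
        garea(n) = real(n)   ! placeholder value, not the IBIS formula
      end if
    end do
  end do

  print *, 'land points:', npoi, '  size(garea):', size(garea)
  deallocate(garea)
end program size_then_allocate
```

Because the array size comes from the data itself, there is no compile-time parameter such as NPOI that can fall out of step with the input files.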
Dear @negar
I am glad that @mecej4 showed you how to solve the problem. You certainly hit the correct forum.
@mecej4 suggested the following:
The modern way is to read enough of the input data to ascertain/calculate the required array sizes, and then ALLOCATE those arrays to the exact size required -- like buying shoes that fit, rather than the biggest size that any human would ever require.
An alternative that I use, because I get lots of huge input files from all over the world, is to create a simple text file and store it with the program. In it you place the array sizes you need for each run, and the program reads them and allocates the arrays at startup. This helps if you have a lot of samples to run and want to be as quick as possible.
Just a thought - a simple sample:
1003
Gouda Quays
1002
2
7
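A minimal sketch of the sizes-in-a-text-file idea might look like the following. The file name sizes.txt, its two-line layout, and the names nlayers and soil are assumptions made for illustration; they do not describe the sample file above or any IBIS input. The program writes the example file first only so that the sketch runs on its own.

```fortran
program sizes_from_file
  implicit none
  integer :: u, ios, npoi, nlayers
  real, allocatable :: garea(:), soil(:,:)

  ! Create a small example file (hypothetical layout: one size per line).
  open(newunit=u, file='sizes.txt', status='replace', action='write')
  write(u, '(i0)') 18946   ! number of land points
  write(u, '(i0)') 6       ! number of layers (made-up value)
  close(u)

  ! Read the sizes back and validate them before allocating anything.
  open(newunit=u, file='sizes.txt', status='old', action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open sizes.txt'
  read(u, *, iostat=ios) npoi
  if (ios == 0) read(u, *, iostat=ios) nlayers
  close(u)
  if (ios /= 0) stop 'cannot read sizes.txt'
  if (npoi < 1 .or. nlayers < 1) stop 'sizes must be positive'

  ! Allocate the arrays to the sizes requested for this run.
  allocate(garea(npoi), soil(npoi, nlayers))
  print *, 'size(garea):', size(garea), '  shape(soil):', shape(soil)
end program sizes_from_file
```

The sizes file still has to agree with the data it describes, so a consistency check like the one in readit remains worthwhile.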
Hello again,
Build in Debug mode with all compile time .AND. runtime checks enabled.
Correct all compile time errors and warnings.
Then run to see if runtime errors occur. Note, not all runtime errors can be detected.
If this exposes fixable errors, then fix them, compile as stated above and run again.
When (if) you still have unresolved NaN's then post back here.
Because the NaN errors appear as you grow your data, this may be a case of accessing array(s) out of bounds. The runtime checks should catch most of these (but not all instances of these).
To help locate the point of error, start at the point of the NaN printout and work backwards in your code, inserting tests for NaN on the offending variable(s). At some point you will discover where the NaN is introduced. Then look around there for the cause.
Jim Dempsey
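One possible way to write such a NaN test is sketched below, using the standard ieee_arithmetic intrinsic module. The module name nan_check, the routine check_nan, and the array aet in the demo are illustrative assumptions, not part of the model's source.

```fortran
module nan_check
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
contains
  ! Report the first NaN found in x, tagged with a caller-supplied label.
  subroutine check_nan(x, label)
    real, intent(in) :: x(:)
    character(len=*), intent(in) :: label
    integer :: i
    do i = 1, size(x)
      if (ieee_is_nan(x(i))) then
        print *, 'NaN in ', label, ' at index ', i
        return
      end if
    end do
  end subroutine check_nan
end module nan_check

program nan_demo
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan
  use nan_check
  implicit none
  real :: aet(5)

  aet = 1.0
  call check_nan(aet, 'aet before update')   ! no NaN yet, prints nothing

  aet(3) = ieee_value(1.0, ieee_quiet_nan)   ! inject a NaN for the demo
  call check_nan(aet, 'aet after update')    ! reports the NaN at index 3
end program nan_demo
```

Calls like these can be moved progressively earlier in the code; the first call that stays silent brackets the statement that introduces the NaN.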
What did you mean by "Build in Debug mode with all compile time"?
I do not get any errors or warnings when I run my model. Also, in my output data, such as aet, there is just one huge number, 11711477, whereas the other cells have values on the order of 1. Could this large number be the reason that the domain average shows NaN?
Sorry, I meant the compile-time diagnostics and warnings options:
/warn:all
/check:all
Also, it is very helpful to ensure that IMPLICIT NONE is used in all subroutines and functions, so that typographical errors (and mis-typings) are detected.
Jim Dempsey
A large number where a small one is expected is an indication that something is wrong.
Follow the same advice as for the NaN...
Starting at the point where you know a value is incorrect, go backwards in your code, inserting tests to locate the point where the number deviates from the expected result. Then determine the cause of the deviation. Hint: I find it easy to add a helper subroutine such as:
subroutine DebugThis()
print *,"bug" ! Break here
end subroutine DebugThis
Then in your code
IF(YourNumberThatBlowsUp > 10) call DebugThis()
Then you insert that further back into your code to locate the point of deviation.
Of course you would have to modify the test condition as you may need to test the constituents.
Note, when you are at the break point, use the Call Stack window to set the focus at the higher stack levels (such that you can examine the variables in the scope of the caller). You may need to go up more than one call level to locate the next source(s) of generation of the error.
Also, you could use the Fortran preprocessor to pass __LINE__ and __FILE__ in as arguments, print them out, and then STOP.
Note, in the debugger, you can (at STOP) use the Debugger "Set Next Statement" to position at the END SUBROUTINE statement such that you can return if need be for testing.
Jim Dempsey
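The preprocessor variant could be sketched roughly as follows. The routine name debug_here and the variable aet are illustrative, and the threshold simply mirrors the test shown earlier. The file must be compiled with preprocessing enabled, for example with -cpp in gfortran or -fpp in Intel Fortran, so that __FILE__ and __LINE__ are expanded.

```fortran
! Compile with preprocessing enabled, e.g.: gfortran -cpp debug_here.f90
program debug_here_demo
  implicit none
  real :: aet

  aet = 11711477.0   ! the suspicious value reported in the thread

  ! __FILE__ expands to a quoted file name, __LINE__ to an integer.
  if (aet > 10.0) call debug_here(__FILE__, __LINE__)

  print *, 'not reached when the test above fires'
contains
  ! Print where the check fired, then stop (set a breakpoint on the print).
  subroutine debug_here(file, line)
    character(len=*), intent(in) :: file
    integer, intent(in) :: line
    print *, 'suspicious value at ', file, ' line ', line   ! Break here
    stop 1
  end subroutine debug_here
end program debug_here_demo
```

In fixed-form source, the expanded file name can push a statement past the line-length limit, so a short file name or a continuation line may be needed.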