Hi!
I compile two sections of a model together (one in C++ and the other in Fortran) using a makefile. I have used the model before, and everything appears normal during compilation. However, for some reason the resulting executables do not run to completion: the program starts without any problem but then suddenly gives the error I used for the title.
Judging from a Google search this seems to be quite a common problem, but so far I haven't found a way to fix mine. I increased the stack size to unlimited using
[bash]ulimit -s unlimited[/bash]
I also upgraded my Linux distribution to the 64-bit architecture, as well as the Intel Fortran and C++ compilers, and nothing worked. Neither the program nor the data has changed, and it still works without any problem on another computer. What has changed is the "output frequency", but I tried the value I used before and still had the same problem. I had to reinstall the compilers and the netcdf libraries, but there weren't any problems during the installation.
This is what I get when running it:
[bash]
[ascotilla@ascotilla-HP program_files]$ ./motif-step1b 5
Total land points: 61538
Spinup read from: /media/Data/Outputs/data_after_step1a.txt
Spinup years: 1000
Spinup output freq: n/a
Rampup written too: /media/Data/Outputs/data_after_step1b.txt
Rampup years: 1000
Rampup output freq: 20
Run years: 156
Run output freq: 3
out_years: 5 from year: 56
output files in working directory
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine  Line     Source
libnetcdf_c++.so.  00002B2412F520A5  Unknown  Unknown  Unknown
libnetcdf_c++.so.  00002B2412F557D5  Unknown  Unknown  Unknown
motif-step1b       000000000044CB9A  Unknown  Unknown  Unknown
motif-step1b       000000000044BDDE  Unknown  Unknown  Unknown
motif-step1b       0000000000409A72  Unknown  Unknown  Unknown
motif-step1b       000000000040510C  Unknown  Unknown  Unknown
libc.so.6          0000003DFAE2169D  Unknown  Unknown  Unknown
motif_lpj-step1b   0000000000405009  Unknown  Unknown  Unknown
[ascotilla@ascotilla-HP program_files]$
[/bash]
I checked for the libnetcdf_c++.so library and it is in both the /usr/lib and /usr/lib64 folders, so it doesn't seem to be a problem of not finding the path to it. It isn't a permission problem either, as I ran the program as superuser and nothing changed.
I'm quite a newbie with this, so I still don't really know where else to look. Any ideas, suggestions, etc. are more than welcome.
Thanks in advance
14 Replies
I used -traceback in the ifort options to get an idea of where the problem was coming from, and it seems that the program still calls a routine that was commented out in previous versions of the model. However, I have always used the same model, and that didn't change. I even rechecked the previous versions I have: I always worked with the same files and never had this problem before. Does that make any sense?
Read this article: Diagnosing Seg Fault/Bus Error/SIGSEGV errors and take the prescribed steps to track down the cause of the seg-fault.
Quite likely, there is an error in the arguments passed from motif_lpj-step1b to the Netcdf C++ library or, less likely, there is an error in the library itself. In either case, localizing the error by having the traceback printout show the routine name and line numbers instead of machine addresses will be helpful.
Your finding that "the program still calls a routine that has been commented out in previous versions of the model" is something that you should investigate thoroughly.
Your long descriptions of actions that you took (such as reinstalling the OS and compilers) serve only to strengthen the suspicion raised above. None of those actions will remove a seg-fault caused by errors in your code or in the Netcdf libraries.
Forget the last post: the subroutine is in the C++ section of the model, but the -traceback option doesn't give any additional information for the C++ part. My first guess was that there might be a problem in linking the main program (in Fortran) with the C++ (input-output) part, but the other subroutines don't seem to have any problems. When I comment out the call to the subroutine the model runs properly, but that subroutine controls which years the model outputs, so I'm not getting what I want... :-S
I have just seen this, thank you for the article!
I'll let you know if I make any advances
I was going through some of the options given in the article, and when I checked my memory usage I got this:
cat /proc/meminfo
[bash]
MemTotal: 8031844 kB
MemFree: 150952 kB
Buffers: 2760420 kB
Cached: 3642636 kB
SwapCached: 792 kB
Active: 3266552 kB
Inactive: 4080524 kB
Active(anon): 802716 kB
Inactive(anon): 186428 kB
Active(file): 2463836 kB
Inactive(file): 3894096 kB
Unevictable: 84 kB
Mlocked: 64 kB
SwapTotal: 10125308 kB
SwapFree: 10111464 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 943360 kB
Mapped: 119792 kB
Shmem: 45104 kB
Slab: 234008 kB
SReclaimable: 193852 kB
SUnreclaim: 40156 kB
KernelStack: 2968 kB
PageTables: 32820 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 14141228 kB
Committed_AS: 2402100 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 313368 kB
VmallocChunk: 34359388340 kB
HardwareCorrupted: 0 kB
AnonHugePages: 241664 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 2091008 kB
DirectMap2M: 6223872 kB
[/bash]
It seems to me that I don't have enough free memory...I guess that should explain why it's not working....
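A note on reading this output: MemFree counts only memory that is completely unused. On Linux, most of Buffers and Cached is file cache that the kernel can give back when a program asks for memory, so a low MemFree by itself does not indicate a shortage. The C++ sketch below simply adds the three fields from the listing above as a rough upper estimate. The function and constant names are hypothetical, and treating all of Buffers and Cached as reclaimable is a simplifying assumption (for example, Shmem is counted in Cached but cannot be dropped).

```cpp
// Sketch only: rough estimate of the memory the kernel could hand to a
// program. Treating all of Buffers and Cached as reclaimable is a
// simplifying assumption; the names below are hypothetical.
constexpr long approx_available_kb(long mem_free_kb, long buffers_kb,
                                   long cached_kb) {
    return mem_free_kb + buffers_kb + cached_kb;
}

// Values copied from the /proc/meminfo output above (kB).
constexpr long kApproxAvailableKb =
    approx_available_kb(150952, 2760420, 3642636);  // 6554008 kB
```

That is about 6.25 GiB out of roughly 7.66 GiB in total, and swap is almost entirely unused (SwapFree is 10111464 kB of 10125308 kB), so running out of memory looks like an unlikely explanation for the segmentation fault.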
Ok...I know, I should give my thoughts enough time to ripen before posting...
Following mecej4's article I added the options -check bounds -g to the compilers. In my case this can also be done when running the makefile:
make clean
make BOUNDS=yes TRACEBACK=yes
This gives me another error different from the one I was having until now:
forrtl: severe (408): fort: (3): Subscript #1 of the array DVAL_PREC has value 0 which is less than the lower bound of 1
Following this other site: http://wiki.seas.harvard.edu/geos-chem/index.php/Common_GEOS-Chem_error_messages, I did:
grep -i DVAL_PREC *.f*
and I got:
[bash]
main.f: subroutine prdaily(mval_prec,dval_prec,mval_wet,year)
main.f: real mval_prec(nmonth),dval_prec(ndayyear),mval_wet(nmonth)
main.f: if(dval_prec(day-1).lt.0.1) then
main.f: dval_prec(day)=0.0
main.f: dval_prec(day)=((-alog(v1))**c2)*mprec(m)*c1
main.f: if(dval_prec(day).lt.0.1) dval_prec(day)=0.0
main.f: mprecip(m)=mprecip(m)+dval_prec(day)
main.f: dval_prec(day)=dval_prec(day)*(mval_prec(m)/mprecip(m))
main.f: if (dval_prec(day).lt.0.1) dval_prec(day)=0.0
main.f:c dval_prec(day)=mval_prec(m)/ndaymonth(m) !no generator
main.f:c dval_prec(day)=mprec(m)
main.f:c dval_prec(day)=0.0
main.f~: subroutine prdaily(mval_prec,dval_prec,mval_wet,year)
main.f~: real mval_prec(nmonth),dval_prec(ndayyear),mval_wet(nmonth)
main.f~: if(dval_prec(day-1).lt.0.1) then
main.f~: dval_prec(day)=0.0
main.f~: dval_prec(day)=((-alog(v1))**c2)*mprec(m)*c1
main.f~: if(dval_prec(day).lt.0.1) dval_prec(day)=0.0
main.f~: mprecip(m)=mprecip(m)+dval_prec(day)
main.f~: dval_prec(day)=dval_prec(day)*(mval_prec(m)/mprecip(m))
main.f~: if (dval_prec(day).lt.0.1) dval_prec(day)=0.0
main.f~:c dval_prec(day)=mval_prec(m)/ndaymonth(m) !no generator
main.f~:c dval_prec(day)=mprec(m)
main.f~:c dval_prec(day)=0.0
[/bash]
I guess the problem comes from the line setting dval_prec(day)=0.0, but I don't see why that's a problem. I didn't write the program (quite obviously.. :-P) and I really don't dare to change the code unless I'm completely sure that it will do exactly the same (I mean in terms of performance; I'd love to get rid of the errors for good)...
Any ideas?
You did not display the source line number that the runtime subscript check would have displayed. That information and the source line with that line number would tell you exactly what caused the error.
Setting dval_prec(day) = 0.0 is not the problem. Rather, the third line of the grep output in your post shows
if(dval_prec(day-1).lt.0.1)then
If day has the value 0 or 1, a subscript error occurs here, since the implied lower bound of the array is 1.
Yes, sorry... Actually, the error originates at line 1360 of the Fortran part. I got this just after the error message:
[bash]
libnetcdf_c++.so.  00002B07ADEF80A5  Unknown  Unknown  Unknown
libnetcdf_c++.so.  00002B07ADEFB7D5  Unknown  Unknown  Unknown
motif-step1b       000000000044C4D0  Unknown  Unknown  Unknown
motif-step1b       000000000044A526  Unknown  Unknown  Unknown
motif-step1b       000000000040999F  MAIN__   1360     main.f
motif-step1b       000000000040512C  Unknown  Unknown  Unknown
libc.so.6          0000003DFAE2169D  Unknown  Unknown  Unknown
motif-step1b       0000000000405029  Unknown  Unknown  Unknown
[/bash]
Line 1360 of the main program is a call to a subroutine in C++, but the traceback doesn't give any information about where inside the subroutine the problem is. It was by following the instructions from the article and the website that I found out that the problem seems to be an array going out of bounds.
As I said, I'm a bit scared of touching the code, as it has already worked for me and for other people, and since it's a complex model any change can alter the results a lot.
I'm not an expert, but it seems weird to me that the value of "day" would affect the result. If day=0 or 1 it will just take the second branch of the condition, won't it? In this case I think (although I'm not sure) that it refers to rainy days, and having days with no rain is quite important for the model...
I'll keep looking into it, but any ideas will be helpful, really...
Cheers
Given the declaration
dval_prec(ndayyear)
and the integer variable day, the reference
dval_prec(day-1)
is illegal for values of (day-1) < 1 or > ndayyear, and the behavior of a program that uses array subscripts out of bounds is undefined. This is irrespective of whatever physical or logical significance the offending subscript may have for you.
The code has a bug that needs to be fixed.
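To make the rule concrete, here is a minimal C++ sketch of the test that a runtime subscript check effectively applies to dval_prec(day-1). The function names are hypothetical and the value 365 used for ndayyear in the note below is an assumption for illustration; neither is taken from the model.

```cpp
// Sketch only: a Fortran array declared as dval_prec(ndayyear) has valid
// subscripts 1..ndayyear. The function names are hypothetical.
constexpr bool subscript_in_bounds(int subscript, int ndayyear) {
    return subscript >= 1 && subscript <= ndayyear;
}

// True when the expression dval_prec(day-1) is a legal reference.
constexpr bool prior_day_reference_is_legal(int day, int ndayyear) {
    return subscript_in_bounds(day - 1, ndayyear);
}
```

Assuming ndayyear is 365, prior_day_reference_is_legal(1, 365) is false because the subscript evaluates to 0, which is the case the runtime check reported. The branch that the if statement would have taken never comes into play, since the array reference itself is already invalid.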
Thanks again..I'll check the variable "day", to see from which value it starts...
As mecej4 said, there has been a problem in the spinup, and there are negative numbers where there shouldn't be any (in the input file created by the model in the previous step). I'm taking a closer look, together with a colleague who knows the code better, to find out exactly where the problem comes from.
>>forrtl: severe (408): fort: (3): Subscript #1 of the array DVAL_PREC has value 0 which is less than the lower bound of 1
.AND.
>>main.f: real mval_prec(nmonth),dval_prec(ndayyear),mval_wet(nmonth)
.AND.
>>main.f: if(dval_prec(day-1).lt.0.1) then
The array dval_prec has the array bounds of (1:ndayyear)
If day represents a 1-based day of year (a reasonable assumption), then dval_prec(day-1) will be out of bounds when day == 1 (the first day of the year). Without having examined your code, the test appears to expect the prior day's value of dval_prec. Consider using:
day_prior = day - 1
if(day_prior == 0) day_prior = ndayyear
! *** caution, you may have to account for leap year
if(dval_prec(day_prior).lt.0.1) then
Jim Dempsey
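On the leap-year caution in the snippet above, here is a hedged C++ sketch of how the wrap-around could pick the length of the previous year instead of a fixed ndayyear. All names are hypothetical, and it assumes the model tracks a Gregorian calendar year, which the thread does not confirm. If the model uses a fixed number of days per year, the simpler form shown in the post is enough.

```cpp
// Sketch only; all names are hypothetical and not taken from the model.
// Gregorian leap-year rule, used to size the previous year.
constexpr bool is_leap_year(int year) {
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

constexpr int days_in_year(int year) {
    return is_leap_year(year) ? 366 : 365;
}

// 1-based day of year that precedes `day` in `year`. For day 1 this wraps
// to the last day of the previous year, whose length may differ.
constexpr int prior_day(int day, int year) {
    return (day > 1) ? day - 1 : days_in_year(year - 1);
}
```

Whether wrapping to the last day of the previous year is physically right for the precipitation generator is a separate question for the people who know the model; the sketch only guarantees that the subscript stays within 1..366.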
Sorry for taking so long to answer. We've been checking the code with some of the people who wrote it and it was, as you said, a bug. However, it wasn't in that part of the code but in the C++ part.
I haven't been able to reproduce the dval_prec error again. A few things were changed and now it works. The fix (a -2 that should have been a -1 in one of the formulas) was probably where the out-of-bounds problem was coming from, although not directly.
Thank you mecej4 and jimdempseyatthecove for your help
Cheers
Although one tends to be thankful when a software bug commits suicide, I think that you may find it worthwhile to pin down and document the changes that were made.
At this point the details of the section of code that gave you problems are fresh in your mind, so documenting the bug should be easier now. If/when the bug comes alive again, you will be glad that you took the time to document it.