Intel® Fortran Compiler

Fortran executable reassigns variable to large number, reason unknown

Donahue__Sean
Beginner

I have a collection of old source code that I am attempting to update, and have come across a strange problem. Within the module I'm working on, there is a subroutine that apparently reassigns a variable to a seemingly arbitrary large number for no apparent reason. "IW" is an index variable that should just iterate from 1 to the requested number of runs. However, somewhere within the trajup subroutine, it gets reassigned from 1 to 1072926779. This doesn't make sense to me, as I have searched through the trajup source code, as well as the alinxy source code that trajup references, and IW does not seem to appear anywhere in that code. I put various print statements into the trajup source code to try to narrow down the source, but the location where the value changes seems to move around. Given all this, and the sudden and immense jump in value, my instinct is that the error is a result of some memory mismanagement leading to some value overflow, but I have to admit, I'm well out of my depth here.

I have attached a zip of all the relevant source code, as well as the produced executable. The files that the executable is meant to operate on are already in the "release" folder, so you can run the file as is (although the executable fails, as it is trying to assign a value to the IWth entry in a matrix).

For reference, I am running Windows 10, Intel Parallel Studio XE 2018, and Visual Studio Professional 2017.

Any insight as to why my variable is blowing up is highly appreciated.

19 Replies
IanH
Honored Contributor II

Compiling with all runtime debugging enabled (within a Debug configuration, set the property Fortran > Run-time > Runtime Error Checking to "All (/check:all)"), I see...

 

forrtl: severe (194): Run-Time Check Failure. The variable 'RUNMONT$Y' is being used in 'xxxx\Variable overflow\Mudemimp\Mudemimp.f90(782,7)' without being defined
Image              PC                Routine            Line        Source
ALINXY.exe         00007FF798690D39  Unknown               Unknown  Unknown
ALINXY.exe         00007FF79867B740  RUNMONT                   782  Mudemimp.f90
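
For illustration, here is a minimal sketch of the kind of code that the uninitialized-variable check reports. The names UNINIT_DEMO and ACCUMULATE are hypothetical and the body is not taken from Mudemimp.f90; it only assumes that a local scalar is read before anything is stored in it, which is what the message about Y in RUNMONT describes.

```fortran
! Sketch only: hypothetical names, not code from Mudemimp.f90.
program uninit_demo
  implicit none
  real :: total
  call accumulate(total)
  print *, 'total = ', total
end program uninit_demo

subroutine accumulate(total)
  implicit none
  real, intent(out) :: total
  real :: y              ! local scalar, never assigned
  total = y + 1.0        ! y is read while still undefined
end subroutine accumulate
```

Built with /check:all (or /check:uninit), a run of this sketch would be expected to stop with a severe (194) message similar to the one above. Without the check it prints whatever happened to be in memory.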

 

mecej4
Honored Contributor III

There are lots of bugs in your program, and it will be up to you to find and fix them. Here is one, in addition to what IanH pointed out:

The common block TRAJIN  in subroutine INTRP8 has the first member AREF, which is not used elsewhere and causes a mismatch with other declarations of the same common block.

When your code goes beyond array bounds and uses undefined variables, it is not correct to expect logical behavior from the program. In general, in the presence of such errors, the behavior of the program is "undefined".

LRaim
New Contributor I

mecej4 (Blackbelt) wrote:

The common block TRAJIN in subroutine INTRP8 has the first member AREF, which is not used elsewhere and causes a mismatch with other declarations of the same common block.

About COMMON /TRAJIN/: common variables may be named differently in the various subroutines, though this can be considered a bad programming habit. But this is not an error.

mecej4
Honored Contributor III

In this case, it is a programmer error, since the various instances of the block are as follows:

S:\LANG\VOF\Mudemimp>grep -in "common.*trajin" *.[fF]90
ALINXY.F90:81:  COMMON /TRAJIN/  AREF,  GAMMAE,  VE,  ZE
CONVERT.F90:14: COMMON /TRAJIN/ GAMMAE, VE, ZE
TRAJUP.F90:47:  COMMON /TRAJIN/  GAMMAE,  VE,  ZE
TRAJUP.F90:501: COMMON /TRAJIN/ GAMMAE, VE, ZE

If any of the common block members are referenced or changed in the subprogram containing line 81 of ALINXY.F90, a run time error is highly probable.

JohnNichols
Valued Contributor III

LRaim wrote:

But this is not an error.

There are many things in this world I would and probably have said -- but saying this to mecej4 is like arguing with God or your wife.  Interesting but useless. 

Thanks for the laugh.  

PS:  It is freaking lousy programming and using commons is bad practice --

Donahue__Sean
Beginner

IanH,

That's interesting. I'm not seeing that error when I run my debugger, although it certainly seems like that variable is not defined in the code. I will try to track down whether that variable was intended to be defined elsewhere. Thanks.

mecej4,

I am aware the code has multiple bugs. I was not asking for someone else to fix the code in its entirety, and I apologize if that is how it came across. I have encountered one specific bug, that I cannot explain and was hoping someone could help me understand it better.

However, I appreciate you pointing out the disconnect between the common block contents. The aref variable is defined in different common blocks between different subroutines. I have updated the code to be consistent, but I do not note any change in the results. 

As further clarification, I have (sort of) circumvented the problem. IW is defined in common /ranele/ along with several other large arrays. Two of these, rout(5000) and zout(5000), are used within trajup. I split the common block into two blocks, /ranele1/ (which contains everything from /ranele/ except IW) and /ranele2/ (which just has IW in it), and that fixed the problem. Is this maybe an available memory thing? Did putting too many bits of data into the /ranele/ block maybe overload it, and cause the IW value to go crazy? I haven't seen any data on the maximum size of common blocks, but maybe there's a best practice there?

JohnNichols
Valued Contributor III

Donahue__Sean wrote:

I haven't seen any data on max size of common blocks, but maybe there's a best practice there?

Take a long weekend - stock up on the coke and rewrite all of the code in modules. 

I have a lot of old programs - getting rid of the commons will show you all the mistakes you do not know you have. 

Good luck. 
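
As a minimal sketch of what such a rewrite could look like for one block: the module name TRAJIN_DATA and the REAL types are assumptions, since the thread does not show the declarations. Only the member names GAMMAE, VE and ZE come from the grep output earlier in the thread.

```fortran
! Sketch only: module name and REAL types are assumptions.
module trajin_data
  implicit none
  real :: gammae = 0.0
  real :: ve     = 0.0
  real :: ze     = 0.0
end module trajin_data

program module_demo
  use trajin_data, only: gammae, ve, ze
  implicit none
  gammae = 10.0; ve = 20.0; ze = 30.0
  call show()
end program module_demo

subroutine show()
  ! Variables are matched by name through the module, not by
  ! position, so a stray extra member cannot shift the others.
  use trajin_data, only: ve
  implicit none
  print '(A,F6.1)', 've = ', ve
end subroutine show
```

With a module there is a single declaration of the data, so the compiler can catch a misspelled or missing member at compile time rather than leaving a silent layout mismatch.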

mecej4
Honored Contributor III

No offence taken at all, just pointing out that when there are multiple bugs and the program performs complex calculations with multiple subroutines and does I/O from/to several files, one has to understand a lot more about the program before attempting to fix it than is reasonable for someone looking for the first time at the program source.

You are making several guesses regarding what the program is doing, and I am not inclined to accept the explanations that you have given. What do you mean by "available memory thing"? Most PCs these days come with 4 or 8 GB of RAM, so a common block that occupies 40 or 80 K is tiny. Unless you still have a PC running MSDOS using the Small Model, having arrays exceeding 64 K is a routine occurrence.

If you have access to the authors or experienced users of the program, a chat/email exchange with them would be a good thing to do. Who wrote the program? What were the assumptions made, what are the limits of the program variables? What were the circumstances when the program was built and run successfully? Is it possible to replicate that run?

Names of variables in COMMON blocks have no significance across subprograms. Unless you have symbolic debugging enabled, the names do not even exist in the OBJ, DLL or EXE files that you generate. Rather, it is the byte offset of a member variable from the starting location of the common block that has to match, if the program is to work correctly. 
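
The point about offsets can be sketched as follows. This is a minimal example assuming default REAL members; the program name OFFSET_DEMO, the subroutine READER, and the alternative member names ANGLE, SPEED and HEIGHT are hypothetical.

```fortran
! Sketch only: hypothetical names, members assumed to be default REAL.
program offset_demo
  implicit none
  real :: gammae, ve, ze
  common /trajin/ gammae, ve, ze
  gammae = 10.0
  ve     = 20.0
  ze     = 30.0
  call reader()
end program offset_demo

subroutine reader()
  implicit none
  ! Different names, but the same types in the same order, so each
  ! member sits at the same byte offset as in the main program.
  real :: angle, speed, height
  common /trajin/ angle, speed, height
  print '(A,3F6.1)', 'angle, speed, height = ', angle, speed, height
end subroutine reader
```

This prints 10.0, 20.0 and 30.0, because only the position in the block matters. Inserting an extra member at the front of one declaration would shift every later member by one slot in that program unit.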

Donahue__Sean
Beginner

John,

I think you may be right. It's a little frustrating, as we have a version of the application that actually works, but the source code is out of date, and I was hoping the changes were minimal enough I could just do spot fixes, but I keep finding more problems, and it's looking more and more like a basically full re-write is in order. Oh well.

I appreciate the advice about omitting commons. They were in the original code, so I was trying to work with them, but if it's considered better practice to omit, I will take that into consideration.

Steve_Lionel
Honored Contributor III

Named COMMON must be the same length in all program units. mecej4 points out in post #5 that one of the declarations is a different length with an additional variable - this is VERY likely to cause problems. Blank COMMON is allowed to have different lengths in different program units.

A program might appear to work and then fail when compiled with a newer/different compiler. That isn't an indication that the program is correct, only that bugs were masked.
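
One way a bug can be masked or unmasked by layout is an out-of-bounds store into an array that sits next to another variable in the same common block. The following is a minimal sketch under assumptions: the array sizes, the REAL type, and the loop are invented for illustration and are not taken from the attached source. Only the names ROUT, ZOUT, IW and /ranele/ come from the thread. It may be relevant that 1072926779 is 0x3FF3903B in hexadecimal, which matches the upper 32 bits of a double-precision value of about 1.22, so the value seen in IW is consistent with a floating-point store landing on it.

```fortran
! Sketch only: sizes, types and the loop bound are assumptions.
program clobber_demo
  implicit none
  integer, parameter :: n = 4
  real    :: rout(n), zout(n)
  integer :: iw
  common /ranele/ rout, zout, iw
  integer :: i, last

  rout = 0.0
  iw   = 1
  last = n + 1                 ! one past the end of zout: the bug
  do i = 1, last
     zout(i) = 1.5             ! the store to zout(n+1) lands on iw's storage
  end do
  print *, 'iw after the loop = ', iw
end program clobber_demo
```

Without bounds checking, a run typically prints a large integer (the bit pattern of 1.5) instead of 1, although the behavior is undefined and may differ by compiler and options. Moving IW into its own common block only changes what the stray store overwrites. Compiling with /check:bounds should report the out-of-range subscript instead.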

Donahue__Sean
Beginner

Mecej4 and Steve,

I appreciate the insight regarding the common statement. That is not how I thought it worked; the clarification may help a lot in following how the code is meant to be working and getting it updated.

Unfortunately, this version of the code was farmed out to an outside contractor nearly twenty years ago, and I doubt that I will be able to get in touch with the specific author. I have reached out to the company and am waiting for a reply, but am not holding my breath.

mecej4
Honored Contributor III

Here is a toy program to illustrate how common block variables misbehave when named common blocks do not have consistent sizes.

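program tst
  implicit none
  integer a,b,c
  ! /xx/ is declared here with three members ...
  common /xx/a,b,c
  a=1; b=2; c=3
  print '(A,3i3,A)','A,B,C = ',a,b,c,' in TST'
  call sub()
end program

subroutine sub
  implicit none
  integer b,c
  ! ... but with only two members here, so B and C occupy the
  ! storage that TST calls A and B (non-conforming code).
  common /xx/b,c
  print '(A,3x,2i3,A)','B,C   = ',b,c,' in SUB'
  return
end subroutine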

It would be quite useful if you have the original code and original data for a case that you think "worked" or was handed to you as correctly working program source and data, and are able to post that code here.

LRaim
New Contributor I

A named or blank COMMON may have different lengths in many subroutines across many source files.

The linker sees the length values in the .obj files and creates the COMMON area with the maximum length.

mecej4
Honored Contributor III

Luigi R. wrote:
A named or blank COMMON may have different lengths in many subroutines across many source files.

Quote from section 5.7.2.5 of the Fortran 2008 Standard:

Named common blocks of the same name shall be of the same size in all scoping units of a program in which
they appear, but blank common blocks may be of different sizes.

The problem with the OP's code, however, is not just about the labelled common block length. It appears from the variable names used that the second variable in the block in one place is supposed to match the first variable in the same block in another place. That will not happen.

Donahue__Sean
Beginner

Unfortunately, I do not believe we have any functioning source code. I have this source code (from 2001)  and we do have a functioning executable that was apparently "fixed" and compiled at a later date, but no one knows who fixed it or where that code might be. I can at least check answers against that, but in terms of tracing the calculations step by step, I think I'm out of luck.

mecej4
Honored Contributor III

According to https://apps.dtic.mil/dtic/tr/fulltext/u2/a514944.pdf , "MUDEMIMP is a Southwest Research Institute (SwRI) modification of a program developed by the Naval Civil Engineering Laboratory (NCEL)." Comments in the main program file state:

!  WRITTEN BY LOUIS HUANG FOR NCEL IN 1984
!  Modified to include incremental bin locations and to calculate
!  roll based on horizontal velocity component only by M. Polcyn
!  6/94 at SwRI

I believe that SwRI is the Southwest Research Institute, https://www.swri.org . Probably worth a try.

Donahue__Sean
Beginner

mecej4,

I appreciate the suggestion, but SwRI actually made a previous version of MUDEMIMP that was eventually found to have a bad algorithm for predicting bounce. The code was later updated for better performance by another group, which is theoretically the version I have, but the code I have is obviously not doing what it is supposed to. I need the code for the new version; I have asked the contractor about it, and they are theoretically looking into it, but it was a long time ago, and they are unsure if they will be able to turn anything up.

JohnNichols
Valued Contributor III

mecej4 wrote:

No offence taken at all, just pointing out that when there are multiple bugs and the program performs complex calculations with multiple subroutines and does I/O from/to several files, one has to understand a lot more about the program before attempting to fix it than is reasonable for someone looking for the first time at the program source.

-----

There is often a lot written about a program, but you have to hunt it out as noted above.  Maintaining old code is a long slow process fraught with a lot of missteps. 

Also most of the old guys on this site probably have 40+ years of Fortran experience starting in their late teens, so they have deep knowledge. 

Unless you have lived and breathed it -- it is not a simple language.

There are ways. 

John

LRaim
New Contributor I

In the .exe file the COMMON is created by the linker which, fortunately, does not know anything of the Fortran standard.
For example, one can have in the same .exe a subroutine, written in assembler, which uses the same COMMON with more variables.
This subroutine can take the variables prepared by Fortran and compute the additional variables.
