Extract global variables in a Fortran program

MReza · 09-12-2020

I would like to extract the variables shared between two arbitrary subroutines (each with deeply nested calls to other subroutines) in a Fortran program. One solution that sounds intuitive is to find the intersection of the global variables of the two subroutines. I wonder if there is any compiler option that can generate the list of global variables used by a subroutine.

mecej4 · 09-13-2020

What does "global" mean, in the context of a Fortran program? Certainly not the global variables that one can have in C or assembler. In Fortran 77, members of shared COMMON blocks could be considered "global" in a restricted sense -- the block name was shared, but variable names could be different in different subprograms that access a common block. In modern Fortran, module variables could be considered global to all subprograms that use that module.

Intel Fortran can generate a cross-reference listing, but you will have to do some post-processing of the listing files to correlate variables shared across subprograms and source files.
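The correlation step can be sketched as below. This assumes the listing files have already been parsed into a mapping from each subprogram to the set of module variables it references. Parsing the listing itself depends on its exact layout and is not shown. The subprogram and variable names are hypothetical, and names are lower-cased because Fortran identifiers are case-insensitive.

```python
from collections import defaultdict

def shared_variables(refs_by_subprogram):
    """Invert {subprogram: set of referenced module variables} into
    {variable: sorted list of subprograms}, keeping only variables that
    two or more subprograms reference."""
    users = defaultdict(set)
    for sub, names in refs_by_subprogram.items():
        for name in names:
            users[name.lower()].add(sub.lower())   # Fortran names are case-insensitive
    return {var: sorted(subs) for var, subs in users.items() if len(subs) > 1}

# Hypothetical data, as it might look after parsing the listing files
refs = {
    "physics_step": {"T", "Q", "dt"},
    "radiation": {"t", "albedo"},
    "dynamics": {"q", "u", "v", "DT"},
}
print(shared_variables(refs))
```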

Take a look at source code analysis tools such as HyperKWIC from Polyhedron (https://polyhedron.com/?product=plusfort#hy) to see if they are suitable.

MReza · 10-13-2020

Many thanks for the useful explanation.

plusFORT from Polyhedron is great, but it's commercial. It identifies all the global variables used or changed in each subroutine. Can we make ifort show the same information (used/changed global variables) in the cross-reference listing? As far as I know, the listing only shows the (local and global) variables referenced in each subroutine, but not whether they are used or modified there.

andrew_4619 · 10-14-2020

You never said how these variables are shared: is it by COMMON or by module? Also, what is the final objective? I am not sure you said that explicitly. Maybe with better insight there are other solutions to the problem.

MReza · 10-14-2020

My code uses only module variables, and there are no COMMON blocks in the modules. I am working with a climate model (>150,000 lines of code) that has one expensive component, and my objective is to run that component concurrently with the main model to improve the model's time-to-solution. As a result, I need to synchronize the component with the main model at each time step. However, the component is coupled to the main model through implicitly shared variables. Therefore, I first have to find the coupling fields between the component and the main model, and then implement a memory-consistency mechanism between them. Consequently, I need to know whether each shared variable is used or changed in the component and in the main model.

It would be very helpful if ifort could show all the global variables referenced in every subroutine and also say if they are used or changed in each subroutine. I wonder if there is any flag in the compiler to force it to include such information in the cross-reference listing?

andrew_4619 · 10-15-2020

Well, if in any subroutine of interest you comment out the "use modulename" line and have "implicit none", you will certainly get a list of the module variables it uses in the error list.

In new code I generally write only "use modulename, only:", which lists explicitly the items used from the module. You can then see at a glance where every variable in the routine comes from.
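Collecting those errors can be partly automated. The sketch below assumes that ifort reports each untyped name as error #6404, with the name in square brackets at the end of the line. Check the exact wording your compiler version prints. The file name and messages in the sample are made up.

```python
import re

# Assumed ifort diagnostic shape (verify against your compiler version):
#   file.f90(LINE): error #6404: This name does not have a type, ...   [NAME]
ERR_6404 = re.compile(r"^(?P<file>.+?)\(\d+\): error #6404:.*\[(?P<name>\w+)\]\s*$")

def undeclared_names(compiler_output):
    """Collect the names reported as untyped, as one set per source file."""
    found = {}
    for line in compiler_output.splitlines():
        m = ERR_6404.match(line)
        if m:
            found.setdefault(m.group("file"), set()).add(m.group("name").lower())
    return found

sample = """\
physics.f90(42): error #6404: This name does not have a type, and must have an explicit type.   [TEMP]
physics.f90(57): error #6404: This name does not have a type, and must have an explicit type.   [QV]
physics.f90(57): error #6404: This name does not have a type, and must have an explicit type.   [TEMP]
"""
print(undeclared_names(sample))
```

The output shows which module names a routine references. It does not show whether those names are read or written.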

MReza · 10-15-2020

If I am not mistaken, your procedure would face the following problems:

1. Not every subroutine has its own dedicated "use modulename". Instead, other modules are used once at the beginning of each module, and all the subroutines inside that module have access to their variables.

2. Modules are not exclusively dedicated to the component and there might be some subroutines in some modules that do not belong to the component.

3. More importantly, your procedure does not show if a variable is used or changed in the subroutines.

In addition, I would like to automate the process, as the cost of manual processing is generally prohibitive for the following reasons:

1. The component is spread over more than 100 modules, and doing all the work manually would be tedious, slow, and error-prone.

2. In addition, it is not immediately apparent in the code which modules/subroutines belong to the component. I only know the entry points of the component, which are subroutines that call the other subroutines of the component. Every entry point offers a specific service (of the component) to the main model. Your procedure requires knowing the relevant subroutines (of the component) in advance, and it would be very difficult to find them manually. Ideally, an automatic tool could extract the call graph(s) of the component, identify the target subroutines, and then extract the global variables from those subroutines. The same procedure has to be repeated for the main model (and the other components) to extract the global variables they use. In the end, the intersection of the two sets gives the set of shared variables.
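The pipeline described in point 2 can be sketched as follows. It walks the call graph from the entry points, takes the union of globals on each side, and then intersects the two sets. This assumes that the call graph and the per-subroutine sets of module variables have already been extracted by some tool. All subroutine and variable names are made up. One design choice: the main model's traversal stops at the component's entry points, so a utility routine called from both sides (here the hypothetical `advect`) counts on both sides.

```python
from collections import deque

def reachable(call_graph, entry_points, stop=()):
    """Breadth-first search over the call graph from entry_points,
    without descending into any subroutine listed in stop."""
    stop = set(stop)
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:
        sub = queue.popleft()
        for callee in call_graph.get(sub, ()):
            if callee not in seen and callee not in stop:
                seen.add(callee)
                queue.append(callee)
    return seen

def globals_of(subs, globals_by_sub):
    """Union of the module variables referenced by a set of subroutines."""
    return set().union(*(globals_by_sub.get(s, set()) for s in subs))

# Hypothetical call graph and per-subroutine module-variable references
call_graph = {
    "main_step": ["dynamics", "chem_entry"],
    "dynamics": ["advect"],
    "chem_entry": ["chem_solve"],
    "chem_solve": ["advect"],          # utility called from both sides
}
globals_by_sub = {
    "dynamics": {"u", "v", "t"},
    "advect": {"dt"},
    "chem_entry": {"t", "o3"},
    "chem_solve": {"o3", "no2"},
}
entries = ["chem_entry"]                # known component entry points
component = reachable(call_graph, entries)
main_model = reachable(call_graph, ["main_step"], stop=entries)
shared = globals_of(component, globals_by_sub) & globals_of(main_model, globals_by_sub)
print(sorted(shared))                   # ['dt', 't']
```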

I am not sure how well your procedure lends itself to automation. The cross-reference listing seems an easier option for post-processing (extracting the call graph of the component and the shared variables). If it offered used/changed information for each global variable in every subroutine, it could lay the groundwork for a complete, automatic solution.

andrew_4619 · 10-15-2020

Before delving too deep: it seems clear that ifort does not have a tool that does what you want. If you want to automate the task, you will have to spend time writing code to parse the source code and the listing files. If plusFORT does what you need, then buy it; the cost seems small compared with the many, many hours you would otherwise spend on the task. Or maybe the task is too expensive to be worth doing?

With regards to the challenges you note above:

1] Yes, if you unconditionally use modules in the declaration section of other modules, you have a considerable amount of namespace pollution to deal with, and there is no transparency about what is used where. The manual process would still be the same: remove the module-level 'use' and add a 'use, only' to each subroutine that throws errors as a result. I don't see an easy way of automating that.

2] I am not sure I get this point. I guess you are saying that you would also be changing some subroutines that are not part of the overall task.

3] It certainly shows whether a variable is referenced: a referenced variable would throw an error, and one that isn't referenced would not. Whether it is changed is a different matter, but maybe your tracking/syncing code will need to establish that anyway?

Whichever way you tackle this task, I think you will not avoid quite a lot of work; you need to decide whether that is worthwhile.

Anyway good luck!

MReza · 10-15-2020

Of course plusFORT does a big part of the job (parsing the code, creating a call graph, and extracting global variables with used/changed information). However, its output still requires a lot of post-processing (finding the subroutines of the component(s) in the call graph, collecting their global variables, and then finding the shared variables)! I would like to automate ONLY this post-processing task, either on plusFORT's results or on the ifort cross-reference listing (which is apparently not possible).
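Once used/changed information is available for both sides, the coupling fields can be classified by the direction in which data must flow. The sketch below assumes that four sets of module variables have already been collected, for example from plusFORT's output: those read and written by the component, and those read and written by the main model. The function and variable names are hypothetical. Like any static analysis, this over-approximates: a variable listed as written may never actually be written at run time.

```python
def coupling_directions(comp_reads, comp_writes, main_reads, main_writes):
    """Classify shared module variables by which side writes and which side reads them."""
    return {
        # written by the main model (not the component) and read by the component
        "to_component": (main_writes & comp_reads) - comp_writes,
        # written by the component (not the main model) and read by the main model
        "to_main": (comp_writes & main_reads) - main_writes,
        # written by both sides: needs an ownership or ordering rule, not just a copy
        "conflict": comp_writes & main_writes,
    }

# Hypothetical used/changed sets for the component and the main model
directions = coupling_directions(
    comp_reads={"t", "q", "dt"},
    comp_writes={"o3", "q"},
    main_reads={"o3", "t", "q"},
    main_writes={"t", "q", "dt"},
)
for kind, names in directions.items():
    print(kind, sorted(names))
```

Variables that both sides only read do not appear in any category, since they need no synchronization.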

Anyway, it would be wonderful if you could recommend an open-source Fortran parser!

1) My climate code follows the bad practice of placing use statements in the declaration sections of other modules. If I understand you correctly, I first have to remove all the "use modulename" statements from the declaration sections of the modules, copy them into every subroutine, and then apply your procedure to each subroutine one by one.

2) I was saying that not all subroutines are of interest, and this creates a challenge. As I understand it, I first have to extract the call graph of my component and find the subroutines of interest in every module. Then I can apply the procedure in (1) to every subroutine of interest. However, I also have to copy the "use modulename" statements into the subroutines that are not of interest if they complain.

3) How would you suggest tracking changed values? In climate models we are dealing with large multi-dimensional arrays, and keeping copies of old values for comparison is not viable because of the memory usage and the expensive memory operations! The only solution that seems feasible to me (but not satisfactory!) is to set a flag (or synchronize immediately) whenever a write operation happens.
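The flag-on-write idea can be sketched as follows. This Python example shows the bookkeeping only. It assumes every write to a coupled field goes through a single setter. In the Fortran model, the equivalent would be a logical flag next to each coupled module array, set at the write sites. `TrackedField` and `synchronize` are hypothetical names.

```python
class TrackedField:
    """Hypothetical wrapper around a coupled field: any write marks it dirty."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)
        self.dirty = False

    def __getitem__(self, i):
        return self._values[i]

    def __setitem__(self, i, value):
        self._values[i] = value
        self.dirty = True            # one flag per field, no copy of the old values

    def values(self):
        return list(self._values)


def synchronize(fields, send):
    """Pass every dirty field to send(name, values), clear its flag, return the names sent."""
    sent = []
    for field in fields:
        if field.dirty:
            send(field.name, field.values())
            field.dirty = False
            sent.append(field.name)
    return sent


received = {}

def send(name, values):              # stand-in for the real copy to the other side
    received[name] = values

t = TrackedField("t", [280.0] * 4)
q = TrackedField("q", [0.01] * 4)
t[2] = 285.0                         # a write in the main model
print(synchronize([t, q], send))     # ['t']
print(synchronize([t, q], send))     # []
```

The memory cost is one flag per field instead of a copy of each array. The price is that every write site must be instrumented.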

Many thanks for nice discussion! It is very helpful.

Ibrahim_K_ · 10-16-2020

This is a very complex and interesting discussion. I am not sure I can follow all the ideas presented, but I would like to put forward another one.

Since compiler already extracts and tabulates all the symbols why not work with the object files? The following link shows how to dump the symbol table using a utility.

https://stackoverflow.com/questions/11849853/how-to-list-functions-present-in-object-file

Perhaps the utility mentioned there could be useful. More advanced users can comment on whether the whole idea is practical.
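Whether module data shows up in the symbol table depends on the compiler. The sketch below assumes GNU `nm` output and gfortran on Linux, where a module variable `x` in module `m` is emitted as a linker symbol named `__m_MOD_x`. Under those assumptions, it lists the module variables that each object file references from elsewhere. Other compilers use other naming schemes, so the pattern would need adjusting. An undefined symbol may equally be a module procedure, so undefined names are kept only if some object file defines them as data. The sample `nm` text is made up.

```python
import re

# Assumption: gfortran on Linux names a module variable x of module m "__m_MOD_x".
MODVAR = re.compile(r"^__(?P<module>\w+?)_MOD_(?P<name>\w+)$")
DATA_KINDS = {"B", "b", "D", "d"}            # nm letters for .bss and .data symbols

def parse_nm(nm_output):
    """Return (defined data symbols, undefined symbols) that look like module entities."""
    defined, undefined = set(), set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue                         # blank lines or "file.o:" headers
        kind, symbol = parts[-2], parts[-1]  # 'U' lines have no address column
        m = MODVAR.match(symbol)
        if not m:
            continue
        entity = (m.group("module"), m.group("name"))
        if kind in DATA_KINDS:
            defined.add(entity)
        elif kind == "U":
            undefined.add(entity)            # could be a variable OR a module procedure
    return defined, undefined

def module_variables_used(nm_by_object):
    """For each object file, the module variables it references from other objects."""
    parsed = {obj: parse_nm(text) for obj, text in nm_by_object.items()}
    all_data = set().union(*(d for d, _ in parsed.values()))
    return {obj: sorted(u & all_data) for obj, (_, u) in parsed.items()}

nm_by_object = {
    "state_mod.o": "0000000000000000 B __state_mod_MOD_temp\n"
                   "0000000000000040 D __state_mod_MOD_dt\n"
                   "0000000000000000 T __state_mod_MOD_init_state\n",
    "physics.o": "                 U __state_mod_MOD_dt\n"
                 "                 U __state_mod_MOD_init_state\n"
                 "                 U __state_mod_MOD_temp\n"
                 "0000000000000000 T physics_step_\n",
}
print(module_variables_used(nm_by_object))
```

This works per object file, not per subroutine, and gives no read/write information.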

I. Konuk

mecej4 · 10-17-2020

@Ibrahim_K_ : "Since compiler already extracts and tabulates all the symbols why not work with the object files? "

The symbols in the object files are code symbols, i.e., subroutine and function names and, possibly, common block names if the program uses common blocks. What @MReza wants, however, are data symbols, i.e., names of globally accessible data items. Those data symbols may be present if debug information has been put into the OBJ and LIB files, but tools other than DUMPBIN will be needed to access that information (for example, see the DIA SDK).

@MReza : "I wonder if there is any flag in the compiler to force it to include such information in the cross-reference listing?"

That's a puzzling expectation. If the compiler does not have a provision to generate a list of module variables referenced in a subroutine, you cannot make the compiler "fess up" information that it does not have.

Compiler-generated cross-reference listings are usually local lists pertaining to one subprogram at a time.

Even if you are able to generate a global cross-reference list using a utility such as PlusFort, the usefulness of such a listing is quite limited. Just as there is no certainty that a particular line of code in a subroutine will be executed, there is no certainty that a particular "global" variable is read or written to. If there are many such global variables, the user can spend a lot of time chasing down their values and changes to those values, even though those values are in fact never used or changed. Conversely, if there are errors such as subscripts out of bounds, aliased arguments, etc., the value of an array element or derived type component may be accessed or changed through the name of an entirely unrelated variable. Thus, the global variable cross-reference list of a large program will contain a large number of variables, but most of those variables will never come into play unless every line of code in the program is executed at least once (which rarely happens with large programs).

jimdempseyatthecove · 10-17-2020

Also, when a global variable (subject to examination) is passed as an argument, the compiler (cross-reference listing, DUMPBIN, etc.) would have no knowledge that the dummy argument refers to that global.

mecej4's concern remains: a test program that modifies globals will not necessarily modify all the globals that could potentially be modified. Setting this concern aside, one possible approach is:

Using the map file, identify the location(s) and size(s) of the .bss and .data segments for the modules.
You can then construct a hash/sum/checksum for chunks of these segments.
Then as the program progresses, you can periodically test for change in hash/sum/checksum. While this won't identify the specific variable(s) modified, when properly configured, it can identify which module data was modified.
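The checksum-and-compare step can be illustrated in Python. This is a sketch of the logic only. The byte buffer stands in for the address range read from the map file, and both CRC32 and the 16-byte chunk size are arbitrary choices. In the instrumented program, the checksum would be computed in place over that range.

```python
import zlib

def chunk_checksums(buf, chunk_size):
    """CRC32 of each fixed-size chunk of a memory snapshot (e.g. a module's .data/.bss range)."""
    return [zlib.crc32(buf[i:i + chunk_size]) for i in range(0, len(buf), chunk_size)]

def changed_chunks(old, new):
    """Indices of chunks whose checksum differs between two snapshots of the same range."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]

segment = bytearray(64)                       # stand-in for a module's data segment
before = chunk_checksums(bytes(segment), 16)
segment[37] = 1                               # simulate a write somewhere in the segment
after = chunk_checksums(bytes(segment), 16)
print(changed_chunks(before, after))          # [2]  (bytes 32..47)
```

Smaller chunks localize a change more precisely but need more stored checksums.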

An example would be to include in the module, as the first data items:

integer(8) :: hash_data
integer(8) :: hash_bss=0

and as the last data items

integer(1) :: end_hash_data
integer(1) :: end_hash_bss=0

Give these attributes for external linkage (e.g. dllexport).

Then, following each CALL/call statement, insert

CoNtInUe

a mixed-case CONTINUE that is known not to exist elsewhere in your program.

Then, when you wish to instrument your code, use FPP to replace CoNtInUe with:

CALL CheckChange(__FILE__,__LINE__)

This routine can then compute the next hash, compare it with the current hash, display the change info, and update the stored hash.
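The bookkeeping inside such a routine can be sketched as below. `ChangeChecker` is a hypothetical Python analogue of CheckChange. Each module's data region is represented by a function that returns its current bytes. In the real program, the regions would be the address ranges bounded by the marker variables in each module.

```python
import zlib

class ChangeChecker:
    """Hypothetical Python analogue of CheckChange: one checksum per module data region."""

    def __init__(self, regions):
        self.regions = regions                     # name -> function returning current bytes
        self.last = {name: zlib.crc32(get()) for name, get in regions.items()}
        self.log = []                              # (source file, line, region) per change

    def check(self, src_file, line):
        changed = []
        for name, get in self.regions.items():
            h = zlib.crc32(get())                  # generate the next hash
            if h != self.last[name]:               # compare with the stored hash
                changed.append(name)
                self.log.append((src_file, line, name))  # record where it was seen
                self.last[name] = h                # update the stored hash
        return changed

state = {"dyn_mod": bytearray(32), "chem_mod": bytearray(32)}
checker = ChangeChecker({name: (lambda b=buf: bytes(b)) for name, buf in state.items()})
state["chem_mod"][5] = 7                          # pretend the preceding CALL wrote chem_mod data
print(checker.check("model.F90", 120))            # ['chem_mod']
print(checker.check("model.F90", 135))            # []
```

The log records the file and line of the check that saw each change. The responsible call is therefore the CALL immediately preceding that check.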

Jim Dempsey

Ibrahim_K_ · 09-14-2020

I assume you are referring to named or blank COMMON blocks. I was given some very useful advice here, which helped me a lot: move all COMMON block declarations to one or more modules and then share them with your subroutines. You will discover very quickly if a variable is named differently in one of the subroutines, which can be one of the problems you are encountering. For the second possible problem, mismatched variable sizes or shapes, the compiler should catch some of them; a size mismatch can cause a run-time error. You may have to do some work, but it is really worth it.

I. Konuk

Ibrahim_K_ · 09-14-2020

If you insist on using different names, you can use an EQUIVALENCE statement. At least then you will know that is the case.