Strange behavior with IVF compiler

Jon_D · ‎12-20-2007

I am trying to figure out why I get an access violation error in the release version of my program (the debug version runs to completion with no problem). Strangely, the error seems to be somewhat sporadic. Here are the symptoms:

1. The suspect subroutine has a declaration, something like

TYPE(AppElementType) :: AppElement(11286) !Test

2. I build the executable from scratch, i.e. Build > Rebuild, and the release version of the executable runs to completion.
3. I modify the comment at the declaration line in the suspect subroutine so that I can force a recompilation of the subroutine without changing the active code:

TYPE(AppElementType) :: AppElement(11286) !This is a Test

4. I build the executable using Build > Build. I get a successful compilation but when I run the executable I get the access violation error.
5. Now I perform a rebuild (no changes in the code from step 3), i.e. Build > Rebuild. I get successful compilation and successful run.

Could anybody tell me what might be the cause of this behavior? It seems to me that there is an inherent compilation and linking order among different subroutines/modules/functions of the program that the compiler can identify during Build > Rebuild, but it gets fooled when a single subroutine is modified. Any suggestions will be greatly appreciated.

Thanks,
Jon

Steven_L_Intel1 · ‎12-20-2007

My guess is that you have a coding error that causes an access to incorrect or uninitialized memory. Different ordering of modules in the linker can rearrange memory layout - sometimes the reference will be harmless, sometimes not.

That you get an access violation is good - that will help you identify the memory reference that is at fault and then you can backtrack from there to find the error. Try the debug build again with Generate Interface Blocks and Check Routine Interfaces turned on, if they are not already. If that does not help, then try using Static Verifier as explained in the documentation.
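
As an illustration of what interface checking is looking for, here is a minimal sketch. It is not taken from Jon's program: the module name `app_elements`, the components of `AppElementType`, and the function `count_active` are all hypothetical, and only the type name and the array size come from the first post. The idea is that a routine placed in a module has an explicit interface, so the compiler can compare every call against the declared argument list instead of trusting the caller.

```fortran
module app_elements
  implicit none
  private
  public :: AppElementType, count_active

  ! Hypothetical components; the real type is not shown in the thread.
  type :: AppElementType
     integer :: id     = 0
     logical :: active = .false.
  end type AppElementType

contains

  ! Because this function lives in a module, its interface is explicit.
  ! A call with the wrong type, rank, or number of arguments is rejected
  ! at compile time rather than corrupting memory at run time.
  function count_active(elements) result(n)
    type(AppElementType), intent(in) :: elements(:)
    integer :: n
    n = count(elements%active)
  end function count_active

end module app_elements

program interface_demo
  use app_elements
  implicit none
  type(AppElementType) :: AppElement(11286)   ! size from the first post

  ! Components start from their default initial values, so nothing
  ! here depends on whatever happened to be in memory.
  AppElement(1:3)%active = .true.

  write (*, '(a, i0)') 'active elements: ', count_active(AppElement)
end program interface_demo
```

For external routines that are not in modules, the Generate Interface Blocks and Check Routine Interfaces options mentioned above aim to give the compiler comparable information to check against.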

Jon_D · ‎12-20-2007

Steve,

I have already tried both of your suggestions with no luck. I get no errors or warnings with the debug build with the mentioned options turned on. Static Verifier gives me a "#10014: problem during multi-file optimization compilation (code 3)" error, which is not very helpful.

I also thought that getting an access violation error was good, but when the debug and release versions behave differently, I don't know how to find the source of the problem. Do you possibly have any other suggestions?

Thanks,
Jon

Steven_L_Intel1 · ‎12-20-2007

Try a debug build with optimization enabled and see if that changes the behavior. Or in the release configuration, turn on debugging in the Fortran and Linker options. Even in release mode, the traceback should identify the line (or region) of code being executed.

Jon_D · ‎12-20-2007

Thanks Steve!

I figured out the problem by inserting diagnostic WRITE statements in the code; it was an uninitialized variable. What threw me off was the seemingly random behavior of the compiler. I would insert a WRITE statement in the code and the program would run fine. Then I would just modify a comment line (no change to the active source code that gets compiled) and the program would give me an access violation error. Figuring out this behavior took me several days. Can the compiler be improved on this front so that compilation of problematic code looks less like a random process?

Jon

Steven_L_Intel1 · ‎12-20-2007

It's not something the compiler can "improve". Any change to the code or data layout can "move" problems referencing uninitialized storage. The best we could do is improve diagnostics for uninitialized variables.

However, the very fact that the symptoms change when you make "unrelated" changes is a red flag for such a problem and should be a big clue.

Jon_D · ‎12-20-2007

Steve,

Now that I know what caused my headaches over the last few days, I went ahead and did some tests. I intentionally left the variable uninitialized. The variable is used as the upper bound of an array. Sure enough, I got the access violation. A print-out of the variable revealed a large number instead of zero (in debug mode this variable is set to zero automatically). Then I set Fortran > Run-time > Check Array and String Bounds to Yes. Now the print-out of the variable showed it was set to zero automatically and my program ran just fine. In my opinion, there is no diagnostic benefit to checking array and string bounds if the behavior of the code will automatically change from one setting to another. If the program generated an error during run-time instead of setting the variable to zero automatically, it would save programmers some time in debugging their code.

Thanks,
Jon

Steven_L_Intel1 · ‎12-20-2007

It is not setting anything to zero. Rather, you moved variables around, and the uninitialized variable you accessed ended up in a different location that held a different value. Did you try turning on uninitialized variable detection? It's not comprehensive, but might help. Static Verifier, if you can get it to work for you, might do a better job.
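
The failure discussed in this thread can be sketched as follows. This is a minimal example under stated assumptions, not Jon's actual code: the names `n_used`, `values`, and `max_elements` are hypothetical, and only the array size is borrowed from the first post. It shows the pattern that avoids the problem: assign the bound explicitly, then validate it before it is used to index the array.

```fortran
program uninit_bound_demo
  implicit none
  integer, parameter :: max_elements = 11286   ! size from the first post
  real    :: values(max_elements)
  integer :: n_used    ! hypothetical variable used as an upper bound
  integer :: i

  values = 0.0

  ! If this assignment were missing, n_used would hold whatever value
  ! happened to be in its storage location. That value can change with
  ! any rebuild, option change, or relinking, which is why the symptoms
  ! in the thread appeared to come and go.
  n_used = 10

  ! Defensive check: fail loudly instead of relying on the bound
  ! happening to be harmless.
  if (n_used < 0 .or. n_used > max_elements) then
     write (*, '(a, i0)') 'n_used is out of range: ', n_used
     stop 1
  end if

  do i = 1, n_used
     values(i) = real(i)
  end do

  write (*, '(a, f0.1)') 'sum = ', sum(values(1:n_used))
end program uninit_bound_demo
```

An explicit range check like this does not depend on where the variable lands in memory, so it behaves the same in debug and release builds. The run-time options discussed above (array bounds checking, which I believe is `/check:bounds` on the command line, and uninitialized variable detection, `/check:uninit`) can catch some of these cases too, but as noted they are not comprehensive.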