Intel® Fortran Compiler

Compiler Does Not Detect Undesired Memory Change Bug

iomega15
Beginner
I have a subroutine (let's call it subroutine 1) which sets the values of an array, and the array is then passed into a different subroutine (let's call it subroutine 2). Somewhere in subroutine 1, after the values of the array are set but before it gets passed into subroutine 2, the array's values are changed to zero. There are NO calls to modify the array before it gets passed to subroutine 2, so this should not happen. Also, the array is not passed into or out of any other subroutine before being passed to subroutine 2, so there is absolutely nothing that should be changing its values...

The code is written in Fortran and uses MPI on lonestar.tacc.utexas.edu. I have tried running with:

Intel Fortran Compiler for Intel EM64T-based applications, Version 9.1 Build 20061101 Package ID:

and with

Intel Fortran Compiler for applications running on Intel 64, Version 10.1 Build 20080602

with the following compiler options "-C -fpe0 -traceback -xT -shared-intel"

and also on

sooner.oscer.ou.edu
with the PGI compiler and the options "-g -gopt -Mbounds -Mchkptr -fastsse -tp=core2-64"

None of these compilers reports any errors.

I have also tried setting the values of the array outside of subroutine 1 and then passing it into subroutine 1 with INTENT(IN). In this case, the array becomes protected from modifications and its values are successfully passed into subroutine 2. Although this workaround makes my code work, it makes me feel uneasy about an undetected bug being present in my code.

Please help me debug this! Thanks

~Roman

7 Replies
Steven_L_Intel1
Employee
This is usually caused by a programming error due to array bounds or data type mismatches. You already have -C so that helps with some kinds of array bounds errors, but not if you use * bounds or misdeclare the bounds. Try adding:

-warn interface

and see if anything shows up there. Otherwise, you might try running under the debugger and breaking at various points in the execution to see where the values change. Since you are using MPI, it may be that one of your MPI calls is passing a scalar instead of an array, or the wrong array size.
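To illustrate the point about * bounds: the following is a minimal sketch, not taken from the original code, of why -C cannot help with an assumed-size dummy argument. The names zero_old, work and keep are hypothetical. The program as written is correct; the commented-out call shows the kind of mismatch that would silently write past the end of one array and possibly into a neighbouring one.

```fortran
! Hypothetical names throughout; a sketch of the failure mode only.
subroutine zero_old(a, n)
  implicit none
  integer, intent(in) :: n
  ! Assumed-size dummy: the routine only knows where the array starts.
  ! Its true extent is unknown here, so run-time bounds checking
  ! has nothing to compare the index against.
  real, intent(inout) :: a(*)
  integer :: i
  do i = 1, n
    a(i) = 0.0
  end do
end subroutine zero_old

program caller
  implicit none
  external zero_old        ! independent routine, no explicit interface
  real :: work(4), keep(4)

  work = 5.0
  keep = 1.0
  call zero_old(work, 4)     ! correct: n matches the actual extent of work
  ! call zero_old(work, 8)   ! wrong: would write past work, possibly
  !                          ! zeroing keep, without any bounds error
  print *, keep
end program caller
```

Whether an overrun of this kind lands in another array depends on how the compiler lays out memory, which is why the symptom can appear in an array that is never passed to the offending routine.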
iomega15
Beginner
Steve,

-warn interface did not offer any new information.

I have checked the data type and size of this array and everything seems fine. Is it possible that it is something completely unrelated to this particular array that is causing it, or should I be looking for something that has a relation to this array? Thanks

~Roman
iomega15
Beginner
Also, can I protect this array somehow from being changed after I've set its values (besides setting it outside of subroutine 1, which is not something that I want to do)?
Steven_L_Intel1
Employee
No, you cannot protect the array. Something has an address and is using that to stomp on your array. I am not familiar with debugging on Linux, but perhaps it offers the option to "watch" a memory location and notify you of where that location was changed.
mriedman
Novice
Yes, you can protect entire memory pages using the mprotect() call, see http://linux.die.net/man/2/mprotect. You can set it up to generate a SIGSEGV if the specified pages are written to.
There is no granularity smaller than a page. So if you work with large arrays this can be a usable mechanism, as large arrays (> 1MB) are usually allocated through mmap().

If you work with scalars then it's difficult unless you insert unused space such that each page contains a single scalar only. And don't forget to enforce mmap() allocation through mallopt(). You want to do all this only if you're _really_ desperate.

Using watchpoints within a debugger might be the easier first choice. However, watchpoints can slow down your application awfully, and there are huge differences in the quality and functionality of watchpoint implementations. Try TotalView if it's available to you. Good luck!

michael
iomega15
Beginner
Thanks. I have resolved the problem.

The problem was that three unrelated arrays were being returned to subroutine 1 from a completely unrelated subroutine, and their size did not agree with the size that they were declared with in subroutine 1. I found the offending subroutine by putting in write statements throughout subroutine 1 in order to monitor at which point the original array's value got changed.
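The write-statement approach can be sketched roughly as follows. This is an illustration only, not the actual code: probe_mod, probe, watched and other are hypothetical names, and the two assignments to other merely stand in for the real suspect calls. Printing a cheap summary of the array between calls narrows down which call changes it.

```fortran
! Hypothetical helper for bisecting where an array gets overwritten.
module probe_mod
  implicit none
contains
  ! Print a label plus a cheap summary of the array, so that an
  ! unexpected change shows up between two consecutive probes.
  subroutine probe(label, a)
    character(len=*), intent(in) :: label
    real, intent(in) :: a(:)
    write (*, '(a, ": sum = ", es14.6, ", zeros = ", i0)') &
      label, sum(a), count(a == 0.0)
  end subroutine probe
end module probe_mod

program bisect_demo
  use probe_mod, only: probe
  implicit none
  real :: watched(100)   ! the array whose values must not change
  real :: other(10)      ! unrelated data touched by the suspect steps

  watched = 3.0
  call probe('after init', watched)

  other = 0.0            ! stands in for the first suspect call
  call probe('after step 1', watched)

  other = other + 1.0    ! stands in for the second suspect call
  call probe('after step 2', watched)
end program bisect_demo
```

In an MPI run it may help to include the rank in the label, since the corruption need not occur on every process.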

I also found a subroutine that was being called with a number of arguments that did not agree with the number it was expecting. The compilers did not pick this up either (I expected them to check things that are so basic!). This one was not the initial cause of the error, though, since that subroutine was not executed until I started debugging the code and trying different conditions.


Anyways, thanks for your help!
Steven_L_Intel1
Employee
I would have expected -warn interface to pick up the mismatch in the number of arguments, but it might not, depending on the order of compilation. Traditional Fortran with independent routines does not lend itself well to this sort of checking. If you had used modules, then it would have been caught.
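As a minimal sketch of the module approach (fill_mod, fill_values and demo are hypothetical names, not from the original code): placing the routine in a module gives every caller an explicit interface, so the number and type of arguments are checked at compile time, and an assumed-shape dummy carries its own bounds so the routine cannot be told the wrong size.

```fortran
! Hypothetical names throughout; illustrates explicit interfaces only.
module fill_mod
  implicit none
contains
  ! Assumed-shape dummy a(:): the extent travels with the argument,
  ! so the routine writes exactly size(a) elements.
  subroutine fill_values(a, val)
    real, intent(out) :: a(:)
    real, intent(in)  :: val
    a = val
  end subroutine fill_values
end module fill_mod

program demo
  use fill_mod, only: fill_values
  implicit none
  real :: x(3), y(5)

  x = 1.0
  call fill_values(y, 2.0)   ! fills the 5 elements of y and nothing else
  print *, x                 ! x is untouched
  print *, y
  ! call fill_values(y)      ! rejected at compile time: missing argument
end program demo
```

With the routine in a module, the check no longer depends on the order in which independent source files are compiled, because the caller cannot be compiled until the module has been.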