Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Strange memory issue

Feng__Jesse
New Contributor II

I am using a large crystal plasticity material simulation code written in Fortran, and it recently developed a strange issue. I will explain the relevant parts of the code here; any suggestions for finding the cause of this bug would be much appreciated!

Discovery of the problem:

While running in release mode, the code triggers an access violation error that does not appear in debug mode.

Using print statements, I discovered that the integer array iParentGrainRex, which stores indices used to reference elements of other arrays, is corrupted. Instead of holding values from 1 to 200, it contains large values such as -1020391029, which suggests memory corruption. Again, this does not happen in debug mode, so I have not been able to find a problem in the algorithms themselves.
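A range check can stand in for the individual print statements when hunting this kind of corruption. The following is only a minimal sketch: the helper first_bad_index, the value chosen for NGR, and the simulated corruption are assumptions made for illustration and are not part of the original code. As with the prints, any added statement may change how the optimizer lays out storage, so a clean result does not prove the bug is gone.

```fortran
module index_check_mod
  implicit none
contains
  ! Hypothetical helper: returns the position of the first entry of
  ! idx(1:n) that lies outside 1..n, or 0 if every entry is in range.
  integer function first_bad_index(idx, n) result(pos)
    integer, intent(in) :: idx(:)
    integer, intent(in) :: n
    integer :: i

    pos = 0
    do i = 1, n
      if (idx(i) < 1 .or. idx(i) > n) then
        pos = i
        return
      end if
    end do
  end function first_bad_index
end module index_check_mod

program demo_index_check
  use index_check_mod
  implicit none
  integer, parameter :: NGR = 1000     ! assumed declared size, not from the post
  integer, parameter :: ngrain = 200   ! number of entries in use, as in the post
  integer :: iParentGrainRex(NGR)
  integer :: ng

  ! Same initialization as in the snippet from the post.
  iParentGrainRex = 0
  do ng = 1, ngrain
    iParentGrainRex(ng) = ng
  end do
  print *, 'first bad entry (0 = none): ', &
           first_bad_index(iParentGrainRex, ngrain)

  ! Simulate the corrupted value reported in the post to show what the
  ! check reports when an entry has been overwritten.
  iParentGrainRex(100) = -1020391029
  print *, 'first bad entry (0 = none): ', &
           first_bad_index(iParentGrainRex, ngrain)
end program demo_index_check
```

Calling such a check at the same places as the print statements (after initialization, before and after read_statv) would narrow down where the values first go out of range.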

The computer is new, so hardware causes are unlikely.

The array iParentGrainRex is statically declared with size NGR, a constant defined in a module of constants. The portion of iParentGrainRex actually in use is much smaller. For an initial data size of 200, iParentGrainRex is initialized on line 3 of the following code snippet:

          ...
          do ng = 1,ngrain
            iParentGrainRex(ng) = ng
            iChildGrainRex(1,ng) = ng
            nChildGrainRex(ng) = 1
          enddo
          print *, 'iParentGrainRex 100: ',iParentGrainRex(100)
      endif
      print *, 'iParentGrainRex 100 1: ',iParentGrainRex(100)
      !read state variables (other calls to umat)
      !previous procedure
      if (TIME(2).ne.0.0.or.i_prev_proc.eq.2) then
          call read_statv(s,fileprev,NSTATV,STATEV,ns)
          call get(s)
      endif
      print *, 'iParentGrainRex 100 2: ',iParentGrainRex(100)
      ...

The code before and after is omitted. As we can see, each element ng of iParentGrainRex is initialized to the integer ng. The endif on line 8 closes the block that ensures this initialization occurs only once. The value of iParentGrainRex is stored in a state variable array, s, that is passed between this code and another code that controls a larger-scale simulation. The state variable saves the material conditions at every step of the simulation. It is handled in subroutines such as read_statv, get(s), write_statv, and set(s).

Here is where it gets strange:

When I add only the print statement on line 16, the memory issue shows up:

[Screenshot: Feng__Jesse_1-1711981828849.png]

The same happens if I add only the print statements on lines 7 and 16. But when I add the print statement on line 9, everything is suddenly normal:

[Screenshot: Feng__Jesse_0-1711981749119.png]

This does not point to the code block between lines 9 and 16, which contains read_statv, as the problem: at time 0, all three print statements returned normal values when line 9 was present. The issue only appears when line 9 is absent.

Furthermore, this affects other elements. Here I added print statements for element 99 on lines 7 and 16, and the issue persists:

[Screenshot: Feng__Jesse_2-1711982696919.png]

But when I added the print statement for element 100, the issue also went away for the other elements in the array:

[Screenshot: Feng__Jesse_3-1711982840154.png]

So far, only the print statement fixes it. When I reassign the element with iParentGrainRex(100) = 100, that fixes the element itself but not the others:

[Screenshot: Feng__Jesse_5-1711982983576.png]

It is worth noting that this does not happen to the other two variables that are declared in the same module and initialized in the same place as iParentGrainRex:

[Screenshot: Feng__Jesse_6-1711983228573.png]

5 Replies
Steve_Lionel
Honored Contributor III

Here's how I approach problems such as this. Assuming the application is made of separate source files, I compile half of them as "Release" and half as "Debug", then see if the problem persists. If not, I switch the two halves. Once I establish which set, compiled "Release", still shows the bug (assuming I can), I then take half of that set and compile it as "Debug". I repeat this process until I have the minimal set, compiled Release, that still shows the problem. (This isn't always possible, but it works more often than not.)

Now that I have that minimal set, I start reducing compile options to get the minimal set that still shows the issue. In most cases, I end up with a single source file that is the culprit. The others I will build with full compile-time and run-time checks, including generated interface checking. What follows is then a slog of reducing the problem source file, commenting out sections to see if I can narrow down which line(s) are involved.

This is not a quick method, and sometimes experience with it helps, but it rarely fails me in the end.
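The halving procedure described above can be modeled as a binary search over the list of source files. The sketch below is only an illustration under strong assumptions: the file names are hypothetical, a single file is assumed to be responsible, and the function fails_when_optimized is a stand-in for the manual step of rebuilding with only the given range of files optimized and rerunning the test. In a real session the culprit index is of course unknown, and corruption caused by undefined behavior may not localize this cleanly.

```fortran
program bisect_sources
  implicit none
  integer, parameter :: nfiles = 8
  ! Hypothetical culprit position; unknown in a real debugging session.
  integer, parameter :: culprit = 6
  character(len=16) :: files(nfiles)
  integer :: lo, hi, mid, builds

  ! Hypothetical file names, used only to make the output readable.
  files = [character(len=16) :: 'constants.f90', 'state_io.f90', &
           'grains.f90', 'hardening.f90', 'slip.f90', 'umat.f90', &
           'rex.f90', 'output.f90']

  lo = 1
  hi = nfiles
  builds = 0
  do while (lo < hi)
    mid = (lo + hi) / 2
    builds = builds + 1
    ! Build with files lo..mid optimized and every other file
    ! unoptimized, then check whether the failure still appears.
    if (fails_when_optimized(lo, mid)) then
      hi = mid        ! the failure follows the optimized half
    else
      lo = mid + 1    ! otherwise it must be in the other half
    end if
  end do

  print '(a,a,a,i0,a)', 'suspect file: ', trim(files(lo)), &
        ' (after ', builds, ' builds)'

contains

  ! Stand-in for "rebuild and rerun": true when the assumed culprit
  ! is among the files compiled with optimization.
  logical function fails_when_optimized(first, last)
    integer, intent(in) :: first, last
    fails_when_optimized = (culprit >= first .and. culprit <= last)
  end function fails_when_optimized

end program bisect_sources
```

The point of the model is the cost: each rebuild halves the candidate set, so even a project with many source files needs only a handful of builds.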

I agree that you have data corruption going on. By the way, which compiler and version are you using? Definitely try the latest versions of ifx and ifort, but don't be surprised if the error disappears due to a different storage order (as suggested by what happens when print statements are added).

Feng__Jesse
New Contributor II

I understand the concept of your proposed approach, but I am confused about how to do it.

Are you saying there is a way to compile one file as debug while the rest are compiled as release? Or do I compile each module and file on its own, without a main program?

Steve_Lionel
Honored Contributor III

Perhaps instead of Debug and Release I should have said optimized and unoptimized. It may actually make more sense to reduce the compile options first. I don't know your build environment, but when I use Visual Studio on Windows, I select a set of source files in the project, right click, select Properties, and then change the setting I want to adjust - this applies to all the selected files. Visual Studio will show a slightly different icon for the file to indicate that it has non-default properties.

If you are using a command script, it's a bit more involved.

jimdempseyatthecove
Honored Contributor III

To expand on @Steve_Lionel's response:

[Screenshot: jimdempseyatthecove_0-1712068273335.png]

Right-click the source file, select Properties, select the Release configuration, and set Optimization to Disable.

Note that for source files with settings other than the project defaults, an almost unnoticeable red tick mark appears on the file's icon.

Later, you can use this mark to locate the altered files and undo the changes.

Jim

Feng__Jesse
New Contributor II

I am happy to announce that I found and solved the bug. I declared a variable e2(6,6,nph), but nph is not yet defined at the start of the file. I meant to use the constant NPHM. This explains why the code works if the calculation step is repeated for convergence: on the repeated step, nph is already defined from the previous step because of the use of QSave. This also solved the problem in an earlier post, where the code in debug mode reported a memory issue at the end of the first step but could proceed normally afterward.

Before trying the suggestions from Steve and Jim, I decided to start from a previous version of the code that didn't have memory issues and slowly add the new code back in. I figured the memory issue was related to a variable declaration or access, so it might be a quick find this way, and thank god it was the first thing I ran into.

Still, Steve and Jim's idea of compiling different files with different optimization settings is an excellent one that I will keep in mind for future work. It will certainly come into play sooner or later.
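The declaration mistake described in this post can be sketched as follows. This is a guess at the shape of the problem rather than the actual code: it assumes nph is a module variable assigned at run time, that e2 is a local array in a subroutine, and the module names, the routine fill_e2, and the value of NPHM are all hypothetical. The buggy declaration is shown only as a comment so that the example itself stays well defined.

```fortran
module phase_sizes
  implicit none
  integer, parameter :: NPHM = 4   ! hypothetical value of the compile-time maximum
  integer :: nph                   ! run-time phase count; undefined until assigned
end module phase_sizes

module stiffness_work
  use phase_sizes
  implicit none
contains
  subroutine fill_e2(total)
    real, intent(out) :: total
    ! Buggy declaration as described in the post (comment only):
    !   real :: e2(6,6,nph)
    ! That is an automatic array whose extent is read from nph on entry.
    ! If nph has not been assigned yet, the extent is whatever happens
    ! to be in memory, and later stores can land on other variables.
    real :: e2(6,6,NPHM)           ! corrected: extent is a named constant

    e2 = 1.0
    total = sum(e2)
  end subroutine fill_e2
end module stiffness_work

program demo_fixed_extent
  use stiffness_work
  implicit none
  real :: total

  ! Safe even though nph was never assigned, because the extent of e2
  ! no longer depends on it.
  call fill_e2(total)
  if (abs(total - real(6*6*NPHM)) > 1.0e-6) error stop 'unexpected array size'
  print *, 'elements in e2: ', nint(total)
end program demo_fixed_extent
```

Run-time checks for uninitialized variables may flag this class of error in some cases, but they are not guaranteed to catch it, which is consistent with the bug behaving differently between debug and release builds.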
