Intel® Fortran Compiler

Strange stack error causes program terminated without an error

New Contributor I


I'm an experienced Fortran programmer, yet a really strange problem has me confused.

As the title says, the program occasionally terminates, without any error message, after printing

"============================ start performing inversion ..."

It looks as if the stack is being smashed.

The Visual Studio 2019 project (with oneAPI 2021.3 installed) and the files it needs are attached. You can run the program with

DCFI3D -a mod1

The MKL libraries are required. Could anyone help me find the error? Thanks very much.

Honored Contributor III

In order to reproduce the error, you require that we extract and run the EXE that you included in your zip file (GMSH.EXE), which is not only a large file (79 megabytes), but exposes the user to the possibility of viruses in such files. There could be questions regarding whether that file is permitted for open distribution in a forum such as this, as well.

Please run that EXE yourself to generate any data files, and provide those data files that DCFI3D needs in order to run. 

Along those lines, it would be far better if you can condense the program and data to a much smaller size.

New Contributor I

That was an oversight on my part; the attached files have now been updated. Thanks for your attention.

New Contributor I

I'll try the old IVF compiler in PSXE to see whether the program works there.


It seems IVF 19.0 behaves the same way; the problem exists there too.

Honored Contributor III

Have you tried using the /heap-arrays option, which will place local arrays on the heap instead of the stack, as a means of reducing the stack size needed? I ran your program after building with that option, and it stopped with an access violation in subroutine INV_SCRIPT.

New Contributor I
Sure, I've tried that option. However, the problem is that the access violation occurs randomly in subroutine inv_script. Sometimes the program just stops without an error, so I cannot locate it. Do you have any advice for this kind of problem?
Honored Contributor III

Here are a couple of suggestions to consider.

Run inside the Visual Studio debugger. When an access violation occurs, you may see more information about the line number, etc. In one instance, your program stopped with an attempt to access address 0000000000000024.

Try to isolate the problem. Capture the arguments passed to INV_SCRIPT into an unformatted file. Create a test program that just reads that file and calls INV_SCRIPT.

Try using a different compiler, such as Gfortran. However, your program uses features from the latest version of MKL, so this may not be feasible.
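
The second suggestion (capturing the arguments) could be sketched roughly as below. The argument list of INV_SCRIPT is not shown in this thread, so the names save_args and load_args, the single array x, and its length n are hypothetical stand-ins; a real capture would write every argument that INV_SCRIPT receives.

```fortran
module capture_args
  implicit none
  private
  public :: save_args, load_args
contains
  ! Write a length and a double precision array to an unformatted stream file.
  ! (Hypothetical stand-in for the real INV_SCRIPT arguments.)
  subroutine save_args(fname, n, x)
    character(*), intent(in) :: fname
    integer, intent(in) :: n
    double precision, intent(in) :: x(n)
    integer :: u
    open(newunit=u, file=fname, form='unformatted', access='stream', &
         status='replace', action='write')
    write(u) n
    write(u) x
    close(u)
  end subroutine save_args

  ! Read the data back in the same order it was written.
  subroutine load_args(fname, n, x)
    character(*), intent(in) :: fname
    integer, intent(out) :: n
    double precision, allocatable, intent(out) :: x(:)
    integer :: u
    open(newunit=u, file=fname, form='unformatted', access='stream', &
         status='old', action='read')
    read(u) n
    allocate(x(n))
    read(u) x
    close(u)
  end subroutine load_args
end module capture_args
```

In the full program, save_args would be called just before INV_SCRIPT; a small driver would then call load_args followed by INV_SCRIPT, so the crash can be studied without the rest of the application.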

New Contributor I

In Debug mode, the program crashes at line 319 of FWD_INV.f90, while INV_SCRIPT is calling the subroutine system_solver

"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"

which, to my point, is a pretty standard clause.

I'm afraid that it's because the program continues to run with some internal ill-posed RAM and finally crashes.

It is quite hard to pinpoint the real location of the problem; I'll try commenting out some parts.

Honored Contributor III

>>"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"

>>which, to my point, is a pretty standard clause.

Quite true....

.... provided that dcmod is defined

... provided that dcmod%solve is defined


Note, an allocatable variable/array/udt can have three states: allocated, deallocated, and undefined

And an undefined variable used as argument to allocate/deallocate/allocated/reference will result in undefined behavior.


I suggest in debug mode that you break at that statement and verify that dcmod is defined, then verify if dcmod%solve is defined.

By this I mean that the variables appear to have valid addresses.


Also, if your code is using POINTERs, then it may be using uninitialized pointers, or dereferencing a pointer that was once valid but no longer is. In other words, the addresses would look valid but would point at something else, including space that has been returned to the heap or stack.

Jim Dempsey

Honored Contributor III

The only variables in the user's program with POINTER attribute have "sys_" in their names; dcmod%solve, etc., all have the ALLOCATABLE attribute and, therefore, their status is either allocated or unallocated -- they cannot have their status as undefined. That still leaves the possibilities of array bounds being exceeded, variables with values undefined, etc.
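
A minimal sketch of that point, assuming a hypothetical type model_t that stands in for the real type of dcmod (which is not shown in this thread): an allocatable component starts out unallocated, so the ALLOCATED test is well defined, and DEALLOCATE can report failure through STAT and ERRMSG instead of aborting.

```fortran
module alloc_status_demo
  implicit none
  ! Hypothetical stand-in for the derived type of dcmod.
  type :: model_t
    double precision, allocatable :: solve(:)
  end type model_t
contains
  ! Deallocate the component if it is allocated.
  ! Returns the STAT value from DEALLOCATE (0 means success or nothing to do).
  integer function release(m) result(ierr)
    type(model_t), intent(inout) :: m
    character(len=256) :: msg
    ierr = 0
    if (allocated(m%solve)) then
      deallocate(m%solve, stat=ierr, errmsg=msg)
      if (ierr /= 0) print *, 'deallocate failed: ', trim(msg)
    end if
  end function release
end module alloc_status_demo
```

If the heap has already been corrupted by an earlier out-of-bounds write, the DEALLOCATE itself may still crash before STAT is set, so this only rules out the allocation status as the cause.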

Honored Contributor III


Unless Intel has fixed it, there was a long-standing issue in Intel Fortran with passing an unallocated array into an OMP parallel region using PRIVATE as opposed to FIRSTPRIVATE. In those cases the allocatable variables were undefined. (FIRSTPRIVATE copied in the array descriptor's unallocated state.)

Jim Dempsey

Honored Contributor III

Jim, I ran OP's code as a single-thread program (i.e., without /Qopenmp), and I did observe the access violation even then. The OMP issues that you just mentioned, if encountered when the same program is compiled with /Qopenmp and run, would be additional complications and the OP's fix (tagged as the answer) may not fix those issues.

New Contributor I

Thanks very much for your comment. I commented out lots of lines and tried hard to isolate the problems.

Finally, I found that it is an MKL-related problem. I declared a larger size than needed in an MKL call, which directly causes heap corruption without any warning. The program can then continue running, yet it may crash at any later access to the affected memory.

Honored Contributor III

Have you enabled the compile time diagnostics for interface checking...

and the runtime diagnostics for reads of uninitialized variables and out-of-bounds array accesses? (Make the first test run without optimizations.)

Jim Dempsey

New Contributor I

Yes, I've tried that.

In Debug mode with all options enabled, the program crashes without any error information at line 319 of FWD_INV.f90, while INV_SCRIPT is calling the subroutine system_solver

"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"

which, to my point, is a pretty standard clause.

I'm afraid that it's because the program continues to run with some internal ill-posed RAM and finally crashes.

It is quite hard to pinpoint the real location of the problem; I'll try commenting out some parts.

Honored Contributor III

Given the difficulty of debugging with a rather large data set, it may be worth the effort to see if the access violation can be exhibited with a much smaller test problem. Do you have such smaller input data files?

It is not clear what you mean by "internal ill-posed RAM". If you mean what is often called "memory corruption", that is certainly a possibility, and you could check by compiling with one of the /check options.
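
As far as I know, the run-time checks only instrument code that the compiler itself generates, so a write past the end of an array that happens inside a precompiled library routine may go unreported. One way to look for that kind of corruption is guard padding, sketched below. The names alloc_guarded, guard_intact, and sentinel are hypothetical, and the check only catches overruns that land in the padding directly after the buffer.

```fortran
module guard_check
  implicit none
  ! An unlikely value used to fill the padding (hypothetical choice).
  double precision, parameter :: sentinel = -9.87654321d300
contains
  ! Allocate n usable elements followed by npad sentinel elements.
  subroutine alloc_guarded(buf, n, npad)
    double precision, allocatable, intent(out) :: buf(:)
    integer, intent(in) :: n, npad
    allocate(buf(n + npad))
    buf(1:n) = 0d0
    buf(n+1:) = sentinel
  end subroutine alloc_guarded

  ! True if every padding element after the first n still holds the sentinel.
  logical function guard_intact(buf, n)
    double precision, intent(in) :: buf(:)
    integer, intent(in) :: n
    guard_intact = all(buf(n+1:) == sentinel)
  end function guard_intact
end module guard_check
```

The idea is to pass buf(1:n) storage to the suspect routine and test guard_intact right after it returns; a false result points at that call as the one writing out of bounds.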

Valued Contributor III

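It would not hurt to try pulling the statement apart, such as


logical yesno
integer error
! ALLOCATED takes no STAT argument; only DEALLOCATE reports a status
yesno = allocated(dcmod%solve)
error = 0
if (yesno) deallocate(dcmod%solve, stat = error)
if (error /= 0) print *, 'deallocate failed, stat = ', error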


There are excellent reasons why Fortran compilers provide excellent error messages and it does not hurt to use them. 

Case in point: you send the program to someone and they tell you it does not work. It is a long road to solving the problem if there are no error messages.

Let the compiler worry about optimizing the code.  



Valued Contributor III

You have a very large generated mesh. Given that humans regularly make mistakes on data entry, the only way to see whether the mesh is approximately correct is to view it, for example in AutoCAD or Rhino.

How do you view it?  

How do you assure people the code is correct? 

New Contributor I

Thanks very much for your comment. I commented out lots of lines and tried hard to isolate the problems.

Finally, I found that it is an MKL-related problem. I declared a larger size than needed in an MKL call, which directly causes heap corruption without any warning. The program can then continue running, yet it may crash at any later access to the affected memory.

New Contributor I

Okay, I have now found where the problem is.

It occurs when I call the MKL routine mkl_sparse_d_mm, which has the prototype

stat = mkl_sparse_d_mm (operation, alpha, A, descr, layout, B, columns, ldb, beta, C, ldc)

I passed a value for columns larger than what I need. As a result, C takes up more memory than expected. This in turn causes heap corruption, and the Fortran compiler cannot generate any error message for it.
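
A defensive check before the call might look like the sketch below. It assumes the usual dense-matrix storage convention (column-major C needs ldc*columns elements with ldc at least the number of rows; row-major C needs rows*ldc elements with ldc at least columns); the MKL documentation should be consulted for the exact requirements. The helper name required_c_size and the logical flag column_major are hypothetical and are not part of MKL.

```fortran
module mm_size_check
  implicit none
contains
  ! Number of elements C should hold for a dense result with `rows` rows
  ! and `columns` columns stored with leading dimension ldc.
  ! Returns -1 if ldc is too small for the chosen layout.
  ! (Assumes the conventional dense layout rules; not an MKL routine.)
  integer function required_c_size(rows, columns, ldc, column_major) result(n)
    integer, intent(in) :: rows, columns, ldc
    logical, intent(in) :: column_major
    if (column_major) then
      n = merge(ldc*columns, -1, ldc >= rows)
    else
      n = merge(rows*ldc, -1, ldc >= columns)
    end if
  end function required_c_size
end module mm_size_check
```

Comparing size(C) against this value just before calling mkl_sparse_d_mm, and stopping with a message when C is too small or the result is -1, turns silent heap corruption into an immediate and readable failure.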

New Contributor I

People should take special care when using MKL functions, especially with input arguments such as the leading dimension, the number of columns, and so on.
