I'm an experienced Fortran programmer, yet a really strange problem has me confused.
As mentioned, the program occasionally terminates after printing
"============================ start performing inversion ..."
It looks as if the stack is being smashed.
The Visual Studio 2019 project (with oneAPI 2021.3 installed) and the required files are attached. You can run the program with
DCFI3D -a mod1
The MKL libraries are required. Could anyone help find the error? Thanks very much.
In order to reproduce the error, you require that we extract and run the EXE that you included in your zip file (GMSH.EXE), which is not only a large file (79 megabytes), but exposes the user to the possibility of viruses in such files. There could be questions regarding whether that file is permitted for open distribution in a forum such as this, as well.
Please run that EXE yourself to generate any data files, and provide those data files that DCFI3D needs in order to run.
Along those lines, it would be far better if you can condense the program and data to a much smaller size.
That was an oversight on my part; the attached files have now been updated. Thanks for your attention.
I'll try the older IVF compiler in PSXE to see whether the program works there.
IVF 19.0 behaves the same way; the problem is still present.
Have you tried using the /heap-arrays option, which will place local arrays on the heap instead of the stack, as a means of reducing the stack size needed? I ran your program after building with that option, and it stopped with an access violation in subroutine INV_SCRIPT.
Here are a couple of suggestions to consider.
Run inside the Visual Studio debugger. When an access violation occurs, you may see more information about the line number, etc. In one instance, your program stopped with an attempt to access address 0000000000000024.
Try to isolate the problem. Capture the arguments passed to INV_SCRIPT into an unformatted file. Create a test program that just reads that file and calls INV_SCRIPT.
Try using a different compiler, such as Gfortran. However, your program uses features from the latest version of MKL, so this may not be feasible.
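The capture-and-replay suggestion above can be sketched roughly as follows. This is only an illustration: the real argument list of INV_SCRIPT is not shown in the thread, so `inv_script_stub`, `n`, `x`, `tol`, and the file name `inv_args.bin` are all hypothetical stand-ins.

```fortran
program replay_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer :: n, n_in, u
  real(dp) :: tol, tol_in
  real(dp), allocatable :: x(:), x_in(:)

  ! Hypothetical inputs standing in for whatever INV_SCRIPT really receives.
  n = 5
  tol = 1.0e-6_dp
  allocate(x(n))
  x = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp, 5.0_dp]

  ! Step 1 (in the full program, just before the call): dump the arguments.
  open(newunit=u, file='inv_args.bin', form='unformatted', &
       status='replace', action='write')
  write(u) n, tol
  write(u) x
  close(u)

  ! Step 2 (in the small test driver): read them back in the same order.
  open(newunit=u, file='inv_args.bin', form='unformatted', &
       status='old', action='read')
  read(u) n_in, tol_in
  allocate(x_in(n_in))
  read(u) x_in
  close(u, status='delete')

  ! The replayed values must match what was captured.
  if (n_in /= n .or. tol_in /= tol .or. any(x_in /= x)) then
    error stop 'round trip failed'
  end if

  call inv_script_stub(n_in, x_in, tol_in)

contains

  ! Placeholder for the routine under test (assumed name and signature).
  subroutine inv_script_stub(n, x, tol)
    integer, intent(in) :: n
    real(dp), intent(in) :: x(n), tol
    print *, 'replayed call: n =', n, ' sum(x) =', sum(x), ' tol =', tol
  end subroutine inv_script_stub

end program replay_demo
```

In practice the two steps would live in separate programs: the write in the full application, the read in a driver small enough to step through in the debugger.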
In Debug mode, the program crashes at line 319 of FWD_INV.f90, when INV_SCRIPT calls the subroutine system_solver:
"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"
which, to my point, is a pretty standard clause.
I'm afraid the program continues to run with some internal ill-posed RAM and only crashes later.
It is quite hard to locate the real source of the problem; I'll try commenting out some parts.
>>"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"
>>which, to my point, is a pretty standard clause.
Quite true...
... provided that dcmod is defined
... provided that dcmod%solve is defined
Note that an allocatable variable, array, or UDT can be in one of three states: allocated, deallocated, or undefined.
An undefined variable used as an argument to allocate, deallocate, or allocated, or referenced directly, will result in undefined behavior.
I suggest that, in debug mode, you break at that statement and verify that dcmod is defined, then verify that dcmod%solve is defined.
By this I mean that the variables appear to have valid addresses.
Also, if your code uses POINTERs, it may be using uninitialized pointers or dereferencing a pointer that was once valid but no longer is. In other words, the addresses would look valid but point at something else, including space already returned to the heap or stack.
Jim Dempsey
The only variables in the user's program with POINTER attribute have "sys_" in their names; dcmod%solve, etc., all have the ALLOCATABLE attribute and, therefore, their status is either allocated or unallocated -- they cannot have their status as undefined. That still leaves the possibilities of array bounds being exceeded, variables with values undefined, etc.
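As a small illustration of that point, the sketch below uses a hypothetical type `model_t` standing in for the real type of dcmod, which is not shown in the thread. It queries the allocation status of an allocatable component before allocation, after allocation, and after deallocation. In a conforming program with uncorrupted memory, `allocated` is always safe to call on such a component.

```fortran
program alloc_status_demo
  implicit none

  ! Hypothetical stand-in for the derived type behind dcmod.
  type :: model_t
    real, allocatable :: solve(:)
  end type model_t

  type(model_t) :: dcmod
  integer :: ierr

  ! An allocatable component starts out unallocated, so this reports false.
  print *, 'allocated before allocate:  ', allocated(dcmod%solve)

  allocate(dcmod%solve(10), stat=ierr)
  if (ierr /= 0) error stop 'allocate failed'
  print *, 'allocated after allocate:   ', allocated(dcmod%solve)

  ! The guarded deallocate from the thread, with stat= so that a failure
  ! is reported through ierr instead of terminating the program.
  if (allocated(dcmod%solve)) deallocate(dcmod%solve, stat=ierr)
  if (ierr /= 0) error stop 'deallocate failed'
  print *, 'allocated after deallocate: ', allocated(dcmod%solve)

end program alloc_status_demo
```

If this pattern crashes in a larger program, the statement itself is unlikely to be at fault; something earlier has probably overwritten memory, which is consistent with the remaining possibilities listed above.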
mecej4,
Unless Intel has fixed it, there was a long-standing issue in Fortran with passing an unallocated array into an OMP parallel region using PRIVATE as opposed to FIRSTPRIVATE. In those cases the allocatable variables were undefined. (FIRSTPRIVATE copied in the array descriptor's unallocated state.)
Jim Dempsey
Jim, I ran OP's code as a single-thread program (i.e., without /Qopenmp), and I did observe the access violation even then. The OMP issues that you just mentioned, if encountered when the same program is compiled with /Qopenmp and run, would be additional complications and the OP's fix (tagged as the answer) may not fix those issues.
Have you enabled the compile-time diagnostics for interface checking, and the runtime diagnostics for reads of uninitialized variables and out-of-bounds array accesses? (Make the first test run without optimizations.)
Jim Dempsey
Yes, I've tried.
In Debug mode with all options enabled, the program crashes without any error information at line 319 of FWD_INV.f90, when INV_SCRIPT calls the subroutine system_solver:
"if (allocated(dcmod%solve)) deallocate(dcmod%solve)"
which, to my point, is a pretty standard clause.
I'm afraid the program continues to run with some internal ill-posed RAM and only crashes later.
It is quite hard to locate the real source of the problem; I'll try commenting out some parts.
Given the difficulty of debugging with a rather large data set, it may be worth the effort to see if the access violation can be exhibited with a much smaller test problem. Do you have such smaller input data files?
It is not clear what you mean by "internal ill-posed RAM". If you mean what is often called "memory corruption", that is certainly a possibility, and you could check by compiling with one of the /check options.
It would not hurt to try pulling the statement apart, for example:
logical :: yesno
integer :: error
error = 0
yesno = allocated(dcmod%solve)   ! ALLOCATED takes no STAT= argument
write(*,*) yesno
if (yesno) then
   deallocate(dcmod%solve, stat = error)   ! nonzero stat means the deallocation failed
end if
write(*,*) error
There are good reasons why Fortran compilers provide detailed error messages, and it does not hurt to use them.
Case in point: you send the program to someone and they tell you it does not work. It is a long road to solving the problem if there are no error messages.
Let the compiler worry about optimizing the code.
You have a very large generated mesh. Given that people regularly make mistakes during data entry, the only way to check that the mesh is approximately correct is to view it in a tool such as AutoCAD or Rhino.
How do you view it?
How do you assure people the code is correct?
Thanks very much for your comment. I commented out lots of lines and tried hard to isolate the problems.
Finally, I found that it is an MKL-related problem. I declare more memory than needed, which directly causes heap corruption without any warning. The program can then keep running, yet it may crash at any later access to the affected memory.
Okay, I have now found where the problem is.
It is in my use of the MKL subroutine mkl_sparse_d_mm, which has the prototype
stat = mkl_sparse_d_mm(operation, alpha, A, descr, layout, B, columns, ldb, beta, C, ldc)
Take special care when calling MKL functions, especially with input arguments such as the leading dimension, the number of columns, and so on.
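One way to guard against this kind of mistake is to check the dense operands before the call. The sketch below is a hypothetical helper, not part of MKL, and it does not call MKL at all. It assumes column-major storage, where an operand with `nrows` rows and `ncols` columns needs a leading dimension of at least `nrows` and an array holding at least `ld*ncols` elements. Which row count applies to B and C depends on the `operation` argument, so the caller has to supply it.

```fortran
module dense_operand_check
  implicit none
contains

  ! Hypothetical helper (an assumption, not an MKL routine): returns .true.
  ! when a column-major operand of logical size nrows x ncols, addressed
  ! with leading dimension ld, fits inside an array of nelems elements.
  pure logical function operand_fits(nrows, ncols, ld, nelems) result(ok)
    integer, intent(in) :: nrows, ncols, ld, nelems
    ok = nrows >= 0 .and. ncols >= 0 .and. &
         ld >= max(1, nrows) .and. ld*ncols <= nelems
  end function operand_fits

end module dense_operand_check

program check_demo
  use dense_operand_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Illustrative sizes: A would be m x k, B is k x columns, C is m x columns.
  integer, parameter :: m = 3, k = 4, columns = 2
  real(dp), allocatable :: B(:,:), C(:,:)

  allocate(B(k, columns), C(m, columns))

  ! Consistent arguments: ldb = k and ldc = m match the allocations.
  if (.not. operand_fits(k, columns, size(B, 1), size(B))) &
    error stop 'B was unexpectedly rejected'
  if (.not. operand_fits(m, columns, size(C, 1), size(C))) &
    error stop 'C was unexpectedly rejected'

  ! Overstated column count: a routine given this value could write past
  ! the end of C, so the check must reject it.
  if (operand_fits(m, columns + 1, size(C, 1), size(C))) &
    error stop 'oversized request was not rejected'

  print *, 'all checks behaved as expected'
end program check_demo
```

A check like this costs almost nothing and turns silent heap corruption into an immediate, readable failure at the call site.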
