Timo_W_ · ‎08-21-2017

Hi all,

We are investigating a bug in our software and are getting the following message from the Intel Fortran compiler:

" Boundary Run-Time Check Failure for variable 'var$1825.0.6' "

A similar issue has already been discussed in this forum at https://software.intel.com/en-us/forums/intel-fortran-compiler-for-linux-and-mac-os-x/topic/623413 but our problem is somewhat different, so we could not make use of the answer that solved the problem there.

We would like to analyse the variable 'var$1825.0.6', but we do not know how. It seems to be an automatically created variable. How do such variables get created? Is it possible to inspect them in, for instance, gdb?

Thanks for your help in advance,

Timo

jimdempseyatthecove · ‎08-21-2017

I suspect this may be an array temporary. Does adding traceback point to the area of the fault? If so, you may be able to insert some assert code to perform a sanity check on the bounds.

Jim Dempsey
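
A minimal sketch of the kind of assert suggested above, assuming the fault involves an array section whose index range is computed at run time. The names `assert_in_bounds`, `work`, `first` and `last` are hypothetical and do not come from Timo's code; the idea is only to compare the requested section against `lbound`/`ubound` before the section is used, so that a bad range is reported against a named array rather than a compiler-generated temporary.

```fortran
program bounds_assert_demo
  implicit none
  real, allocatable :: work(:)
  integer :: first, last

  allocate(work(100))
  work = 1.0

  ! Hypothetical index range, e.g. derived from a domain decomposition.
  first = 91
  last  = 100

  ! Check the range before the section is used (and possibly copied
  ! into a compiler-generated array temporary).
  call assert_in_bounds('work', first, last, lbound(work, 1), ubound(work, 1))
  print *, 'sum of section = ', sum(work(first:last))

contains

  subroutine assert_in_bounds(name, lo, hi, lb, ub)
    character(len=*), intent(in) :: name
    integer, intent(in) :: lo, hi   ! requested section bounds
    integer, intent(in) :: lb, ub   ! actual array bounds
    ! An empty section (lo > hi) is always legal, so only check non-empty ones.
    if (lo <= hi .and. (lo < lb .or. hi > ub)) then
      write (*, '(a,a,a,i0,a,i0,a,i0,a,i0,a)') 'bounds assert failed for ', name, &
        ': section (', lo, ':', hi, ') outside (', lb, ':', ub, ')'
      error stop 1
    end if
  end subroutine assert_in_bounds

end program bounds_assert_demo
```

Placing such a call just before the statement that the traceback points to can narrow down which index expression is wrong.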

Timo_W_ · ‎08-22-2017

Dear Jim,

Yes, we have some additional diagnostics on the problem.

If we compile our code in "debug mode", that is with the compiler options -g -check all -debug all -ftrapuv -fpe0 -fstack-protector-all, we get a call stack leading to the problem and the error message mentioned above ("Boundary Run-Time Check Failure for variable 'var$1825.0.6'").

If we compile it with -O0, we get the same call stack but a different error message: ("forrtl: severe (174): SIGSEGV, segmentation fault occurred").

If we compile it with -O3, we get a different call stack but also a segmentation fault.

What makes it difficult for us to find the source of the error is that we cannot print var$1825.0.6 in gdb. The command p var$1825.0.6 does not work.

Have a nice day,

Timo

jimdempseyatthecove · ‎08-22-2017

Compile in debug mode with runtime checks enabled.

When the error occurs, it may occur in a section of code within some library (without debug symbols). When this happens, you will need to look at the call stack to find the nearest level that is in your code with debug symbols. Then set the debug context to that level and then inspect the arguments to the call. You should find your coding error there.

When you compile without debugging, and in particular without bounds checking, the bounds (as well as the base of the array) are not checked and the invalid address is used. When this (invalid) address is in unmapped memory you receive SIGSEGV; when the (invalid) address is in mapped memory you corrupt data/code and/or generate trash results.

Jim Dempsey

Timo_W_ · ‎08-23-2017

Dear all,

We fixed our problem. It was an MPI synchronization issue that killed our stack.

regards

Timo

jimdempseyatthecove · ‎08-23-2017

Would you be so kind as to disclose information about your problem, its cause and the solution? This will help others who run into the same situation.

Jim Dempsey

Timo_W_ · ‎08-24-2017

Dear Jim,

Because of a race condition, our MPI processes went out of synchronization, which caused an error in one of our MPI reduction routines. We fixed the problem by adding an MPI_Barrier at the critical point, which prevents the race condition and keeps the MPI processes synchronized.

Regards

Timo
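
The thread does not show the actual code, so the following is only a sketch of where such a barrier might sit relative to a reduction. The program name, `local_val` and the choice of `MPI_SUM` on `MPI_COMM_WORLD` are assumptions made for illustration; the real "critical point" in Timo's application is not known.

```fortran
program barrier_before_reduce
  use mpi
  implicit none
  integer :: ierr, rank
  double precision :: local_val, global_sum

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

  ! Hypothetical per-rank work that produces a partial result.
  local_val = dble(rank + 1)

  ! No rank leaves this call until every rank in the communicator
  ! has entered it, so all ranks reach the reduction together.
  call MPI_Barrier(MPI_COMM_WORLD, ierr)

  ! Sum the partial results onto rank 0.
  call MPI_Reduce(local_val, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                  0, MPI_COMM_WORLD, ierr)

  if (rank == 0) print *, 'global sum = ', global_sum

  call MPI_Finalize(ierr)
end program barrier_before_reduce
```

A barrier only lines up the arrival of the processes; it does not by itself repair ranks that issue different sequences of collective calls, so whether it is the right fix depends on what the race condition actually was.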