When compiling and running a Fortran program on Linux (OpenSUSE Leap 42.3), I get an obscure error message stating that a "Boundary Run-Time Check Failure" occurred for variable "ARGBLOCK_0.0.2". I neither know nor use this variable in my code, and the traceback points to the line of a "CONTAINS" statement in a module.
I am using the Intel Fortran Compiler from Intel Composer XE 2013 with the following Options:
ifort -fPIC -g -traceback -O2 -check all,noarg_temp_created -warn all
Furthermore, the program uses Intel MKL with the functions
DGETRF, DGETRS, DSYGV, DGEMM, DGGEV
The complete error message looks like:
Boundary Run-Time Check Failure for variable 'ARGBLOCK_0.0.2'

forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source
libc.so.6          00007F2BF06CC8D7  Unknown            Unknown     Unknown
libc.so.6          00007F2BF06CDCAA  Unknown            Unknown     Unknown
geops              00000000006A863F  Unknown            Unknown     Unknown
libmodell.so       00007F2BF119E54D  strukturtest_mod_  223         strukturtest_mod.f90
libmodell.so       00007F2BF1184056  modell_start_      169         modell_start.f90
geops              000000000045D1A3  Unknown            Unknown     Unknown
geops              000000000042C2C6  Unknown            Unknown     Unknown
geops              000000000040A14C  Unknown            Unknown     Unknown
libc.so.6          00007F2BF06B86E5  Unknown            Unknown     Unknown
geops              000000000040A049  Unknown            Unknown     Unknown

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 134
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Aborted (signal 6)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
The program has the following structure:
- basic functions linked into a static library (*.a), containing only modules --> using MKL routines
- main program linked into a dynamic library, containing one bare subroutine; everything else is in modules
- calling program (executed with mpiexec), which calls the mentioned subroutine in the main program
Without the calling program (in Open MPI) the subroutine runs without problems. But when I invoke it from the MPI program, I get the error message above.
Maybe some of you have encountered a similar problem and are able to help me. I would be really grateful.
The error message indicates that a runtime check caught an array being accessed outside of its bounds.
Try running the program with MPI *** specifying 1 process. If that runs, then I suspect that you may have a programming error where you may have partitioned the work by the number of ranks (number of processes), however, each rank attempts to iterate over the entire size of some array that was split. The ARGBLOCK_0.0.2 sounds like you are using the new Fortran BLOCK / ENDBLOCK sections for code. I'd start by looking for something wrong about those BLOCKS.
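To illustrate the kind of partitioning mistake meant here: if an array of n elements is split across ranks, each rank should loop only over its own index range, not over 1..n. The following is a minimal sketch of a block partition, without any MPI calls so that it runs on its own. The names block_range, n and nranks are hypothetical and do not come from the poster's code.

```fortran
program partition_demo
  implicit none
  integer, parameter :: n = 10       ! assumed global array size
  integer, parameter :: nranks = 3   ! assumed number of MPI processes
  integer :: rank, lo, hi

  ! Print the index range each rank would own.
  do rank = 0, nranks - 1
    call block_range(n, nranks, rank, lo, hi)
    print '(a,i0,a,i0,a,i0)', 'rank ', rank, ' owns ', lo, '..', hi
  end do

contains

  ! Split 1..n into nranks contiguous blocks; the first mod(n,nranks)
  ! ranks get one extra element. rank is zero-based, as in MPI.
  subroutine block_range(n, nranks, rank, lo, hi)
    integer, intent(in)  :: n, nranks, rank
    integer, intent(out) :: lo, hi
    integer :: base, extra

    base  = n / nranks
    extra = mod(n, nranks)
    lo = rank * base + min(rank, extra) + 1
    hi = lo + base - 1
    if (rank < extra) hi = hi + 1
  end subroutine block_range

end program partition_demo
```

With these assumed values the ranges come out as 1..4, 5..7 and 8..10. A rank that holds only its own block but still loops from 1 to n would run past the end of its local array.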
Thank you for the hint. I tried it out but the error still appeared in the same way.
What I did now was to change some code within my basic functions library. I had a module with variables declared PRIVATE whose values changed depending on the kind of calculation. That is, special procedures were called to set up the private variables, so that a main procedure could then use this setup to run a general routine.
Maybe the following example code illustrates what I intended.
MODULE my_module
  ...
  REAL*8, PRIVATE :: some_variables   !~ <comprising REALs, INTEGERs, ARRAYs, ...>
  ...
CONTAINS

  SUBROUTINE general_sub(return_arg)
    ...
    REAL*8, INTENT(out) :: return_arg
    ...
    !~ <do some special things with "some_variables">
    return_arg = 2.d0 * some_variables
    ...
  END SUBROUTINE

  SUBROUTINE special_sub1(some_variables_arg1)
    REAL*8, INTENT(in) :: some_variables_arg1
    ...
    some_variables = some_variables_arg1   !~ <assigning argument values to private variables, allocating-deallocating of arrays included>
    ...
    CALL general_sub(...)
    ...
  END SUBROUTINE

  SUBROUTINE special_sub2(some_variables_arg2)
    REAL*8, INTENT(in) :: some_variables_arg2
    ...
    some_variables = some_variables_arg2   !~ <assigning argument values to private variables, allocating-deallocating of arrays included>
    ...
    CALL general_sub(...)
    ...
  END SUBROUTINE

END MODULE
Now I have changed it to avoid those private variables by passing the values to the main procedure (in this module) as arguments, and it works.
So maybe you have an idea why using private variables for such a setup was a problem? If not, it is no longer a blocker for me. But I would like to understand not only how I solved it but also why it could be solved this way.
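The change described above (passing the setup values as arguments instead of storing them in PRIVATE module variables) could look roughly like the sketch below. It reuses the names from the example module. The argument lists, the dp kind parameter and the demo program are assumptions made only so that the sketch compiles on its own; the real code also handles integers and allocatable arrays, which are omitted here.

```fortran
module my_module
  implicit none
  private
  public :: dp, special_sub1, special_sub2

  integer, parameter :: dp = kind(1.0d0)   ! assumed stand-in for REAL*8

contains

  ! General routine: the setup value now arrives as an argument,
  ! so no module-level state is shared between callers.
  subroutine general_sub(some_variables, return_arg)
    real(dp), intent(in)  :: some_variables
    real(dp), intent(out) :: return_arg
    return_arg = 2.0_dp * some_variables
  end subroutine general_sub

  subroutine special_sub1(some_variables_arg1, return_arg)
    real(dp), intent(in)  :: some_variables_arg1
    real(dp), intent(out) :: return_arg
    call general_sub(some_variables_arg1, return_arg)
  end subroutine special_sub1

  subroutine special_sub2(some_variables_arg2, return_arg)
    real(dp), intent(in)  :: some_variables_arg2
    real(dp), intent(out) :: return_arg
    call general_sub(some_variables_arg2, return_arg)
  end subroutine special_sub2

end module my_module

program demo
  use my_module
  implicit none
  real(dp) :: r1, r2

  call special_sub1(1.5_dp, r1)
  call special_sub2(4.0_dp, r2)
  print *, r1, r2
end program demo
```

This only shows the shape of the refactoring. It does not explain why the version with private module variables failed.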
You provided too little information to resolve the problem.
Using the original code (with the problem), if you can, run the executable as a non-MPI program in the debugger. When the error occurs, it should trap into the debugger, and you can then examine the state of the variables causing the error.
If the error does not show up in the debugger but does show up when running without it, you can use the traceback to help identify the section of code causing the error. Finding the error this way, by visually inspecting the code, is a little harder. In your original post you had:
Boundary Run-Time Check Failure for variable 'ARGBLOCK_0.0.2'

forrtl: error (76): Abort trap signal
Image              PC                Routine            Line        Source
libc.so.6          00007F2BF06CC8D7  Unknown            Unknown     Unknown
libc.so.6          00007F2BF06CDCAA  Unknown            Unknown     Unknown
geops              00000000006A863F  Unknown            Unknown     Unknown
libmodell.so       00007F2BF119E54D  strukturtest_mod_  223         strukturtest_mod.f90
libmodell.so       00007F2BF1184056  modell_start_      169         modell_start.f90
...
The Boundary Run-Time Check Failure is generally a subscript out of bounds error. This can be an actual occurrence of indexing an array out of bounds, or it can be something that, to the runtime system, looks like an array indexing out of bounds.
With the above dump, disregard the Source Unknown entries for the libc.so.6 lines.
geops has neither a line number nor a source file. This may be a procedure from a third-party (static) library, or a procedure you wrote but compiled without traceback information. This makes it harder to determine what inside geops caused the error. Because of that lack of information, look further up the call stack (lower down in the traceback list above) to see what the caller is passing, then deduce the error from there. From the description of the error, you are likely passing incorrect arguments at line 223 in strukturtest_mod.f90.
From the name of the procedure "geops" I will guess that this is a routine to obtain geo-positioning information, though I could be wrong. If it is, you may be passing a Fortran CHARACTER string (which is not NULL-terminated) to a C function that expects a NULL-terminated string. You may have forgotten to TRIM the trailing spaces from the Fortran string and then append a null character.
This is just a guess.
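For reference, the TRIM-and-append-null step mentioned above can be sketched as follows. Nothing in the thread confirms that the poster's code calls any C function, so this is purely illustrative: place_name is a hypothetical variable, and the C library function strlen stands in for whatever C routine would receive the string.

```fortran
program cstring_demo
  use, intrinsic :: iso_c_binding, only: c_char, c_null_char, c_size_t
  implicit none

  interface
    ! Binding to the C library's strlen, used only to show how many
    ! characters a C routine sees before the terminating null.
    function c_strlen(s) bind(C, name="strlen") result(n)
      import :: c_char, c_size_t
      character(kind=c_char), intent(in) :: s(*)
      integer(c_size_t) :: n
    end function c_strlen
  end interface

  character(len=32) :: place_name   ! hypothetical, blank-padded Fortran string

  place_name = "Hamburg"

  ! A Fortran string carries its length separately and is padded with blanks.
  print *, "Fortran LEN:      ", len(place_name)
  print *, "Fortran LEN_TRIM: ", len_trim(place_name)

  ! Strip the trailing blanks and append the terminator C expects.
  print *, "C strlen:         ", c_strlen(trim(place_name) // c_null_char)
end program cstring_demo
```

Passing place_name directly, without the appended c_null_char, would let the C side read past the end of the 32 characters until it happens to find a zero byte.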