I am still involved with a legacy program that has many very old components (30 - 40 years old). We have recently migrated from g77 to Intel Fortran, and things are going well. Recently, the original author summarized a test that he ran, and mentioned that g77 and ifort behave differently when "array overruns" occur, i.e., an out-of-bounds array index is referenced. So I suggested we try catching array overruns when they occur, i.e., not wait for memory corruption, and use '-check bounds' when we run tests.
I am very impressed by how this compiler option works. In half a day, I have found about a dozen array overruns (even caught one at compile time!), and have not even begun to execute realistic scenarios. In an application this old, we will find many more, I am sure, and it will further contribute to the robustness of the program.
Just one issue: when I run under gdb and an out-of-bounds index is detected, the program does not break into gdb immediately. Instead, the error is reported, and control returns to gdb only after the program has terminated.
We are using version 15, update 2, on Red Hat Enterprise Linux 6.
Here is a tiny example. I have purposely created a test case with ancient style that matches the old code.
      PROGRAM TEST
      INTEGER I,ARR(100)
      I=0
    1 I=I+1
      ARR(I)=I
      PRINT *,I
      GOTO 1
      END
The idea is that we have an array of 100 elements, and march right through the end until we are "caught". Each index is printed as we go.
First, I compile with: ifort -c -debug test.for
If I run without gdb, the program simply dies when I get to I=568. So I run with gdb, and it breaks into gdb when it detects something is wrong (too late, of course):
567
568
Program received signal SIGSEGV, Segmentation fault.
0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
#0 0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x000000000040f931 in for__aio_acquire_lun ()
#2 0x0000000000428613 in for__acquire_lun ()
#3 0x0000000000408ad9 in for_write_seq_lis ()
#4 0x0000000000402d20 in test () at test.for:6
#5 0x0000000000402c7e in main ()
As usual, I can get a complete backtrace.
Next, I compile with: ifort -c -debug -check bounds test.for
Running without gdb, the program now correctly crashes when I get to 101:
99
100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image PC Routine Line Source
test 0000000000404860 Unknown Unknown Unknown
test 0000000000402DAA Unknown Unknown Unknown
test 0000000000402C7E Unknown Unknown Unknown
libc.so.6 0000003B6DA1ED5D Unknown Unknown Unknown
test 0000000000402B89 Unknown Unknown Unknown
In a program this small, the location of the bug is obvious. But in a large program, just knowing the name of the array is not always enough to find the place in the code where the overrun occurs. So I run with gdb, and the following occurs:
99
100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image PC Routine Line Source
test 0000000000404860 Unknown Unknown Unknown
test 0000000000402DAA Unknown Unknown Unknown
test 0000000000402C7E Unknown Unknown Unknown
libc.so.6 0000003B6DA1ED5D Unknown Unknown Unknown
test 0000000000402B89 Unknown Unknown Unknown
Program exited with code 0230.
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
No stack.
In other words, the overrun is detected as before, but the break into gdb comes too late: the program has already exited, so there is no stack left and the backtrace cannot guide me to the line of code where the error was detected.
Jay
Use the -traceback compiler option. When you do that, you will see source line numbers instead of machine addresses in the traceback after an array overrun or other fault.
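As a rough sketch of what that could look like for the test case above, the script below adds -traceback to the flags already in use. The file names test.for and test are assumptions, and -c is dropped here so that the compiler also links an executable. The ifort call is guarded so the script only prints the command when the compiler or the source file is not available.

```shell
#!/bin/sh
# Sketch only: test.for and test are assumed file names.
# -debug and -check bounds come from the original post; -traceback is the addition.
FFLAGS="-debug -check bounds -traceback"

if command -v ifort >/dev/null 2>&1 && [ -f test.for ]; then
    # Compile and link in one step (no -c), then run the instrumented program.
    ifort $FFLAGS -o test test.for
    # The run is expected to end with a bounds error, so report the status instead of failing.
    ./test || echo "exit status: $?"
else
    echo "would run: ifort $FFLAGS -o test test.for"
fi
```

With -traceback in effect, the Routine, Line, and Source columns of the forrtl report should be filled in for frames in your own code rather than showing Unknown.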
As you have observed, waiting for a signal to be raised and cause GDB to be fired up does not let you catch the array bounds error immediately, and it is unreliable to expect a trap to be taken in a timely manner.
There is an interesting piece of history related to this. From http://courses.engr.illinois.edu/ece390/books/artofasm/CH06/CH06-5.html :
A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5 handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.
Thanks for the advice to use -traceback. I take it that you are giving me a general piece of advice (much appreciated), but that this is not intended as a solution to making gdb stop at the right point when an array bounds error is encountered. I still observe the problem that I described above -- not totally unexpected, given the rest of your post.
Try setting the environment variable
setenv FOR_DEBUGGER_IS_PRESENT true
and then restart the gdb debugging session.
This will cause a "break" to happen in the run-time library; you will have to go "up" a few stack frames to see your code.
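For Bourne-style shells, a minimal sketch of the whole sequence might look like the following. The executable name ./test is an assumption, and the gdb call is guarded so the script only prints the command when gdb or the executable is missing. The -batch and -ex options are standard gdb command-line options; they run the program, print a backtrace at the point where it stops, and then exit.

```shell
#!/bin/sh
# Sketch only: ./test is an assumed executable built with -debug -check bounds.
# sh/bash equivalent of the csh "setenv" line above.
export FOR_DEBUGGER_IS_PRESENT=true

if command -v gdb >/dev/null 2>&1 && [ -x ./test ]; then
    # gdb passes its own environment on to the program it starts.
    gdb -batch -ex run -ex bt ./test
else
    echo "would run: gdb -batch -ex run -ex bt ./test"
fi
```

In an interactive session, the variable can instead be set from the gdb prompt with "set environment FOR_DEBUGGER_IS_PRESENT true" before "run". After the break, "bt" shows the run-time library frames on top, and "up" walks toward the frame in your own code.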
--Lorri
Thank you. That worked. Since I use bash rather than C shell, I went with: export FOR_DEBUGGER_IS_PRESENT=true
Jay
