Intel® Fortran Compiler

gdb issue when compiling with '-check bounds'

jayb

I am still involved with a legacy program that has many very old components (30 to 40 years old).  We have recently migrated from g77 to Intel Fortran, and things are going well.  Recently, the original author summarized a test that he ran, and mentioned that g77 and ifort behave differently when "array overruns" occur, i.e., when an out-of-bounds array index is referenced.  So I suggested that we catch array overruns as they occur, rather than waiting for memory corruption to surface, by using '-check bounds' when we run tests.

 

I am very impressed by how this compiler option works.  In half a day, I have found about a dozen array overruns (even caught one at compile time!), and have not even begun to execute realistic scenarios.  In an application this old, we will find many more, I am sure, and it will further contribute to the robustness of the program.

Just one issue:  When I execute under gdb and an out-of-bounds index is detected, the program does not break immediately into gdb.  Rather, the error is reported, and gdb regains control only after the program has terminated.

We are using version 15, update 2, on Red Hat Enterprise Linux 6.

Here is a tiny example.  I have purposely created a test case with ancient style that matches the old code.

      PROGRAM TEST
      INTEGER I,ARR(100)
      I=0
1     I=I+1
      ARR(I)=I
      PRINT *,I
      GOTO 1
      END

 

The idea is that we have an array of 100 elements, and march right through the end until we are "caught".  Each index is printed as we go.

First, I compile with: ifort -c -debug test.for

If I run without gdb, the program simply dies when I get to I=568.  So I run with gdb, and it breaks into gdb when it detects something is wrong (too late, of course):

         567
         568
Program received signal SIGSEGV, Segmentation fault.
0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
#0  0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1  0x000000000040f931 in for__aio_acquire_lun ()
#2  0x0000000000428613 in for__acquire_lun ()
#3  0x0000000000408ad9 in for_write_seq_lis ()
#4  0x0000000000402d20 in test () at test.for:6
#5  0x0000000000402c7e in main ()


As usual, I can get a complete backtrace.

Next, I compile with: ifort -c -debug -check bounds test.for

Running without gdb, the program now correctly crashes when I get to 101:

          99
         100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image              PC                Routine            Line        Source            
test               0000000000404860  Unknown               Unknown  Unknown
test               0000000000402DAA  Unknown               Unknown  Unknown
test               0000000000402C7E  Unknown               Unknown  Unknown
libc.so.6          0000003B6DA1ED5D  Unknown               Unknown  Unknown
test               0000000000402B89  Unknown               Unknown  Unknown

 
In a program this small, the location of the bug is obvious.  But in a large program, just knowing the name of the array is not always enough information to find the place in the code where the overrun occurs.  So I run with gdb, and the following occurs:

          99
         100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image              PC                Routine            Line        Source            
test               0000000000404860  Unknown               Unknown  Unknown
test               0000000000402DAA  Unknown               Unknown  Unknown
test               0000000000402C7E  Unknown               Unknown  Unknown
libc.so.6          0000003B6DA1ED5D  Unknown               Unknown  Unknown
test               0000000000402B89  Unknown               Unknown  Unknown
Program exited with code 0230.
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
No stack.


In other words, the error is detected as before, but gdb gets control only after the program has exited, so no stack is left and the backtrace cannot guide me to the line of code where the error was detected.

Jay 

 

 

mecej4

Use the -traceback compiler option. With it, the traceback printed after an array overrun or other fault shows routine names and source line numbers in place of the "Unknown" entries.
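
A minimal sketch of such a build, assuming the source file is test.for as in the example above and that the executable should be called test (both names are just carried over from the original post). The -c option is left out so that the compiler also links an executable; the variable name FFLAGS is only a convenience for this sketch.

```shell
# Options discussed in this thread: debug info, bounds checking, traceback.
FFLAGS="-debug -check bounds -traceback"

# Build and run only if ifort is installed and the source file is present.
if command -v ifort >/dev/null 2>&1 && [ -f test.for ]; then
    if ifort $FFLAGS -o test test.for; then
        # The program is expected to stop with a bounds error, so report
        # the non-zero exit status instead of treating it as a script failure.
        ./test || echo "test exited with status $?"
    fi
fi
```

With these options the report should name the routine and line of the offending subscript, though the exact layout depends on the compiler version.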

As you have observed, waiting for a signal to be raised so that GDB takes control does not let you catch the array bounds error immediately, and you cannot rely on a trap being taken in a timely manner.

There is an interesting piece of history related to this. From http://courses.engr.illinois.edu/ece390/books/artofasm/CH06/CH06-5.html :

A second problem with the bound instruction is that it executes an int 5 if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5 interrupt handler routine to print the screen. Therefore, if you execute a bound instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5 handler with one of your own, pressing the PrtSc key will transfer control to your bound instruction handler. Although there are ways around this problem, most people don't bother since the bound instruction is so slow.

 

jayb

Thanks for the advice to use -traceback.  I take it that you are giving me a general piece of advice (much appreciated), but that this is not intended as a solution to making gdb stop at the right point when an array bounds error is encountered.  I still observe the problem that I described above -- not totally unexpected, given the rest of your post.

Lorri_M_Intel

Try setting the environment variable

setenv FOR_DEBUGGER_IS_PRESENT true

and then restart the gdb debugging session.

This will cause a "break" to happen in the run-time library; you will have to go "up" a few stack frames to see your code.

          --Lorri

 

 

jayb

Thank you.  That worked.  Since I use bash rather than C shell, I went with:  export FOR_DEBUGGER_IS_PRESENT=true
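
For anyone else who lands here, the whole sequence in bash might look roughly like the sketch below. It assumes the executable is ./test and was built with -debug and -check bounds; running gdb in batch mode is just one way to do it, not something suggested earlier in the thread.

```shell
# bash/sh form of the setting; csh users would use setenv instead.
export FOR_DEBUGGER_IS_PRESENT=true

# Run under gdb only if gdb is installed and the executable exists.
if command -v gdb >/dev/null 2>&1 && [ -x ./test ]; then
    # -batch makes gdb execute the -ex commands in order and then exit:
    # "run" starts the program, "bt" prints the stack at the point where it stops.
    gdb -batch -ex run -ex bt ./test
fi
```

Since the break happens inside the run-time library, the frames for your own code should appear a few entries down in the backtrace. In an interactive session, "up" moves to them one frame at a time.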

Jay
