I am still involved with a legacy program that has many very old components (30 - 40 years old). We have recently migrated from g77 to Intel Fortran, and things are going well. Recently, the original author summarized a test that he ran, and mentioned that g77 and ifort behave differently when "array overruns" occur, i.e., an out-of-bounds array index is referenced. So I suggested we try catching array overruns when they occur, i.e., not wait for memory corruption, and use '-check bounds' when we run tests.
I am very impressed by how this compiler option works. In half a day, I have found about a dozen array overruns (even caught one at compile time!), and have not even begun to execute realistic scenarios. In an application this old, we will find many more, I am sure, and fixing them will further contribute to the robustness of the program.
Just one issue: When I execute under gdb and an out-of-bounds index is detected, the program does not break into gdb immediately. Rather, the error is reported, and control returns to gdb only after the program has terminated.
We are using version 15, update 2, on Red Hat Enterprise Linux 6.
Here is a tiny example. I have purposely created a test case with ancient style that matches the old code.
      PROGRAM TEST
      INTEGER I,ARR(100)
      I=0
    1 I=I+1
      ARR(I)=I
      PRINT *,I
      GOTO 1
      END
The idea is that we have an array of 100 elements, and march right through the end until we are "caught". Each index is printed as we go.
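For comparison, here is a rough hand-written sketch of the same loop with an explicit guard on the index. It only illustrates the kind of test that run-time bounds checking performs on each subscript; it is not a description of how the compiler implements '-check bounds'. The program name GUARD and the message text are made up for this illustration.

```fortran
C     Hand-written guard (illustrative only): stop before the first
C     out-of-bounds store instead of relying on a later crash.
      PROGRAM GUARD
      INTEGER I,ARR(100)
      I=0
    1 I=I+1
C     ARR is declared with 100 elements, so 101 is the first bad index.
      IF (I.GT.100) STOP 'index past end of ARR'
      ARR(I)=I
      PRINT *,I
      GOTO 1
      END
```

This version prints the indices 1 through 100 and then stops at the guard, so the array is never written past its end. Adding such guards by hand is not practical in a large legacy code base, which is why having the compiler insert the checks is attractive.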
First, I compile with: ifort -c -debug test.for
If I run without gdb, the program simply dies when I get to I=568. So I run with gdb, and it breaks into gdb when it detects something is wrong (too late, of course):
567
568
Program received signal SIGSEGV, Segmentation fault.
0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
#0 0x0000003b6de093a0 in pthread_mutex_lock () from /lib64/libpthread.so.0
#1 0x000000000040f931 in for__aio_acquire_lun ()
#2 0x0000000000428613 in for__acquire_lun ()
#3 0x0000000000408ad9 in for_write_seq_lis ()
#4 0x0000000000402d20 in test () at test.for:6
#5 0x0000000000402c7e in main ()
As usual, I can get a complete backtrace.
Next, I compile with: ifort -c -debug -check bounds test.for
Running without gdb, the program now correctly crashes when I get to 101:
99
100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image      PC                Routine  Line     Source
test       0000000000404860  Unknown  Unknown  Unknown
test       0000000000402DAA  Unknown  Unknown  Unknown
test       0000000000402C7E  Unknown  Unknown  Unknown
libc.so.6  0000003B6DA1ED5D  Unknown  Unknown  Unknown
test       0000000000402B89  Unknown  Unknown  Unknown
In a program this small, the location of the bug is obvious. But in a large program, just knowing the name of the array is not always enough information to find the place in the code where the overrun occurs. So I run with gdb, and the following occurs:
99
100
forrtl: severe (408): fort: (2): Subscript #1 of the array ARR has value 101 which is greater than the upper bound of 100
Image      PC                Routine  Line     Source
test       0000000000404860  Unknown  Unknown  Unknown
test       0000000000402DAA  Unknown  Unknown  Unknown
test       0000000000402C7E  Unknown  Unknown  Unknown
libc.so.6  0000003B6DA1ED5D  Unknown  Unknown  Unknown
test       0000000000402B89  Unknown  Unknown  Unknown
Program exited with code 0230.
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.149.el6_6.9.x86_64 libgcc-4.4.7-11.el6.x86_64
(gdb) bt
No stack.
In other words, the overrun is detected and reported as before, but gdb regains control only after the program has already exited, so there is no stack left and the backtrace cannot guide me to the line of code where the error was detected.
Jay