Intel® Fortran Compiler

Segfault without debug options

Tim_Gallagher
New Contributor II
2,326 Views

Hi,

I have a code that interpolates a dataset from one grid onto another. However, it behaves strangely when compiled with an Intel compiler (it works without problems on IBM). When compiled with no options (so defaulting to -O2), it segfaults. Turning on -traceback shows the line where it crashes, but it still crashes. However, when -g is enabled, it no longer crashes and runs just fine (albeit slowly).

I tried a few different combinations:

-g -O2: Crash in the same place

-O0: Crash in the same place

-O2 -fno-omit-frame-pointer: Crash in the same place

Attached is a sample dataset and the code. The shell script, gridInterp.sh, will compile the executable if it doesn't already exist, change the stack limit of the terminal, and then run the code. So, you would have to remove the executable if you change options and need it to recompile.

The inputs are (in order): 1, 27, 500, -3, 4, 5.
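For readers without the attachment, the run step described above can be sketched roughly as follows. This is not the attached gridInterp.sh: the executable name gi.x is taken from the run log later in this thread, `unlimited` is an assumed value for the stack limit, and the build step is left out entirely.

```shell
#!/bin/sh
# Sketch of the run step only; the attached gridInterp.sh is not reproduced here.
EXE="${EXE:-./gi.x}"   # assumed name, taken from the run log later in the thread

# The six answers, one per line, in the order the program prompts for them.
answers() {
    printf '%s\n' 1 27 500 -3 4 5
}

if [ -x "$EXE" ]; then
    # The original script changes the stack limit before running;
    # "unlimited" is an assumption, not a value stated in the post.
    ulimit -s unlimited 2>/dev/null || echo "warning: could not raise the stack limit" >&2
    answers | "$EXE"
else
    echo "$EXE not found; build it first" >&2
fi
```

Piping the answers in keeps every rerun identical, which helps when comparing builds made with different options.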

If anybody has any insight into why -g makes the code runnable, I'd appreciate it. Thanks,

Tim

11 Replies
TimP
Honored Contributor III

Have you checked into the diagnostic messages?

grid_interp.f90(104): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:47) which can be used in loop condition expression
grid_interp.f90(104): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:88) which can be used in loop condition expression
grid_interp.f90(107): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:88) which can be used in loop condition expression
grid_interp.f90(238): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:221) which can be used in loop condition expression
grid_interp.f90(241): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:221) which can be used in loop condition expression
grid_interp.f90(270): error #12306: unvalidated value is received from call to an external function at (file:grid_interp.f90 line:53) which can be used in loop condition expression
grid_interp.f90(294): error #12145: function "GI_SEARCHTREE" is called as subroutine

If you do in fact mix function and subroutine calls to the same name, you may get a data overrun, which might accidentally be harmless when extra symbol information spaces out the more critical data.

Tim_Gallagher
New Contributor II

Thanks for looking at it. How did you generate those diagnostic messages?

The first batch about the unvalidated value -- those are user input or read from a file. I'm not sure what the compiler means there, but the numbers read in are okay for loop variables (unless somebody types in junk).

The last one, about the function called as a subroutine, really confuses me. The line in question is:

currentNode => GI_SearchTree(currentNode, pX, pY, pZ)

where

FUNCTION GI_SearchTree(startNode, X, Y, Z) RESULT(contNode)
USE TREE

IMPLICIT NONE

TYPE(NODE), POINTER :: contNode, startNode
REAL*8, INTENT(IN) :: X, Y, Z

And there is no place that GI_SearchTree is declared as a SUBROUTINE, nor do I CALL it.

Any ideas?

Tim

Steven_L_Intel1
Employee

TimP used the Source Checker option (-diag-enable sc). Unfortunately, it tends to get very confused by Fortran code (it's better with C/C++). I like to try it when I can't otherwise understand a behavior, but more often than not it is less than helpful. I suspect that in this case it doesn't understand the way the Fortran compiler implements functions returning pointers, since to the optimizer it "looks like" a subroutine call with the function return variable as a hidden argument.

TimP
Honored Contributor III

I suppose the warnings are simply warnings, indicating that the program is vulnerable to bad input.

If there is not actually a mixture of subroutine and function calling, then there is a bug in the compiler diagnostics invoked by 'ifort -diag-enable sc'. I have seen this before; I would certainly like to see it fixed, so as to avoid doubts about whether the compiler is working correctly on the example.

It's also valuable to run a build made with -check enabled, if you haven't done so.
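A rough sketch of the two diagnostic builds mentioned so far, for anyone who wants to repeat them. The flags are the ones quoted in this thread (-diag-enable sc, -check, -traceback); the source file list and its order are assumptions pieced together from file names that appear in the posts, and the output name gi_check.x is hypothetical. The function only prints the command lines so they can be reviewed before being run.

```shell
#!/bin/sh
# Sketch: print the command line for each diagnostic build discussed in the thread.
FC="${FC:-ifort}"                                      # compiler driver
SRCS="tree_mod.f90 gi_data_mod.f90 grid_interp.f90"    # assumed file list and module order

diag_cmd() {
    case "$1" in
        # Static analysis with the Source Checker.
        static)  echo "$FC -diag-enable sc $SRCS" ;;
        # Run-time checking, with a traceback if the program still crashes.
        runtime) echo "$FC -check -traceback $SRCS -o gi_check.x" ;;
        *)       echo "usage: diag_cmd static|runtime" >&2; return 2 ;;
    esac
}

diag_cmd static
diag_cmd runtime
```

To actually run one of them, something like `diag_cmd runtime | sh` should work once the file list matches the real project.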

Tim_Gallagher
New Contributor II

It's been a while since I've run this code with -check, but I just did it again and it runs to completion with no crash, the same as when built with -g.

Tim

Ron_Green
Moderator

This is what I see. Is this what you see when you run it, too?

$ ./gi.x
Enter the number of blocks in the old grid...
1
Enter the number of blocks in the new grid...
27
Enter the maximum number of grid points for each node...
500
Enter NLG...
-3
Enter NRG...
4
Enter the number of variables in the restart file...
5
Reading the old grid...
DOM: 1 IMAX = 129 JMAX = 129 KMAX = 129
There are 2097152 grid points in the old grid
Must allocate 4194306 data spots
Attaching data to the root
Minimum coordinates are: -4.762500000000000E-003 -4.762500000000000E-003
-4.762500000000000E-003
Maximum coordinates are: 0.157162500000000 0.157162500000000
0.157162500000000
Attached minimum coordinate data at 1
Attached maximum coordinate data at 2
FINISHED WITH BLOCK 1
Building tree from the dataset
Done building tree
Reading new grid...
DOM = 1 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 2 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 3 IMAX = 129 JMAX = 11 KMAX = 13
DOM = 4 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 5 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 6 IMAX = 129 JMAX = 11 KMAX = 13
DOM = 7 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 8 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 9 IMAX = 129 JMAX = 11 KMAX = 13
DOM = 10 IMAX = 129 JMAX = 21 KMAX = 12
DOM = 11 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 12 IMAX = 129 JMAX = 21 KMAX = 12
DOM = 13 IMAX = 129 JMAX = 11 KMAX = 12
DOM = 14 IMAX = 129 JMAX = 21 KMAX = 13
DOM = 15 IMAX = 129 JMAX = 11 KMAX = 13
DOM = 16 IMAX = 129 JMAX = 14 KMAX = 12
DOM = 17 IMAX = 129 JMAX = 12 KMAX = 12
DOM = 18 IMAX = 129 JMAX = 14 KMAX = 12
DOM = 19 IMAX = 129 JMAX = 12 KMAX = 12
DOM = 20 IMAX = 129 JMAX = 14 KMAX = 13
DOM = 21 IMAX = 129 JMAX = 12 KMAX = 13
DOM = 22 IMAX = 129 JMAX = 12 KMAX = 12
DOM = 23 IMAX = 129 JMAX = 14 KMAX = 12
DOM = 24 IMAX = 129 JMAX = 12 KMAX = 12
DOM = 25 IMAX = 129 JMAX = 14 KMAX = 12
DOM = 26 IMAX = 129 JMAX = 12 KMAX = 13
DOM = 27 IMAX = 129 JMAX = 14 KMAX = 13
Reading the old restart file...
Searching for nearest points and interpolating
Starting search and interpolation for new block 1
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
gi.x 000000000040A859 gi_data_mp_gi_sea 274 gi_data_mod.f90
gi.x 00000000004126F5 MAIN__ 294 grid_interp.f90
gi.x 0000000000402B1C Unknown Unknown Unknown
libc.so.6 0000003D0FC1D8A4 Unknown Unknown Unknown
gi.x 0000000000402A29 Unknown Unknown Unknown

Tim_Gallagher
New Contributor II

Correct. It crashes when it tries to search for the first point for the first time, unless run with -g or -C.

Tim

Ron_Green
Moderator

Tim,


I'm still narrowing this down, but it's definitely a compiler bug (no surprise, huh?). AND it only appears in 11.1.046 and newer compilers. If you use any compiler older than this it will run fine.

I've narrowed it down to tree_mod.f90. If you compile this with:

-no-vec

and compile everything else at -O2 and link, it'll run fast and fine.
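The workaround above amounts to per-file flags. A minimal sketch, assuming the project consists of the three source files named in this thread (there may be others), that they must be compiled in this order because of module dependencies, and that the executable is gi.x. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: build with vectorization disabled for tree_mod.f90 only.
FC="${FC:-ifort}"          # compiler driver
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the commands, 0 = also execute them

# Print a command and, unless this is a dry run, execute it.
run() {
    echo "$*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Per-file flags: only tree_mod.f90 gets -no-vec.
flags_for() {
    case "$1" in
        tree_mod.f90) echo "-O2 -no-vec" ;;
        *)            echo "-O2" ;;
    esac
}

build() {
    objs=""
    # Assumed file list and order; modules must be compiled before their users.
    for src in tree_mod.f90 gi_data_mod.f90 grid_interp.f90; do
        # Word splitting of the flag string is intentional here.
        run "$FC" $(flags_for "$src") -c "$src" || return 1
        objs="$objs ${src%.f90}.o"
    done
    run "$FC" -O2 -o gi.x $objs
}

build
```

Keeping the exception in one `case` branch makes it easy to remove again once a fixed compiler is available.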

I believe the compiler is getting overly aggressive with tree_mod.f90 lines 286, 306, and 338. It seems to be vectorizing the NULLIFY statements there, which it shouldn't, because there is a dependency on the temp pointer assignment prior to each NULLIFY. I am not sure if the NULLIFY near line 475 is also being munged.

After I narrow this down a bit more I'll get a bug report started.

ron

Tim_Gallagher
New Contributor II

Thanks! Sorry I seem to find bugs a lot... I don't know how I keep coming up with strange/obscure programs.

Compiling tree_mod with -no-vec worked just fine as you suggested.

On a whim, I tried putting !DEC$ NOVECTOR in front of the three loops you mentioned and I compiled and ran it with -O2 on tree_mod.f90. It still crashes. Using -vec-report3, it doesn't vectorize line 475 because it assumes a vector dependence, and it didn't vectorize the three loops with NOVECTOR.
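For anyone repeating this check, a small filter helps pick the relevant lines out of the vectorization report. This is only a sketch: it assumes each report line starts with `file(line):`, the same prefix as the diagnostics quoted earlier in the thread, and the function name report_for_lines is made up for illustration.

```shell
#!/bin/sh
# Sketch: keep only report lines for one source file and a set of line numbers.
# Assumes each remark starts with "file(line):"; adjust the pattern if it does not.
report_for_lines() {
    # Escape dots in the file name so they match literally.
    file_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    shift
    # Turn "286 306 338" into the alternation "286|306|338".
    alt=$(echo "$*" | tr ' ' '|')
    grep -E "^${file_re}\\((${alt})\\)"
}

# Intended use (needs the compiler, so it is left as a comment):
#   ifort -O2 -vec-report3 -c tree_mod.f90 2>&1 | report_for_lines tree_mod.f90 286 306 338 475
```

The line numbers in the usage comment are the ones named earlier in the thread for tree_mod.f90.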

So I guess there's still something else going on somewhere. But you already knew that!

Thanks again,

Tim

Ron_Green
Moderator

!DEC$ NOVECTOR: yes, I found that earlier this morning too. I'm going to have to dig in further to see what's going on with the optimization of these loops. There must be some loop transformations going on that are violating the ordering of the statements in the loop. Often the optimizer will do some loop transforms in order to get a non-vectorized loop to vectorize. I'm also not discounting the possibility that the NULLIFY library call itself may be at fault.

ron

Ron_Green
Moderator

The bug report is DPD200150928.

ron
