
Version and optimization dependent segfault

Van_Veen__Lennaert

I have a segfault I would appreciate some help with. A nearly minimal code that reproduces it is attached.
The background is that I am developing a code that handles big matrices, which should be distributed over CPUs along one index (labeled z in the example). I want to determine the distribution at run time, based on the number of processes as returned by an MPI routine. The way I have set it up is to have a module "global", which all other modules use, containing some auxiliary variables related to the partitioning. In the main program I then obtain the number of processes and set or allocate these variables (in the example code only ny and nz, integers that appear in loop bounds, and kz, an allocatable array). Note that I have removed all MPI-related code from the example, setting nprocs and myrank by a simple assignment.
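
For readers without the attachment, the setup described above might look roughly like the following. This is an illustrative sketch, not the attached code: the grid size n, the real(kind=8) type of kx, ky and kz, and the values stored in kz are assumptions. Only the names global, nprocs, myrank, ny, nz, kx, ky and kz come from the post.

```fortran
! Illustrative sketch only; not the code attached to the post.
module global
  implicit none
  integer, parameter :: n = 16            ! assumed grid size
  integer :: nprocs, myrank               ! set at run time (by MPI in the real code)
  integer :: ny, nz                       ! local extents, used in loop bounds
  real(kind=8) :: kx(0:n/2), ky(0:n-1)    ! fixed-size arrays (type is an assumption)
  real(kind=8), allocatable :: kz(:)      ! sized only once nprocs is known
end module global

program sketch
  use global
  implicit none
  integer :: k, ierr

  nprocs = 1                              ! stand-in for the MPI query
  myrank = 0
  if (mod(n, nprocs) /= 0) stop 'n must be divisible by nprocs'
  ny = n
  nz = n / nprocs                         ! slab of the z index owned by this process

  allocate(kz(0:nz-1), stat=ierr)
  if (ierr /= 0) stop 'allocation of kz failed'

  do k = 0, nz-1
     kz(k) = real(myrank*nz + k, kind=8)  ! placeholder values (assumption)
  end do

  print *, 'nz =', nz, ' size(kz) =', size(kz)
  deallocate(kz)
end program sketch
```

Declaring kz as an allocatable module variable and allocating it once in the main program, before any routine that uses it is called, is standard Fortran 90 and later.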

When I compile the attached code on our small cluster, running Linux version 2.6.18-164.11.1.el5 (Red Hat 4.1.2-46) and ifort version 11.1, I find that
* with optimization -O1 and -O2 the code runs and terminates cleanly;
* with optimization -O3 I get:
> ifort -O3 -traceback -o test.x DNS_int.f90
> ./test.x

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
test.x             0000000000403062  hit3d_mp_rhs3_             44  hit3d.f90
test.x             0000000000402EC0  hit3d_mp_rhs_              22  hit3d.f90
test.x             0000000000402C53  MAIN__                     33  DNS_int.f90
test.x             0000000000402ACC  Unknown               Unknown  Unknown
libc.so.6          0000003A9B01D994  Unknown               Unknown  Unknown
test.x             00000000004029D9  Unknown               Unknown  Unknown

It would seem that the root cause is the way that kz is handled. If I declare it just like kx and ky, rather than dynamically, the segfault disappears. That would not be a solution, though, as I need to allocate it dynamically.
My questions:
1) Is the construction I use correct? If not, please suggest a correct way to do this (to allocate kz based on a value of nprocs determined during runtime).
2) If it is correct, then is this a compiler bug? Is there a work-around that keeps my code portable and the executable near-optimal?

Two more observations that may be relevant:
When I add compiler flags, the segfault sometimes goes away. For instance, combining -O3 with any one of -check pointers, -check bounds, -check uninit or -no-vec makes the segfault disappear.
When I compile the code on my laptop, running Linux version 3.11.0-26-generic (Ubuntu 13.10) with ifort 12.1.0, there is no segfault at all at any optimization level.

Any help with this would be greatly appreciated.
 

pbkenned1
Employee

I haven't studied the code, but possibly this is a bug in the 11.1 compiler.  The 15.0 compiler has no issue:

$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.090 Build 20140723
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

$  ifort -O3 -traceback -o test.x DNS_int.f90
$ ./test.x
$

Patrick

Van_Veen__Lennaert

It would be great if a specialist could confirm that this is an ifort bug. I just want to make sure it is not a mistake in my code.
Also, since I cannot change the version of ifort on the cluster, a safe work-around for version 11.1 would be very helpful.
Thanks for the check.

pbkenned1
Employee

What exact version of ifort 11.1 are you using (i.e., the output of ifort -V)?  I did try the last 11.1 version, and your example worked normally.  I'd be happy to determine whether this is really an ifort bug, but I need to be able to reproduce the SEGV.

Patrick

jimdempseyatthecove
Honored Contributor III

Not that this matters with nprocs=1 but...

complex(kind=8), dimension(0:n/2,0:n-1,0:n-1) :: A,B,FA,FB
...
 subroutine RHS(A,B,RA,RB)
   complex(kind=8), intent(in),  dimension(0:n/2,0:n-1,0:nz-1) :: A,B
   complex(kind=8), intent(out), dimension(0:n/2,0:n-1,0:nz-1) :: RA,RB

Jim Dempsey

pbkenned1
Employee

I don't spot any coding errors.  I think this is just an -O3 optimization bug in 11.1, since it works at -O2 with that version, or at -O3 with any other major compiler version I tested (11.1.080, 12.1.7.367, 13.1.3.192, 14.0.4.211, 15.0.0.090).

Patrick

Van_Veen__Lennaert

The output of ifort -V:

Intel(R) Fortran Intel(R) 64 Compiler Professional for applications running on Intel(R) 64, Version 11.1    Build 20091130 Package ID: l_cprof_p_11.1.064
Copyright (C) 1985-2009 Intel Corporation.  All rights reserved.


And /proc/version reads:

Linux version 2.6.18-164.11.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)) #1 SMP Wed Jan 20 07:32:21 EST 2010


As for Jim Dempsey's comment: in the actual program these arrays do not occur in the main program, but I cut out several layers to narrow down the possible causes. The segfault persists if I use n instead of nz to set the dimensions in subroutines RHS and RHS3. I suppose that means that kz is the root cause, not the array bounds. Thanks!

pbkenned1
Employee

It's an -O3 unroll/jam defect in ifort-11.1.064.  You can work around it with -unroll0:

[U533981]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler Professional for applications running on Intel(R) 64, Version 11.1    Build 20091130 Package ID: l_cprof_p_11.1.064
Copyright (C) 1985-2009 Intel Corporation.  All rights reserved.

[U533981]$  ifort -O3 -traceback -o test.x DNS_int.f90 -unroll0
[U533981]$ ./test.x
[U533981]$

 

Patrick

Van_Veen__Lennaert

Thank you very much for sorting this out! I can move on with the project now, and I do not think the unrolling will impact significantly on the run time.

pbkenned1
Employee

Thanks for the feedback, I'll consider this case closed then.  I'll note in closing that -unroll0 only needs to be applied to hit3d.f90.  You had included the file in DNS_int.f90.  I commented out the include, and compiled hit3d separately to debug the issue.  Of course, the unroll issue arises in the code generated for RA(kx_,ky_,kz_)=RA(kx_,ky_,kz_)-kz(kz_)*kx(kx_)*UU(kx_,ky_,kz_).  As long as that statement is not a hotspot for your real application, the performance hit from applying -unroll0 probably won't be noticed.

Patrick
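
A narrower alternative to the file-wide -unroll0 flag, which nobody in this thread tested against 11.1.064, might be to disable unrolling only on the affected loop nest with Intel's NOUNROLL and NOUNROLL_AND_JAM directives. The sketch below is an assumption-laden illustration: the subroutine name rhs3_sketch, the argument list, the loop order and the real(kind=8) type of kx and kz are all hypothetical. Only the assignment statement and the array shapes are taken from the posts above.

```fortran
! Hypothetical sketch; not the poster's hit3d.f90. Whether these
! directives avoid the 11.1.064 defect has not been verified.
subroutine rhs3_sketch(RA, UU, kx, kz, n, nz)
  implicit none
  integer, intent(in) :: n, nz
  complex(kind=8), intent(inout) :: RA(0:n/2, 0:n-1, 0:nz-1)
  complex(kind=8), intent(in)    :: UU(0:n/2, 0:n-1, 0:nz-1)
  real(kind=8),    intent(in)    :: kx(0:n/2), kz(0:nz-1)   ! assumed type
  integer :: kx_, ky_, kz_

  ! Ask the compiler not to unroll-and-jam the outer loop ...
  !DIR$ NOUNROLL_AND_JAM
  do kz_ = 0, nz-1
     do ky_ = 0, n-1
        ! ... and not to unroll the innermost loop.
        !DIR$ NOUNROLL
        do kx_ = 0, n/2
           RA(kx_,ky_,kz_) = RA(kx_,ky_,kz_) - kz(kz_)*kx(kx_)*UU(kx_,ky_,kz_)
        end do
     end do
  end do
end subroutine rhs3_sketch
```

Because the directives are written as comments, other compilers ignore them, so the source stays portable. If they turn out not to help with 11.1.064, compiling hit3d.f90 separately with -unroll0 remains the workaround confirmed above.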
