Solved: SIGSEGV only with new Fortran Compiler (14.0.1) build date 20131008 (Composer XE 2013 SP1)

Martin_D_1 · 02-06-2014

Dear All,

I have severe problems running my application when it is compiled with the newest release of the Intel Fortran Compiler (14.0.1): I'm getting a segmentation fault in a rather simple, early part of the code.

I am aware of http://software.intel.com/en-us/articles/determining-root-cause-of-sigsegv-or-sigbus-errors, and since it is an OpenMP application, I set the stack size with ulimit in my shell rather than using the -heap-arrays option. The application runs successfully when compiled with version 12.1.2 of ifort and with various versions of the GNU Fortran compiler.

I turned on all warning and traceback options I'm aware of; the compilation command looks like:

-openmp -openmp-report0 -parallel -fpp -ftz -assume byterecl -diag-enable sc3 -diag-disable 5268 -warn declarations -warn general -warn usage -warn interfaces -warn ignore_loc -warn alignments -warn unused -g -traceback -gen-interfaces -fp-stack-check -check bounds,format,output_conversion,pointers,uninit -fpe-all0 -debug-parameters all -stand f08 -standard-semantics -O0 -no-ip -I../lib -I/opt/intel/composer_xe_2013_sp1.1.106/mkl/include -real-size 64 -integer-size 32 -DFLOAT=8 -DINT=4 -DSpectral -c DAMASK_spectral_solverBasic.f90

where the macros FLOAT and INT (defined with -DFLOAT=8 and -DINT=4) select, via the preprocessor, the precision of the default reals and integers: 8 and 4 bytes, respectively.

With GNU Fortran, it also runs with very strict settings. Even though I can't promise that there are no errors in the code, this strongly indicates that it is OK.

My problem seems similar to the one described in http://software.intel.com/en-us/forums/topic/488495.

It's running on a Xeon CPU with 2x6 cores, 1.93 GHz and 24 MB of RAM, and is linked against FFTW (from the Ubuntu repositories) and MKL. Since it is a large piece of code, I don't want to post it here, but I could provide a zipped archive if someone wants to have a look.

With OpenMP on, it throws the following message; with OpenMP off, it dies silently:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source
DAMASK_spectral    00000000004BF279 Unknown               Unknown Unknown
DAMASK_spectral    00000000004BDBF0 Unknown               Unknown Unknown
DAMASK_spectral    0000000000479322 Unknown               Unknown Unknown
DAMASK_spectral    0000000000417D98 Unknown               Unknown Unknown
DAMASK_spectral    000000000041D2AB Unknown               Unknown Unknown
libpthread.so.0    00002AAAAB260CB0 Unknown               Unknown Unknown
DAMASK_spectral    0000000000465421 Unknown               Unknown Unknown
DAMASK_spectral    000000000069B13C Unknown               Unknown Unknown
DAMASK_spectral    000000000071B3F6 Unknown               Unknown Unknown
DAMASK_spectral    000000000086F533 Unknown               Unknown Unknown
DAMASK_spectral    000000000088042F Unknown               Unknown Unknown
DAMASK_spectral    00000000004C9465 MAIN__                    155 DAMASK_spectral_driver.f90
DAMASK_spectral    0000000000404626 Unknown               Unknown Unknown
libc.so.6          00002AAAAE86676D Unknown               Unknown Unknown
DAMASK_spectral    0000000000404519 Unknown               Unknown Unknown

I would appreciate any help or hints on how to make it run again.

Martin

mecej4 · 02-06-2014

Martin_D_1 wrote:

I don't want to post it here, but I could provide a zipped archive if someone wants to have a look

If the source files put together are not over a few megabytes in size, I'll take a look, assuming that the code crashes even on a single processor and that 8 GB of RAM is sufficient to run it (if not, please provide a scaled-down version if you can do so without much effort).

Martin_D_1 · 02-06-2014

Dear mecej4,

Thanks for your offer; I really appreciate your help. When compiling without OpenMP, I was able to locate the error: it crashes in the following routine (IO_intValue), just before or during the call to IO_verifyIntValue (further down). I've attached a minimal example. To compile it, you might need to set the correct paths to MKL and FFTW in damask.conf; then run it with ./test.exe -l dd -g aa.

It should work without any other modifications, but please ask if you have questions. It should need only a few MB of RAM at most.

[fortran]!--------------------------------------------------------------------------------------------------
!> @brief reads integer value at myPos from string
!--------------------------------------------------------------------------------------------------
integer(pInt) function IO_intValue(string,ends,myPos)

 implicit none
 character(len=*),            intent(in) :: string                 !< raw input with known ends
 integer(pInt),               intent(in) :: myPos                  !< position of desired sub string
 integer(pInt), dimension(:), intent(in) :: ends                   !< positions of ends in string
 character(len=13), parameter :: MYNAME = 'IO_intValue: '
 character(len=12), parameter :: VALIDCHARACTERS = '0123456789+-'

! debugging start
 print*, 'size of ends:', size(ends)
 print*, 'myPos:', myPos
 print*, 'length of string:', len(string)
 print*, 'substring: ', string(ends(myPos*2):ends(myPos*2+1))
! debugging end

 IO_intValue = 0_pInt

 if (myPos > ends(1) .or. myPos < 1_pInt) then                      ! trying to access non-present value
   call IO_warning(201,el=myPos,ext_msg=MYNAME//trim(string))
 else
   IO_intValue = IO_verifyIntValue(string(ends(myPos*2):ends(myPos*2+1)),&
                                   VALIDCHARACTERS,MYNAME)
 endif

end function IO_intValue


!--------------------------------------------------------------------------------------------------
!> @brief returns verified integer value in given string
!--------------------------------------------------------------------------------------------------
integer(pInt) function IO_verifyIntValue (string,validChars,myName)

 implicit none
 character(len=*), intent(in) :: string, &                          !< string for conversion to float value
                                 validChars, &                      !< valid characters in string
                                 myName                             !< name of caller function (for debugging)
 integer(pInt) :: readStatus, invalidWhere
 character(len=len(trim(adjustl(string)))) :: trimmed

 write(6,*) 'does not reach here'; flush(6)
 trimmed = trim(adjustl(string))
 IO_verifyIntValue = 0_pInt

 invalidWhere = verify(trimmed,validChars)
 print*, invalidWhere; flush(6)
 if (invalidWhere == 0_pInt) then
   read(UNIT=trimmed,iostat=readStatus,FMT=*) IO_verifyIntValue     ! no offending chars found
   if (readStatus /= 0_pInt) &                                      ! error during string to float conversion
     call IO_warning(203,ext_msg=myName//'"'//trimmed//'"')
 else
   call IO_warning(202,ext_msg=myName//'"'//trimmed//'"')           ! complain about offending characters
   read(UNIT=trimmed(1_pInt:invalidWhere-1_pInt),iostat=readStatus,FMT=*) IO_verifyIntValue ! interpret remaining string
   if (readStatus /= 0_pInt) &                                      ! error during string to float conversion
     call IO_warning(203,ext_msg=myName//'"'//trimmed(1_pInt:invalidWhere-1_pInt)//'"')
 endif

end function IO_verifyIntValue[/fortran]

Martin_D_1 · 02-06-2014

I found the bad statement:

[fortran]character(len=len(trim(adjustl(string)))) :: trimmed[/fortran]

is causing the trouble

[fortran]character(len=len(trim(string))) :: trimmed[/fortran]

works. However, it would be interesting to know whether there is something wrong with the original statement; if it is not allowed, the compiler should complain during compilation.

Steven_L_Intel1 · 02-06-2014

It's legal code, but I know we've had issues in the past with certain intrinsic functions in character length expressions. I took a look at your code but I don't see the definition of InputFileExtension and LogFileExtension anywhere in the sources.
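For reference, the construct Steve confirms as legal can be isolated in a small stand-alone module. This is a minimal sketch, not the DAMASK code: the module `spec_expr_demo` and the function `stripped_length` are hypothetical names. It declares a local automatic character variable whose length is a specification expression built from len, trim and adjustl applied to an assumed-length dummy argument, just like `trimmed` in IO_verifyIntValue. Since this thread is about a compiler release mishandling exactly this, treat it as a reproducer rather than proof that a given compiler gets it right.

```fortran
! Hypothetical reproducer (not DAMASK code) for the declaration discussed above.
module spec_expr_demo
  implicit none
contains

  ! Returns the length of string once leading and trailing blanks are removed.
  integer function stripped_length(string)
    character(len=*), intent(in) :: string
    ! Legal Fortran: the length is a specification expression that depends
    ! only on the dummy argument string (same form as in IO_verifyIntValue).
    character(len=len(trim(adjustl(string)))) :: trimmed

    trimmed = trim(adjustl(string))
    stripped_length = len(trimmed)
  end function stripped_length

end module spec_expr_demo
```

A program that does `use spec_expr_demo` should get 2 from `stripped_length('  42  ')` and 0 for an all-blank string (a zero-length automatic character variable is allowed).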

Martin_D_1 · 02-06-2014

Thanks for your response,

InputFileExtension and LogFileExtension are not used because the preprocessor comments them out. However, I have stripped down the code further; now ./test.exe should run without arguments.

Best regards

Martin

Steven_L_Intel1 · 02-06-2014

Thanks - got it. I'll let you know what we find.

Steven_L_Intel1 · 02-06-2014

I have escalated this as issue DPD200253121. I observed a couple of things. One, the program behaves badly if you don't specify the mandatory command line arguments. I don't think this is a compiler bug. When I see a program that fails with -openmp I look to see if just -recursive will show the problem, but that's not the case here.

Martin_D_1 · 02-06-2014

Dear Steve,

Thanks for the quick response. My code doesn't run without the mandatory arguments -l and -g because it was extracted from a larger piece of code. That should be fixed in the second archive I attached in my later post.

Still, my original software crashes when I compile without -openmp, but in a later part of the code. I'll try to locate this in the next few days and provide a minimal example if it's not an (obvious) programming bug and I can reproduce it. For now, we have worked around the first issue by applying trim(adjustl()) to the string we pass as the actual argument to IO_verifyIntValue; that way we can use the string dummy argument directly and don't need a trim(adjustl()) copy to work with. That's not a big deal because it only affects a few lines of the code.
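The workaround described above can be sketched roughly as follows. This is a simplified, hypothetical stand-in for IO_verifyIntValue, not the actual DAMASK routine: the module `verify_int_demo` and the function `verify_int` are made-up names, the IO_warning calls are dropped, and default integers replace pInt. The caller applies trim(adjustl()) to the actual argument, so the dummy argument is used directly and no local copy with a length specification expression is needed.

```fortran
! Hypothetical, simplified stand-in for IO_verifyIntValue (warnings omitted).
module verify_int_demo
  implicit none
contains

  ! The caller passes trim(adjustl(raw)) as the actual argument, so the dummy
  ! can be used directly; no automatic-length local copy is declared.
  function verify_int(string, validChars) result(val)
    character(len=*), intent(in) :: string      ! already trimmed and left-adjusted
    character(len=*), intent(in) :: validChars  ! e.g. '0123456789+-'
    integer :: val
    integer :: readStatus, invalidWhere

    val = 0
    invalidWhere = verify(string, validChars)
    if (invalidWhere == 0) then
      ! no offending characters: convert the whole string
      read(string, *, iostat=readStatus) val
    else
      ! offending character found: convert only the part before it
      read(string(1:invalidWhere-1), *, iostat=readStatus) val
    end if
    if (readStatus /= 0) val = 0   ! conversion failed (e.g. empty prefix)
  end function verify_int

end module verify_int_demo
```

At the call site this becomes, e.g., `value = verify_int(trim(adjustl(raw)), '0123456789+-')`; for `'12abc'` only the leading `12` is converted.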

Thanks again for your help.

Martin

jimdempseyatthecove · 02-07-2014

I would suggest NOT using an unlimited stack size for a multi-threaded program. The article's recommendation to use it for OpenMP programs (Cause #2) is wrong. Look at Cause #2-prime:

The realistic amount of memory that can be consumed has a ceiling at PHYSICAL RAM + swap space (typically 2x the physical memory size) ... The system typically needs some space, so a rule of thumb is to keep the memory footprint of your application to around 80% of MemTotal if possible and never exceed MemTotal + SwapTotal.

Using the above metrics, this comes to an estimated 2 x 8GB * 0.8 = 12.8GB for your system. For simplicity, assume your code and static data take 0.8GB. This leaves 12GB for heap and stack(s). What does "unlimited" mean in this context?

1) The program loads; 6GB of address space is reserved for the heap and 6GB for the main thread's stack.
2) The second thread (of 11 additional threads) starts. It shares the heap, but where does it get its stack? It steals it from the first thread. Now you have 6GB heap, 3GB main, 3GB second thread.
3) The third thread starts. The 6GB heap is shared; *** it cannot steal from the main thread because the second thread sits at a fixed offset from the main thread. Therefore, it could take half of the second thread's 3GB. The second and third threads now have 1.5GB each.
4) The fourth thread starts; splitting the third thread's stack leaves the 3rd and 4th with 0.75GB each.
...

A better strategy is to explicitly specify a stack size (and additionally a heap size, if possible).

On a 64-bit system, the virtual addresses could be split and mapped, as opposed to the above scheme of using virtual addresses that correlate to the process size limitation. O/S writers tend not to do this, because with the smaller set, the virtual address becomes an index into the page file.

Jim Dempsey

mecej4 · 02-10-2014

Martin_D_1 wrote:

it's a call to an FFTW routine

There is no call to any FFTW routine in the source files included in ifort14crash.tar_3.gz. On Windows, the 32-bit IFort 14.0.1.139 compiler built and ran the example program with no run-time errors.

Martin_D_1 · 02-10-2014

I found the second part of the code causing trouble; unfortunately, it's not that clear to me what's going wrong, since it's a call to an FFTW routine

[fortran]write(6,*) 'Runs up to here'
flush(6)
call fftw_execute_dft_r2c(planForth, F_real, F_fourier)
write(6,*) 'But I cannot see this'
flush(6)[/fortran]

embedded in a rather complex subroutine.

I've also attached a minimal example; after unpacking, run

[bash]make

./test.exe[/bash]

Again, it runs without flaws with ifort 12.1.2 and 13.0.1 and with various gfortran versions. The included FFTW libraries are compiled with icc 14.0.1, but it doesn't work with the version shipped with Ubuntu 12.04 either.

Martin_D_1 · 02-10-2014

Dear mecej4,

Sorry for the inconvenience; I had uploaded the wrong archive and have just updated it.

Martin_D_1 · 02-10-2014

jimdempseyatthecove wrote:

A better strategy is to explicitly specify a stack size (and additionally a heap size, if possible).

On a 64-bit system, the virtual addresses could be split and mapped, as opposed to the above scheme of using virtual addresses that correlate to the process size limitation. O/S writers tend not to do this, because with the smaller set, the virtual address becomes an index into the page file.

Jim Dempsey

Do you have any recommendations on how to set it? For example, would it make sense to set the heap to half of the total memory and the stack to half of the main memory divided by the number of threads?

jimdempseyatthecove · 02-10-2014

Martin,

You could choose 50:50/N if you wish, but I would suggest writing your program to use allocatable arrays for the larger arrays. Keep in mind that the overhead for allocation is relatively fixed, while the memory access time may grow:

Linear (number of elements)
square (number of elements**2)
cubed (number of elements**3)
...

At some point the time for allocation is negligible.
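To illustrate the suggestion, here is a minimal sketch contrasting an automatic array with an allocatable one; the module `work_arrays` and its functions are hypothetical names. With ifort's default settings (no -heap-arrays), automatic arrays are typically placed on the calling thread's stack, so a large one can overflow a small OpenMP thread stack, whereas an allocatable array comes from the heap and costs one allocation per call.

```fortran
! Hypothetical sketch: automatic vs. allocatable work arrays.
module work_arrays
  implicit none
  integer, parameter :: dp = selected_real_kind(15)
contains

  ! Automatic array: with ifort's defaults (no -heap-arrays) its storage
  ! typically comes from the calling thread's stack, so a large n can
  ! overflow a small OpenMP thread stack.
  function sum_automatic(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    real(dp) :: work(n)
    work = 1.0_dp
    total = sum(work)
  end function sum_automatic

  ! Allocatable array: storage comes from the heap, so its size is not
  ! limited by the stack size; it is deallocated automatically on return.
  function sum_allocatable(n) result(total)
    integer, intent(in) :: n
    real(dp) :: total
    real(dp), allocatable :: work(:)
    integer :: stat
    allocate(work(n), stat=stat)
    if (stat /= 0) error stop 'allocation failed'
    work = 1.0_dp
    total = sum(work)
  end function sum_allocatable

end module work_arrays
```

For example, `sum_allocatable(10**6)` should return 1.0e6 while keeping the roughly 8 MB of `work` off the stack; the same call to `sum_automatic` would need that 8 MB from the thread's stack.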

You will need to use kmp_set_stacksize_s() or the environment variable KMP_STACKSIZE or OMP_STACKSIZE. The stack size has to be set before the OpenMP thread pool is established. The kmp_... functions are Intel-specific.

I haven't tested this, but you are welcome to try it yourself: there is the SETENVQQ(varname=value) library function. Before entering your first parallel region, you could try setting the KMP_STACKSIZE or OMP_STACKSIZE variable. *** You cannot use omp_get_num_threads() to get the number of threads (as your init code is not in a parallel region). Use omp_get_max_threads() or omp_get_thread_limit().
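Purely to illustrate the arithmetic of the 50:50/N split Martin asked about, here is a minimal sketch with hypothetical names (`stack_budget`, `planned_threads`, `stack_per_thread`). It queries omp_get_max_threads() through OpenMP conditional compilation (`!$` lines), so it also builds without OpenMP and then assumes one thread. The memory budget is an input, not detected, and the sketch does not set the stack size itself.

```fortran
! Hypothetical helper: arithmetic of the "50:50/N" split discussed above.
module stack_budget
  !$ use omp_lib, only: omp_get_max_threads
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)
contains

  ! Number of threads the next parallel region may use. Outside a parallel
  ! region omp_get_num_threads() returns 1, so ask omp_get_max_threads().
  integer function planned_threads()
    planned_threads = 1
    !$ planned_threads = omp_get_max_threads()
  end function planned_threads

  ! Bytes left for each thread's stack after reserving heap_percent of
  ! budget_bytes for the heap (integer arithmetic, rounds down).
  integer(i8) function stack_per_thread(budget_bytes, heap_percent, nthreads)
    integer(i8), intent(in) :: budget_bytes
    integer,     intent(in) :: heap_percent, nthreads
    stack_per_thread = budget_bytes * (100 - heap_percent) / 100 / nthreads
  end function stack_per_thread

end module stack_budget
```

For Jim's 12GB example (taken as 12 x 2**30 bytes) with a 50% heap share and 12 threads, `stack_per_thread(12_i8*1073741824_i8, 50, 12)` gives 536870912 bytes (512MB) per thread. The value still has to reach the runtime before the thread pool is created, e.g. through OMP_STACKSIZE in the environment.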

Blindly picking 50:50 means you do not have a full understanding of the data requirements of your application. Please take some time to perform an analysis.

Jim Dempsey

mecej4 · 02-25-2014

Martin D.: I downloaded the corrected files from your post #6 today and built the application with the current 14.0.2 64-bit compiler on OpenSuse-64, using your makefile. The application ran to completion without any errors. Perhaps you used a different version of FFTW than the one I used (the one available in the OpenSuse package manager, probably built with GCC)?

The output was:

[bash]  Restore geometry using FFT-based integration
grid     a b c:           16          16          16
size     x y z: 1.00000E+00 1.00000E+00 1.00000E+00
Runs up to here
But I cannot see this[/bash]

Martin_D_1 · 02-26-2014

Hi mecej4,

I currently don't have the new 14.0.2 compiler, but I would assume that the old 14.0.1 compiler was the reason for the crash. I'll ask our administrators to install the new version and will let you know whether it works.

Martin_D_1 · 03-25-2014

It seems to work with the new compiler: the minimal example as well as my original code runs fine.

Steven_L_Intel1 · 04-15-2014

The problem revealed by the original post has been fixed for a release later this year.