Intel® Fortran Compiler

OpenMP crash using Itanium2 with ifort

Steve_Cousins
I'm trying to get an ocean model to run on an SGI Altix 3700 BX2 system running SUSE Linux. I have installed versions 10 and 11 of ifort as well as gfortran 4.3.3. The program crashes with the following backtrace when I compile it with either version of ifort:

(idb) bt
#0 0xa000000000010621 in __kernel_syscall_via_break () in [vdso]
#1 0x20000000005901c0 in raise () in /lib/libc-2.4.so
#2 0x2000000000592b60 in abort () in /lib/libc-2.4.so
#3 0x2000000000481300 in __kmp_do_abort () in /opt/intel/Compiler/11.0/083/lib/ia64/libiomp5.so
#4 0x2000000000473930 in __kmp_wait_sleep () in /opt/intel/Compiler/11.0/083/lib/ia64/libiomp5.so
#5 0x2000000000472f70 in __kmp_barrier () in /opt/intel/Compiler/11.0/083/lib/ia64/libiomp5.so
#6 0x200000000044ac30 in __kmpc_barrier () in /opt/intel/Compiler/11.0/083/lib/ia64/libiomp5.so
#7 0x4000000000008a30 in step_thread () at /home/cousins/PWS-altix/temp/step.f:369
#8 0x4000000000008560 in step () at /home/cousins/PWS-altix/temp/step.f:3

I am currently compiling this with:

ifort -c -O3 -w90 -w95 -cm -72 -fno-alias -i4 -r8 -Vaxlib -heap-arrays -auto -noalign -openmp -nothreads

Everything from -heap-arrays on is just based on Googling. I have also done:

export KMP_STACKSIZE=209715200
ulimit -s unlimited
export OMP_NUM_THREADS=16

before running the program. I have also tried -O0 and different numbers of threads from 4 to 64. The code does run in serial mode without OpenMP.
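
As a rough sanity check on the settings above, the sketch below estimates how much stack the OpenMP worker threads could reserve in total. It assumes KMP_STACKSIZE without a suffix is read as bytes per worker thread and that `ulimit -s` only governs the main thread; the function name `stack_budget_mb` is made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate the total worker-thread stack for the settings quoted above.
# Assumption: KMP_STACKSIZE with no suffix is a byte count per worker thread.
KMP_STACKSIZE=209715200
OMP_NUM_THREADS=16

# Hypothetical helper: $1 = per-thread stack in bytes, $2 = thread count.
# Prints the combined size in whole megabytes.
stack_budget_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

echo "per-thread stack: $(( KMP_STACKSIZE / 1048576 )) MB"
echo "total for ${OMP_NUM_THREADS} threads: $(stack_budget_mb "$KMP_STACKSIZE" "$OMP_NUM_THREADS") MB"
echo "main-thread limit (ulimit -s): $(ulimit -s)"
```

At 64 threads the same per-thread value would add up to four times as much, so it may be worth checking the total against available memory when varying the thread count.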

The code runs fine with the gfortran compiler and it also runs fine with both ifort 10 and 11 on Xeon and Opteron machines running Fedora.

Is there a known problem with ifort and OpenMP on the Itanium2? Something in the SUSE environment? An additional switch that I should be using for compilation?

Thanks,

Steve

TimP
It's been said that -heap-arrays may be incompatible with -openmp. You could try "-heap-arrays n", if you can avoid any allocations less than n in the parallel regions. -nothreads and -noalign don't make sense to me, particularly if you have no reason to use them other than some Googled recommendation of unknown heritage.
You have several options there which ought to be redundant; I'm a little more comfortable in leaving them off.
-auto might be OK as it is implied by -openmp, and would assure you of similar treatment of local arrays when not setting -openmp.

Steve_Cousins

Hi Tim,

The original compile line was:

ifort -c -O3 -w90 -w95 -cm -72 -fno-alias -i4 -r8 -Vaxlib

I probably should have just used that in the email since that is what works on the other machines.

TimP
Quoting - Steve Cousins
The original compile line was:

ifort -c -O3 -w90 -w95 -cm -72 -fno-alias -i4 -r8 -Vaxlib

Even the obsolete options there should be OK, provided that you use -r8 consistently, or turn on interface checking so as to ensure consistency.

Steve_Cousins

Just a correction to what I said before about what switches are being used. -openmp is also being used (of course).

As for -r8 being used consistently, it is. The program is created with a script that prepares everything and then runs "make". The compile flags are supplied in only two places (one for modules and one for other files) and these are consistent. As indicated before, this program compiles and runs fine on other architectures (Xeon and Opteron) with the ifort compiler and it even runs on the Itanium2 with the gfortran compiler.

What I am wondering about is if there are any known problems (and fixes hopefully) with the ifort compiler on Itanium2 systems when using OpenMP.

Thanks,

Steve

eliosh

From my (very limited) experience, OpenMP programs often crash because they try to allocate arrays on a stack that is not large enough. This situation is easy to identify: the program crashes on entry to the subroutine, before its first executable statement runs. There are two solutions. First, set a bigger stack. Second (which seems much better to me), use allocatable arrays.

Steve_Cousins

Thanks for the suggestion. I'm specifying:

export KMP_STACKSIZE=209715200
ulimit -s unlimited

Do you think there is something else that I should specify? Incidentally, on the Opteron and Xeon systems (both Fedora) the ulimit stack size is

stack size (kbytes, -s) 10240

The default on the SUSE Altix machine is 8192 but I thought setting it to "unlimited" would make it a non-issue. I just tried setting it to 10240 like the others.

Using allocatable arrays isn't out of the question but this code has been around for quite a while with a number of users and I don't think it would be good to make major changes. The code itself doesn't seem to be a problem. It runs on other systems including other Altix systems, and even on this Altix system with the default stack size when using the gfortran compiler. There is something about ifort on this system with OpenMP that is not right.

Steve

TimP
Quoting - Steve Cousins

The default on the Suse Altix machine is 8192 but I thought setting it to "unlimited" would make it a non-issue. I just gave it a try just specifying it to 10240 like the others.

Did you check whether stack limit must be raised by root? i.e. did your user stack increase to the value you set?

Steve_Cousins

Yes. It did raise it to 10240. I couldn't raise it above this number but it did get set to that value:

cousins@Altix-10G:~> ulimit -s
8192
cousins@Altix-10G:~> ulimit -s 10240
cousins@Altix-10G:~> ulimit -s
10240

Steve

Note: after I sent this I did more checking because I remember being able to set it to unlimited previously. It turns out that in a given instance of a bash shell you can set it once and then after that you can only reduce it. Starting a new shell allows me to set it to any value. I tried again at 20480 with the same problem.
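
One likely explanation for the set-once behaviour described above: in bash, `ulimit -s N` with neither `-S` nor `-H` sets both the soft and the hard limit, and an unprivileged user cannot raise a hard limit again. The sketch below shows the difference inside subshells, so the calling shell's limits are left alone. The function names are hypothetical, and it assumes the starting hard limit is at least 8192 KB.

```bash
#!/bin/bash
# Sketch: soft-only changes can be raised again; a plain "ulimit -s" also
# lowers the hard limit. Each function runs in a subshell, so the limits of
# the calling shell are not modified.

# Hypothetical helper: lower the soft limit to $1 KB, raise it to $2 KB,
# then print the resulting soft limit.
soft_stack_roundtrip() {
    (
        ulimit -Ss "$1" || exit 1
        ulimit -Ss "$2" || exit 1
        ulimit -Ss
    )
}

# Hypothetical helper: set the limit to $1 KB without -S/-H and print the
# hard limit afterwards, showing that the ceiling itself has moved.
hard_after_plain_set() {
    (
        ulimit -s "$1" || exit 1
        ulimit -Hs
    )
}

echo "soft limit after 4096 -> 8192 round trip: $(soft_stack_roundtrip 4096 8192)"
echo "hard limit after plain 'ulimit -s 4096': $(hard_after_plain_set 4096)"
```

Using `ulimit -Ss` in the session would let the value be changed several times in one shell, up to whatever hard limit is in place.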

Steve_Cousins
Well, I noticed Steve Lionel's post about 11.1 being out so I gave it a try. It works now. That is fast work! ;^)

Further note: it works with -O2 but not with -O3, which crashes with NaNs. I'll take it.