Intel® Fortran Compiler

ifort v8 openmp errors with RH AS3

hughmerz
Beginner
The program I am working on compiles fine on the same system (quad Itanium 1, 733 MHz, with 64 GB of RAM) under RH 7.1, glibc-2.2.4-19.3, the 2.4.20 kernel, and version 7.1 of the Intel Fortran compiler.

After upgrading to Red Hat Enterprise Linux AS release 3, Taroon Update 2 (glibc-2.3.2-95.20, kernel 2.4.21-15.EL), and upgrading the compiler to version 8.0 (l_fc_pc_8.0.046_pl049.1), I have experienced problems pertaining to OpenMP that did not occur beforehand.

If the code is compiled using -O1, -O2, or -O3 then it crashes with a segfault. I receive the following program stack within idb:

Thread received signal SEGV
stopped at [subroutine pmfast`update_position():296 0x4000000000032f00]
296 !$omp parallel do default(shared) private(ip)
(idb) where

#0 0x4000000000035240 in update_position() "pmfast.f90":296
#1 0x2000000000381ee0 in /opt/intel_fc_80/lib/libguide.so
#2 0x2000000000359ee0 in __kmpc_invoke_task_func(...) in /opt/intel_fc_80/lib/libguide.so
#3 0x2000000000358280 in __kmp_launch_threads(...) in /opt/intel_fc_80/lib/libguide.so
#4 0x200000000037c8b0 in __kmp_set_stack_info(...) in /opt/intel_fc_80/lib/libguide.so
#5 0x20000000004505b0 in start_thread(...) in /lib/tls/libpthread.so.0
#6 0x40000000001b6a80 in __libc_csu_init(...) in pmfast
#7 0x40000000001b6a80 in __libc_csu_init(...) in pmfast
#8 ... (continues with the above endlessly)

The region of the code in question is a straightforward parallel do loop:

296 !$omp parallel do default(shared) private(ip)
297 do ip=1,nploc
298 xvp(1:2,ip)=modulo(xvp(1:2,ip)+xvp(4:5,ip)*dttot,RLPS)
299 xvp(3,ip)=xvp(3,ip)+xvp(6,ip)*dttot
300 enddo
301 !$omp end parallel do

If I compile the program using -O0 (no optimizations) then it gets through this loop, but hangs further on in the code while processing an OpenMP parallel region.

I'm going to try reducing the code to make the problem easier to diagnose, but any suggestions or possible routes for investigation would be appreciated. Note that my environment is set properly with regard to stack sizes and the like.
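For the curious, these are the sort of settings I mean; the exact values below are just examples of what one might use, not a recommendation:

```shell
# Raise the main-thread stack limit (large automatic arrays live on the stack)
ulimit -s unlimited

# Per-thread stack size for Intel's OpenMP runtime (libguide)
export KMP_STACKSIZE=16m

# Number of OpenMP worker threads
export OMP_NUM_THREADS=4
```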
martin_
Beginner
I may have a related problem. I have discovered that:

"Some aspects of OpenMP programming are known to not yet work, such as making thread-private deferred-shape array dummy arguments. If you encounter other problems with OpenMP support, please report them through Premier Support." - ifort v8 Beta release notes.

This means that if you use an allocatable array, the code crashes when it tries to reference an element:

forrtl: severe (408): fort: (2): Subscript #2 of the array DMASSR has value 1 which is greater than the upper bound of -1

Does anyone know a way around this, or do I have to rewrite my code to use memory less efficiently? Or is this problem fixed if I download the latest version of the compiler?

Thanks,

Martin.
Steven_L_Intel1
Employee
You're quoting from the beta test release notes for 8.0. All those issues were resolved for the final release and that text does not appear in the final release notes. The current version, 8.0.049, has had many improvements made to it since the initial release.
martin_
Beginner
Hi Steve,

I have this build:

Intel Fortran Compiler for 32-bit applications, Version 8.0 Build 20031016Z Package ID: l_fc_p_8.0.034
Copyright (C) 1985-2003 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

and checking the release notes for this build, the problem isn't listed there (I should probably have done this before). However, my code is producing a segfault: it is having trouble with the array indices of allocated arrays inside the OpenMP loop. I will try to produce a smaller piece of code that reproduces the error...

Thanks,

Martin.
Steven_L_Intel1
Employee
Again, you are using a compiler that was released seven months ago. We've made lots of improvements since then. Please download a current compiler from your Premier Support account.
hughmerz
Beginner
Just to illustrate the problem I've stated above, I've managed to reduce my code to the following straightforward example:

program pmerror
   implicit none
   real(4), parameter :: RLPS = 80
   integer(4) :: nploc
   real(4) :: dt, dt_old
   real(4), dimension(6,400000) :: xvp
   common /rarr/ xvp
   call omp_set_num_threads(4)
   nploc = 40**3
   dt = 0.5
   dt_old = 0.0
   xvp = 0.4
   call update_position
contains
   subroutine update_position
      implicit none
      integer(kind=4) :: ip
      real(kind=4) :: dttot
      dttot = 0.5*(dt+dt_old)
!$omp parallel do default(shared) private(ip)
      do ip = 1, nploc
         xvp(1:2,ip) = modulo(xvp(1:2,ip)+xvp(4:5,ip)*dttot,RLPS)
         xvp(3,ip) = xvp(3,ip)+xvp(6,ip)*dttot
      enddo
!$omp end parallel do
   end subroutine update_position
end program

There is nothing magical going on here, yet the latest versions of the 8.0 compiler and the 8.1 beta compiler both produce an executable that segfaults while trying to execute the parallel do loop. It works in version 7. I've submitted issues to Premier Support; hopefully this will be fixed in a future release.

Personally I found the version 7 compilers at the end of their release cycle to be more stable than the current version 8 compilers.

I have run into martin_'s problem quite frequently, and I actually avoid (much to my dismay) dynamic memory allocation with the Intel Fortran compiler. I have found it to cause unexpected memory-addressing errors, especially in OpenMP codes (at least on the Itanium 1 architecture). For us the workaround is to place all of our large memory structures in common blocks. It's not the most elegant way to code, but it gets the job done.
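In sketch form, the workaround looks like this; the array name and sizes here are made up for illustration, not taken from our actual code:

```fortran
! Hypothetical sketch: a statically sized array shared through a
! common block, instead of an allocatable array, as a workaround
! for the OpenMP/allocatable problems described above.
program workaround
   implicit none
   integer, parameter :: np = 1000
   real(4), dimension(6,np) :: xv     ! fixed size known at compile time
   common /parr/ xv                   ! lives in static storage, not the heap
   integer :: ip
   xv = 1.0
!$omp parallel do default(shared) private(ip)
   do ip = 1, np
      xv(3,ip) = xv(3,ip) + 0.5
   enddo
!$omp end parallel do
end program
```

The price is that the array bounds must be fixed at compile time, so you end up recompiling when the problem size changes.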