-openmp -O1 bug with 13.1.3.192

Alexis_R_ · ‎08-23-2013

The following code:

[fortran]module parallel_mod

type interpol_spline
real(kind=8), allocatable   :: x(:), y(:), dy(:), ddy(:)
real(kind=8), allocatable   :: w(:,:)
end type interpol_spline

type filament
type(interpol_spline)       :: x_spline
type(interpol_spline)       :: y_spline
type(interpol_spline)       :: z_spline
type(interpol_spline)       :: s_spline
end type

contains

subroutine parallel_test_two()
implicit none
type(filament) :: fil_tmp
!$omp parallel private(fil_tmp)
!$omp end parallel
end subroutine

end module parallel_mod

program omp_test
use parallel_mod
implicit none
print *, 'about to test omp'
call parallel_test_two()
print *, 'all done'
end program omp_test[/fortran]

When compiled like this:

[bash]ifort -O1 -openmp omp_test.f90[/bash]

Outputs this at runtime:

[plain] about to test omp
Segmentation fault[/plain]

My ifort version information:

[plain]Intel(R) Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 13.1.3.192 Build 20130607[/plain]

I assume this is an optimizer/OpenMP bug. But I can't find a workaround - any help/suggestions would be great.

Casey · ‎08-23-2013

I am unable to reproduce this result using the same compiler version and the same compile flags on Linux.

Steven_L_Intel1 · ‎08-23-2013

I can reproduce it but am not familiar enough with OpenMP to know whether or not this should work. My guess is that it should. I think the issue has to do with the unallocated allocatable components and the private clause.

Alexis_R_ · ‎08-23-2013

Hi Steve,

If you manage to come up with a workaround that would be great.

With private, I always do all the allocation once inside the parallel section, since the behaviour is not well defined otherwise. I'm pretty sure my reproducer should work. In fact, slightly changing the makeup of the derived types makes it work, so it really smells like a bug.

TimP · ‎08-23-2013

In case it's of interest to anyone, it may be useful to know more about the OP's environment. When I attempted it according to the OP's instructions on 64-bit Linux, it died with no segfault, but adding -g -traceback to the build options could produce a segfault.

We have been seeing a great deal of difficulty with allocatable and automatic arrays under -openmp (-O1 and up) with the 12.0 through 14.0 compilers, even without involving derived types. Some of the troublesome cases actually worked with (only) this version of the compiler.

Alexis_R_ · ‎08-23-2013

Tim,

Let me know what you'd like to know about the environment. Here are a few things to get started:

[plain]$ cat /etc/redhat-release
Fedora release 18 (Spherical Cow)
$ head /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2687W 0 @ 3.10GHz
stepping : 7
microcode : 0x70d
cpu MHz : 3100.000
cache size : 20480 KB
physical id : 0[/plain]

On this system, there is still a segfault with the flags -g -traceback -O1 -openmp.

Do let me know what other information I may be able to provide to help with this. It would be great to have rock-solid OpenMP support.

jimdempseyatthecove · ‎08-24-2013

As a workaround, try:

[fortran]

subroutine parallel_test_two()
implicit none
type(filament), pointer :: fil_tmp => NULL()
!$omp parallel private(fil_tmp)
allocate(fil_tmp)   ! each thread allocates its own filament
! allocate(fil_tmp%x_spline%x(nnn))
! ...
! deallocate(fil_tmp%x_spline%x)
deallocate(fil_tmp)
nullify(fil_tmp)
!$omp end parallel
end subroutine
[/fortran]

Jim Dempsey

Alexis_R_ · ‎08-24-2013

Jim,

Thanks for this very nice tip. Works fine at compile-time and appears to be good at runtime too.

IanH · ‎08-24-2013

FWIW, the behaviour of the original example isn't defined by the (current) OpenMP 4.0 spec because of the use of allocatable components. As a result, the OP's program has one foot in choose-your-own-adventure land. See the list on page 22 of the OpenMP spec.

jimdempseyatthecove · ‎08-25-2013

A. Rohou,

One more helpful hint: use firstprivate in place of private. This will copy the NULL value of the pointer into the parallel region. While the code proffered above works, other code may not if it uses ASSOCIATED to test the validity of the pointer.

There is another old thread relating to this subject.

Jim Dempsey
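A minimal sketch of the firstprivate variant described above, assuming a cut-down `filament` type with a single allocatable component (the module name `filament_mod`, the program name, and the array size 10 are made up for illustration). Each thread's copy of the pointer starts out disassociated, so the ASSOCIATED test is meaningful inside the region:

[fortran]module filament_mod
implicit none

type interpol_spline
real(kind=8), allocatable :: x(:)
end type interpol_spline

type filament
type(interpol_spline) :: x_spline
end type filament

end module filament_mod

program firstprivate_demo
use filament_mod
implicit none
type(filament), pointer :: fil_tmp

nullify(fil_tmp)                      ! give the pointer a defined (disassociated) status
!$omp parallel firstprivate(fil_tmp)
! firstprivate copies the disassociated status into each thread's private pointer
if (.not. associated(fil_tmp)) then
allocate(fil_tmp)                     ! one filament per thread
allocate(fil_tmp%x_spline%x(10))      ! hypothetical size
end if
fil_tmp%x_spline%x = 0.0d0
deallocate(fil_tmp)                   ! also frees the allocatable components
!$omp end parallel
print *, 'all done'
end program firstprivate_demo[/fortran]

With plain private, the association status of the pointer on entry to the region is undefined, so the ASSOCIATED test above could not be relied on.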

Alexis_R_ · ‎08-26-2013

Thanks all. Since IanH points out that what I was expecting is not actually defined by OpenMP 4.0, I think I will do something like this instead:

[fortran]subroutine parallel_test_two()
use omp_lib
implicit none
type(filament), allocatable :: fil_tmp(:)
!$omp parallel shared(fil_tmp)
!$omp single
allocate(fil_tmp(omp_get_num_threads()))
!$omp end single
!(...)
!$omp barrier
!$omp single
deallocate(fil_tmp)
!$omp end single
!$omp end parallel
end subroutine[/fortran]

I hope that this will be more likely to work, since I'm no longer expecting the compiler to handle any implicit memory allocation of derived types with allocatable components.

[Edit: added barrier before single]
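For illustration, here is a self-contained sketch of how the elided `!(...)` part might use the shared array, with each thread touching only its own element. The cut-down types, the index variable `me`, and the component size 100 are assumptions, not part of the original code. Note that omp_get_thread_num() is 0-based while the array is indexed from 1:

[fortran]program shared_array_demo
use omp_lib
implicit none

type interpol_spline
real(kind=8), allocatable :: x(:)
end type interpol_spline

type filament
type(interpol_spline) :: x_spline
end type filament

type(filament), allocatable :: fil_tmp(:)
integer :: me

!$omp parallel shared(fil_tmp) private(me)
!$omp single
allocate(fil_tmp(omp_get_num_threads()))
!$omp end single
! implicit barrier at "end single": the array exists for every thread from here on
me = omp_get_thread_num() + 1          ! thread numbers start at 0, array indices at 1
allocate(fil_tmp(me)%x_spline%x(100))  ! explicit, per-thread allocation of the component
fil_tmp(me)%x_spline%x = real(me, kind=8)
!$omp barrier
!$omp single
deallocate(fil_tmp)                    ! also frees every element's components
!$omp end single
!$omp end parallel
print *, 'all done'
end program shared_array_demo[/fortran]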

jimdempseyatthecove · ‎08-26-2013

You might want to see if using:

... fil_tmp(myThreadNumber)%... ...

adds excessive overhead.

Jim Dempsey

Alexis_R_ · ‎08-26-2013

Jim,

I'm not sure I know exactly what you mean. In terms of memory, I would have expected this latest workaround to be roughly equivalent to using PRIVATE, since there's only one additional array descriptor (fil_tmp(:)), but I don't understand these things well enough to be sure that's true. On the other hand, perhaps you mean some kind of computing overhead?

jimdempseyatthecove · ‎08-26-2013

If you allocate a shared array of filament, your references are going to be:

iThread = omp_get_thread_num() + 1   ! thread numbers are 0-based, fil_tmp is indexed from 1
...
fil_tmp(iThread)%x_spline%x(i) = fil_tmp(iThread)%x_spline%x(i) + dX

When using the pointer (or a dummy argument passed by reference):

myFilament%x_spline%x(i) = myFilament%x_spline%x(i) + dX

You remove one array index operation. The compiler may remove this automatically assuming availability of registers (low complexity of code).

If you are "pointer averse", then consider encapsulating the body of the code and calling it with a reference to the array element:

call doWork(fil_tmp(iThread), other, args, here)
...
subroutine doWork(myFilament, other, args, here)
type(filament) :: myFilament
...

If you want, you can use a contained subroutine.

I am not "pointer averse". The effort to hide the pointer is more work.

Jim Dempsey
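The encapsulation described above could look roughly like the following sketch. The argument list is reduced to a single increment `dX`, and the module name `work_mod`, the cut-down types, and the component size 100 are assumptions made for illustration. Inside doWork the per-thread array index no longer appears in the inner loop:

[fortran]module work_mod
implicit none

type interpol_spline
real(kind=8), allocatable :: x(:)
end type interpol_spline

type filament
type(interpol_spline) :: x_spline
end type filament

contains

subroutine doWork(myFilament, dX)
type(filament), intent(inout) :: myFilament
real(kind=8), intent(in)      :: dX
integer :: i
! myFilament refers to one element of the caller's array
do i = 1, size(myFilament%x_spline%x)
myFilament%x_spline%x(i) = myFilament%x_spline%x(i) + dX
end do
end subroutine doWork

end module work_mod

program dowork_demo
use omp_lib
use work_mod
implicit none
type(filament), allocatable :: fil_tmp(:)
integer :: iThread

allocate(fil_tmp(omp_get_max_threads()))   ! one element per possible thread
!$omp parallel shared(fil_tmp) private(iThread)
iThread = omp_get_thread_num() + 1         ! 0-based thread number to 1-based index
allocate(fil_tmp(iThread)%x_spline%x(100))
fil_tmp(iThread)%x_spline%x = 0.0d0
call doWork(fil_tmp(iThread), 1.0d0)
!$omp end parallel
deallocate(fil_tmp)
print *, 'all done'
end program dowork_demo[/fortran]

Here the array is allocated before the parallel region using omp_get_max_threads(), which avoids the single/barrier pair at the cost of possibly allocating more elements than the team actually uses.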

Alexis_R_ · ‎08-26-2013

Thanks Jim, I know what you mean now. It turns out my code is already "encapsulated" the way you described, with

call doWork(fil_tmp(iThread), other, args, here)

so in my case there's little or no coding overhead.