Intel® Fortran Compiler

Problem compiling using -parallel

Mike_Rezny
Novice
Hi,
The following test program (cut down from a large global climate model) fails with a SIGSEGV error and no traceback information when built with -parallel -O3 (or -O2) but runs correctly with -parallel -O1 (or -O0) on Linux using the Intel Fortran Compiler 11.1.038. I cannot get idb to step into the source file to get any further debug information.

Any help to get idb to step into assel.f90 would be appreciated. -O0 is not useful since this also results in the program running correctly.

Any help to identify whether this is a compiler problem or not would be greatly appreciated.

There is a dependence in the first loop (after commenting out the write*), and the compiler is parallelising the k loop.
A CDEC$ noparallel directive before this loop also fixes the problem. OpenMP directives around this loop also result in the program running correctly (adding -openmp, of course, and private(k,mg)).

test1.f90:

**********************************************
program test1
  implicit none

  integer, parameter :: lat = 49, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon

  real pgd(lon,lat2,2)
  common /uvpgd/ pgd

  call random_number(pgd)
  call assel()

end program test1


assel.f90:
*******************************************************

subroutine assel
  implicit none

  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon
  integer, parameter :: nl = 18

  integer :: k, lgns, ns, lg, mg

  real pgd(lon,lat2,2)
  common /uvpgd/ pgd

  real :: muf_mufm(ln2, nl, lat)
  real :: asf

  asf = 0.5
  print*, "MMRR 100 lat2, lat ", lat2, lat
  print*, "MMRR 101 nl, lon, ln2 ", nl, lon, ln2

  do lgns = 1, lat2
    ns = 1 - (lgns-1) / lat
    lg = ns*lgns + (lat2+1-lgns)*(1-ns)
    print*, "MMRR 200", lgns, ns, lg
    ns = ns * lon
    do k = 1, nl
      do mg = 1, lon
        muf_mufm(mg+ns,k,lg) = asf*pgd(mg,lgns,2)/pgd(mg,lgns,1)
      enddo
    enddo
  enddo ! lgns=1,lat2

  print*, "MMRR 500", muf_mufm(1,1,1)
  return
end subroutine assel


Makefile:
*******************************************************
FFLAGS = -debug extended -traceback
PFLAGS = $(FFLAGS) -parallel -par-report3

all: test1 test2

test1: test1.f90 Makefile assel.f90
	ifort -O0 $(PFLAGS) -c test1.f90
	ifort -O2 $(PFLAGS) -c assel.f90
	ifort $(PFLAGS) -o test1 test1.o assel.o

test2: test1.f90 Makefile assel.f90
	ifort -O3 $(FFLAGS) -o test2 test1.f90 assel.f90


test2 runs correctly and test1 fails giving the following output:
MMRR 200 49 0 48
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
libpthread.so.0 00002B19AA9E992B Unknown Unknown Unknown
libiomp5.so 00002B19AA8B42CC Unknown Unknown Unknown

regards
Mike
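
For readers following along, the workarounds mentioned in this post can be sketched roughly as below. This is a self-contained illustration under stated assumptions, not the original source: pgd is a local array instead of the /uvpgd/ common block, the program name workaround_sketch is made up, and 1.0 is added to the divisor so that the random data cannot produce a division by zero. Note that in free-form source the Intel directive prefix is !DEC$ rather than CDEC$.

```fortran
! Sketch only: OpenMP workaround applied to the k loop of the test case.
! Assumptions: pgd is local (no common block); divisor offset by 1.0.
program workaround_sketch
  implicit none
  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon
  integer, parameter :: nl = 18
  integer :: k, lgns, ns, lg, mg
  real :: pgd(lon,lat2,2)
  real :: muf_mufm(ln2,nl,lat)
  real :: asf

  asf = 0.5
  call random_number(pgd)
  pgd(:,:,1) = pgd(:,:,1) + 1.0   ! not in the original: avoids dividing by ~0

  do lgns = 1, lat2
    ns = 1 - (lgns-1)/lat
    lg = ns*lgns + (lat2+1-lgns)*(1-ns)
    ns = ns*lon
    ! Explicit OpenMP on the k loop (build with -openmp instead of -parallel).
    ! For the other workaround, drop both !$omp lines and place a single
    ! "!DEC$ NOPARALLEL" line directly before "do k" (build with -parallel).
    !$omp parallel do private(k,mg)
    do k = 1, nl
      do mg = 1, lon
        muf_mufm(mg+ns,k,lg) = asf*pgd(mg,lgns,2)/pgd(mg,lgns,1)
      enddo
    enddo
    !$omp end parallel do
  enddo

  print *, "muf_mufm(1,1,1) =", muf_mufm(1,1,1)
end program workaround_sketch
```

Without -openmp the !$omp lines are ordinary comments, so the same file also builds serially for comparison.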

Mike_Rezny
Novice
Hi,
I have some further information from valgrind that may be of help:

MMRR 200 49 0 48
==16785==
==16785== Invalid write of size 8
==16785== at 0x403CC8: assel_ (in /home/mrezny/tests/mk3/test1)
==16785== by 0x40ACD02: __kmp_invoke_microtask (in /opt/intel/Compiler/11.1/038/lib/intel64/libiomp5.so)
==16785== by 0x408F314: __kmpc_invoke_task_func (in /opt/intel/Compiler/11.1/038/lib/intel64/libiomp5.so)
==16785== by 0x4091A29: __kmp_fork_call (in /opt/intel/Compiler/11.1/038/lib/intel64/libiomp5.so)
==16785== by 0x407C0B8: __kmpc_fork_call (in /opt/intel/Compiler/11.1/038/lib/intel64/libiomp5.so)
==16785== by 0x4039C8: assel_ (in /home/mrezny/tests/mk3/test1)
==16785== by 0x40365E: MAIN__ (in /home/mrezny/tests/mk3/test1)
==16785== by 0x40355B: main (in /home/mrezny/tests/mk3/test1)
==16785== Address 0x9310c0 is not stack'd, malloc'd or (recently) free'd
==16785==

regards
Mike
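
One way to cross-check the valgrind report is to confirm that the subscripts computed in the source stay inside the declared bounds of muf_mufm. The sketch below (program name index_check is hypothetical) replays only the index arithmetic from assel.f90 and records the extreme values of the first and third subscripts; it does not touch the arrays themselves.

```fortran
! Sketch only: replay the subscript arithmetic of assel.f90 and
! report the range of values used for muf_mufm(mg+ns, k, lg).
program index_check
  implicit none
  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon
  integer :: lgns, ns, lg
  integer :: min1, max1, minlg, maxlg

  min1  = huge(0)
  max1  = -huge(0)
  minlg = huge(0)
  maxlg = -huge(0)

  do lgns = 1, lat2
    ns = 1 - (lgns-1)/lat
    lg = ns*lgns + (lat2+1-lgns)*(1-ns)
    ns = ns*lon
    min1  = min(min1, 1 + ns)      ! smallest mg+ns for this row (mg = 1)
    max1  = max(max1, lon + ns)    ! largest mg+ns for this row (mg = lon)
    minlg = min(minlg, lg)
    maxlg = max(maxlg, lg)
  enddo

  print *, "first subscript range:", min1, max1, "  declared: 1 ..", ln2
  print *, "third subscript range:", minlg, maxlg, "  declared: 1 ..", lat
end program index_check
```

With the parameters from assel.f90 this reports 1 to 384 for the first subscript and 1 to 48 for the third, both within the declared extents, which is consistent with the invalid write coming from the generated parallel code rather than from the index expressions as written.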

Ron_Green
Moderator
Mike,

This does look like a bug in the 11.x versions. I don't see the error with the 10.1.022 compiler.

I need to do a little more triage and get a bug report started. Thanks for cutting this down to a small reproducing test case; it makes it much easier to work with.

More on this shortly.

ron

Ron_Green
Moderator
The bug report number is DPD200138636

Again, this bug does not appear in 10.1.022. So a workaround could be to compile assel.f90 with that compiler and everything else with 11.1. Here is a link for getting older versions: http://software.intel.com/en-us/articles/older-version-product/

Could you tell me what code this affects? WRF or another weather code?

ron

jimdempseyatthecove
Honored Contributor III

Have you tried adding -openmp?

Although your code is not using OpenMP explicitly, components of OpenMP are being used by the auto-parallelization option -parallel. The idea is to add the OpenMP dependencies.

Jim Dempsey

TimP
Honored Contributor III

Mike said -openmp did work for him.
If I read the code correctly, the result being calculated is invariant across the loop on k but is being broadcast across a non-unit-stride array. -parallel may be implementing an optimization based on that only part way. Sometimes it's safer to write out an optimization explicitly rather than risk the compiler doing it part way.
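
Writing that optimization out by hand might look like the sketch below: the quotient is computed once per (mg, lgns) into a temporary and then copied to every level k. The temporary ratio, the program name hoist_sketch, the local pgd array, and the 1.0 offset on the divisor are all assumptions made for a self-contained example; none of them come from the original model.

```fortran
! Sketch only: hoist the k-invariant quotient out of the k loop.
program hoist_sketch
  implicit none
  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon
  integer, parameter :: nl = 18
  integer :: k, lgns, ns, lg, mg
  real :: pgd(lon,lat2,2)
  real :: muf_mufm(ln2,nl,lat)
  real :: ratio(lon)              ! hypothetical temporary, one row of results
  real :: asf

  asf = 0.5
  call random_number(pgd)
  pgd(:,:,1) = pgd(:,:,1) + 1.0   ! not in the original: avoids dividing by ~0

  do lgns = 1, lat2
    ns = 1 - (lgns-1)/lat
    lg = ns*lgns + (lat2+1-lgns)*(1-ns)
    ns = ns*lon

    ! The right-hand side does not depend on k, so compute it once.
    do mg = 1, lon
      ratio(mg) = asf*pgd(mg,lgns,2)/pgd(mg,lgns,1)
    enddo

    ! Broadcast the row to every level.
    do k = 1, nl
      do mg = 1, lon
        muf_mufm(mg+ns,k,lg) = ratio(mg)
      enddo
    enddo
  enddo

  print *, "muf_mufm(1,1,1) =", muf_mufm(1,1,1)
end program hoist_sketch
```

This also cuts the number of divisions by a factor of nl, independent of whether it sidesteps the compiler problem.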

Mike_Rezny
Novice

Hi Ron,
from what I can remember, 11.0.081 works and the problem started from 11.0.083.
I haven't downloaded and installed the latest release after 11.1.038 although it is now available on our benchmarking machines in the US.

The code is MK3.5, a coupled ocean-climate model designed at CSIRO in Australia.

I have a workaround. I have added OpenMP directives around all 7 loop nests in the original code
and compile with -openmp.

regards
Mike

Mike_Rezny
Novice

Hi Jim,
I tried your suggestion but it made no difference.

My solution has been to put explicit OpenMP directives around all the loop nests and compile with -openmp.

regards
Mike

Mike_Rezny
Novice

Hi Tim,
yes, I suspect that there is some interaction between some HLO optimization and the fact that the parallelizer had to reject the first loop due to dependencies and chose to parallelize the second loop.

My mission, should I choose to accept it, will be to understand what this loop nest is really trying to achieve and restructure the code to remove the dependency.

regards
Mike

jimdempseyatthecove
Honored Contributor III

Mike,

Another thing you might experiment with (assuming you have the time and inclination).
When compiling for OpenMP, subroutines and functions "inherit" RECURSIVE. Try explicitly adding RECURSIVE to the declaration of your subroutine. If that fails, then revert to the OpenMP thing.

Good luck,

Jim
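
As a rough illustration of the RECURSIVE experiment, the only change is the prefix on the subroutine statement. The sketch below is an assumption-laden stand-in for assel.f90: it uses an internal procedure and passes pgd as an argument so that it is self-contained, whereas the original is an external subroutine using the /uvpgd/ common block; the names recursive_sketch and assel_sketch are made up, and 1.0 is added to the divisor.

```fortran
! Sketch only: the loop nest inside a subroutine declared RECURSIVE,
! so that its local variables are treated as automatic (stack) storage.
program recursive_sketch
  implicit none
  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192
  real :: pgd(lon,lat2,2)

  call random_number(pgd)
  pgd(:,:,1) = pgd(:,:,1) + 1.0   ! not in the original: avoids dividing by ~0
  call assel_sketch(pgd)

contains

  recursive subroutine assel_sketch(p)
    real, intent(in) :: p(lon,lat2,2)
    integer, parameter :: ln2 = 2*lon, nl = 18
    integer :: k, lgns, ns, lg, mg
    real :: muf_mufm(ln2,nl,lat)
    real :: asf

    asf = 0.5
    do lgns = 1, lat2
      ns = 1 - (lgns-1)/lat
      lg = ns*lgns + (lat2+1-lgns)*(1-ns)
      ns = ns*lon
      do k = 1, nl
        do mg = 1, lon
          muf_mufm(mg+ns,k,lg) = asf*p(mg,lgns,2)/p(mg,lgns,1)
        enddo
      enddo
    enddo
    print *, "muf_mufm(1,1,1) =", muf_mufm(1,1,1)
  end subroutine assel_sketch

end program recursive_sketch
```

Because muf_mufm then lives on the stack (about 1.3 MB here), a small stack limit could become a separate source of segmentation faults in larger cases.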

jimdempseyatthecove
Honored Contributor III

Sometimes, when I've had similar problems with earlier versions of IVF, simplifying the loop has kept the optimizer from goofing up.

Try copying the code of the two inner loops (5 lines) into a subroutine, then calling the subroutine. The run length of the two inner loops is sufficiently larger than the call overhead that I do not think the overhead would be too significant.

You might find this straightens out the error, and then the optimizer will inline the subroutine and eliminate the call overhead.

Jim
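
A minimal sketch of that restructuring, assuming an internal procedure with assumed-shape arguments (the name fill_levels, its argument list, the local pgd array, and the 1.0 offset on the divisor are all hypothetical choices for this example):

```fortran
! Sketch only: the two inner loops moved into their own subroutine.
program outline_sketch
  implicit none
  integer, parameter :: lat = 48, lat2 = 2*lat
  integer, parameter :: lon = 192, ln2 = 2*lon
  integer, parameter :: nl = 18
  integer :: lgns, ns, lg
  real :: pgd(lon,lat2,2)
  real :: muf_mufm(ln2,nl,lat)
  real :: asf

  asf = 0.5
  call random_number(pgd)
  pgd(:,:,1) = pgd(:,:,1) + 1.0   ! not in the original: avoids dividing by ~0

  do lgns = 1, lat2
    ns = 1 - (lgns-1)/lat
    lg = ns*lgns + (lat2+1-lgns)*(1-ns)
    ns = ns*lon
    ! Pass one latitude slab of the output and one row of each input.
    call fill_levels(muf_mufm(:,:,lg), pgd(:,lgns,1), pgd(:,lgns,2), asf, ns)
  enddo

  print *, "muf_mufm(1,1,1) =", muf_mufm(1,1,1)

contains

  subroutine fill_levels(muf, p1, p2, scale, offset)
    real, intent(inout) :: muf(:,:)    ! (ln2, nl) slab for one latitude
    real, intent(in)    :: p1(:), p2(:)
    real, intent(in)    :: scale
    integer, intent(in) :: offset      ! 0 or lon, selects the half of the row
    integer :: k, mg

    do k = 1, size(muf,2)
      do mg = 1, size(p1)
        muf(mg+offset,k) = scale*p2(mg)/p1(mg)
      enddo
    enddo
  end subroutine fill_levels

end program outline_sketch
```

Since the optimizer may inline a procedure like this, whether the restructuring actually avoids the problem would need to be checked against the -par-report output for the affected compiler version.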