ifort 11.1/089 OpenMP problem - OS X 10.6.4 - wrong numbers

Hans_J_ · ‎09-28-2010

Hi Folks,

I am getting incorrect numbers when using more than 1 OpenMP thread with ifort 11.1/089 and a Fortran 90 code. I am running OS X 10.6.4 with Xcode 3.2.3 installed. I mention the latter since I saw the sticky about Xcode, but as I have 11.1/089 I assume that issue is not mine. I am also using FFTW, which I built with the same compiler using the --enable-openmp option to ensure compatibility with OpenMP.

For completeness, my Mac OS details are:

Darwin boussinesq.math.umass.edu 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386

I was able to cure the issue initially by reducing the optimization all the way down to -O1, but after some code modifications (which I need to have) it has come back.

In order to ensure that the code is correct I ported it to a Linux machine:

Linux cooper.math.lsa.umich.edu 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux

The ifort compiler version on this machine is 11.1/056 and all runs fine on this machine! Unfortunately, I do not have constant access to this machine, as my Mac Pro is my main production machine.

I'd be happy to supply further info or the code itself and compile files in a tar file.

Thanks, Hans Johnston

mecej4 · ‎09-29-2010

Given the large size of the source code, it would help if you can rule out bugs such as array overruns and uninitialized variables being used.

Hans_J_ · ‎09-29-2010

I used the -C option previously to do so, and I'm quite sure that it is not an issue of uninitialized variables.

Is there a compiler flag to check for the latter, i.e., uninitialized variables? This could only happen in

a linear solver subroutine I'm using but did not write.

I did find this morning that if I do not specify ANY optimization OR use -O2 then it runs!

However, with -O1 only it DOES NOT run properly. This makes no sense to me.

Hans_J_ · ‎09-29-2010

UPDATE:

-CB and -CU do not produce any errors/warnings

jimdempseyatthecove · ‎09-29-2010

When the program is built with -CB and -CU together with optimizations, do you still get wrong numbers?

Jim

jimdempseyatthecove · ‎09-29-2010

Have you verified that the OpenMP code is correct?
Meaning no race conditions or improper use of shared/private/atomic.

If the number of !$OMP PARALLEL... regions is small, try inserting !$OMP CRITICAL, !$OMP END CRITICAL around the body of the parallel region. If that corrects the error, then start moving the ends of the critical section(s) towards the middle. Something may show up. Note, the error need not be inside the reducing critical section, since a critical section inserts a barrier for the remaining threads and will alter the execution sequence/phasing by thread for code following the critical section.
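A minimal sketch of that bisection technique, assuming a toy loop body with three stages. The program name, the array `a`, and the stages are hypothetical and not taken from the code under discussion. The whole body starts inside the critical section; once the results are correct, the two CRITICAL lines are moved inward one stage at a time.

```fortran
program critical_bisect
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  integer :: i
  real(dp) :: a(n), tmp

  a = 0.0_dp

  !$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(i, tmp) SHARED(a)
  do i = 1, n
     ! Step 0 of the bisection: serialize the entire loop body.
     !$OMP CRITICAL
     tmp  = real(i, dp)     ! stage 1: set up inputs
     tmp  = 2.0_dp*tmp      ! stage 2: compute
     a(i) = tmp             ! stage 3: store to the shared array
     !$OMP END CRITICAL
     ! Next steps: move CRITICAL down past stage 1, or END CRITICAL up
     ! above stage 3, and re-run until the wrong numbers reappear.
  end do
  !$OMP END PARALLEL DO

  ! Self-check against the known answer.
  do i = 1, n
     if (a(i) /= 2.0_dp*real(i, dp)) then
        print *, 'wrong value at i =', i
        stop 1
     end if
  end do
  print *, 'all values correct'
end program critical_bisect
```

With ifort of this vintage the sketch would be built with -openmp. The stage that has to stay inside the critical section for the results to remain correct is the one to look at first.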

Jim

Hans_J_ · ‎09-30-2010

Hi Jim,

Thanks for the reply.

When compiling with:

ifort -CB -CU -O2 -parallel -openmp -openmp-report2 -assume byterecl -convert big_endian Inf_Pr_2D_AB3CN.f90 MY_D02UEF_D02UEFN.o -L/usr/local/lib -lfftw3 -lfftw3_threads -L/opt/intel/Compiler/11.1/089/Frameworks/mkl/lib -I/opt/intel/Compiler/11.1/089/Frameworks/mkl/include -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread

Everything runs fine, with or without the -CB and -CU. BTW, the object file above is compiled as:

ifort -CB -CU -O2 -c -r8 FOMY_D02UEF_D02UEFN.f90 -o MY_D02UEF_D02UEFN.o -L/usr/local/lib -lfftw3 -lfftw3_threads -L/opt/intel/Compiler/11.1/089/Frameworks/mkl/lib -I/opt/intel/Compiler/11.1/089/Frameworks/mkl/include -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread

Replacing -O2 with -O1 in both compiles again results in garbage, with or without the -CU and -CB. This is what I find puzzling.

There are only 2 OpenMP loops in the code, both similar, and here is one of them:

! set BCs vars for temperature eqn solve
bcx_2(1) = 1.0D+0
bcx_2(2) = -1.0D+0
bcar_2(1:2,1:3) = 0.0D+0
bcar_2(1,1) = 1.0D+0
bcar_2(2,1) = 1.0D+0
order = 2

! solve temperature equation for each Fourier mode i
!$OMP PARALLEL DO DEFAULT(PRIVATE) &
!$OMP SHARED(nx,tembc,fm2,dt,nz,order,mat1,bcar_2,bcx_2,tem,temz,tem_lap,invscale)
do i = 0,nx-1
   bcval_2(1) = tembc(i,1)
   bcval_2(2) = tembc(i,2)
   pdecoeff_2(1) = -dt*invscale/2.0D+0
   pdecoeff_2(2) = 0.0D+0
   pdecoeff_2(3) = 1.0D+0-fm2(i)*dt*invscale/2.0D+0
   ifail = 0
   res = 0.0D+0
   sol_2 = 0.0D+0
   rhs_2(0:nz) = mat1(i,0:nz)
   call MY_D02UEF(nz,order,rhs_2, &
                  bcar_2,bcx_2,bcval_2, &
                  pdecoeff_2,sol_2,res,ifail)
   tem(i,0:nz) = sol_2(0:nz,1)
   temz(i,0:nz) = sol_2(0:nz,2)
   tem_lap(i,0:nz) = sol_2(0:nz,3)
end do
!$OMP END PARALLEL DO

The routine MY_D02UEF is solving a discretized boundary value problem. It does call LAPACK (in MKL)

which is why I use the mkl_sequential library. In fact, this is not truly needed, for the ifort docs

explain that if one is in a parallel region only sequential libraries will be used.

As I noted in my original post, EVERYTHING RUNS FINE on a Linux machine using ifort. I can

only conclude that there is an issue with the OS X compiler.

What do you think?

Thanks again, Hans

BTW - same results with or without the -parallel option

jimdempseyatthecove · ‎09-30-2010

I do not see anything that jumps out at me saying "bug here"

Are any of your DEFAULT(PRIVATE) variables/arrays also SAVE?
Are any of your SHARED(...) used as if private by MY_D02UEF(...)?
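To illustrate the SAVE question, here is a small sketch with hypothetical routines (`scale_saved` and `scale_local` are invented for this example and are not part of MY_D02UEF). A local with the SAVE attribute is a single storage location seen by every thread, so calling such a routine from a parallel loop is a race even though the caller's clauses look correct.

```fortran
module save_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! NOT thread-safe: "work" has the SAVE attribute, so all threads read
  ! and write the same storage. Note that an initializer in a declaration
  ! (real(dp) :: work = 0.0_dp) implies SAVE as well.
  subroutine scale_saved(x, y)
    real(dp), intent(in)  :: x
    real(dp), intent(out) :: y
    real(dp), save :: work
    work = x
    y = 2.0_dp*work
  end subroutine scale_saved

  ! Intended thread-safe variant: no SAVE, no initializer. RECURSIVE is
  ! used here as a common way to ask for per-invocation locals.
  recursive subroutine scale_local(x, y)
    real(dp), intent(in)  :: x
    real(dp), intent(out) :: y
    real(dp) :: work
    work = x
    y = 2.0_dp*work
  end subroutine scale_local
end module save_demo

program save_check
  use save_demo
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(dp) :: y(n), y1

  ! The SAVE version is fine as long as only one thread calls it.
  call scale_saved(3.0_dp, y1)
  if (y1 /= 6.0_dp) stop 1

  !$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(i) SHARED(y)
  do i = 1, n
     call scale_local(real(i, dp), y(i))
  end do
  !$OMP END PARALLEL DO

  do i = 1, n
     if (y(i) /= 2.0_dp*real(i, dp)) stop 2
  end do
  print *, 'scale_local gives correct results in the parallel loop'
end program save_check
```

Swapping `scale_local` for `scale_saved` inside the parallel loop turns this into a data race whose result depends on thread timing, which is the kind of behavior to rule out in the solver routine.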

If nothing shows up, try the following:

!$OMP CRITICAL

tem(i,0:nz) = sol_2(0:nz,1)

temz(i,0:nz) = sol_2(0:nz,2)

tem_lap(i,0:nz) = sol_2(0:nz,3)

!$OMP END CRITICAL
end do

The code, as you have written, is without error. However, the compiler, should it have a bug, and I am not saying it has a bug, may be merging the data in the output arrays in a non-thread safe manner due to a stride issue.

The critical section, if it cures the problem, will indicate a compiler issue.

The other culprit could be your other loop.

BTW

You are aware (are you?) that the caller of MY_D02UEF knows sol_2 is a 0-based array. Without a matching declaration of this dummy arg in MY_D02UEF, the routine may need to use 1-based indexing for this arg. (Same issue with rhs_2.)
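A short sketch of the lower-bound point, using hypothetical helper functions (none of these names come from MY_D02UEF). An assumed-shape dummy declared as `v(:)` always starts at index 1 inside the callee, whatever bounds the actual argument has. The callee only sees a 0-based array if it declares the lower bound itself.

```fortran
module bounds_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Assumed shape with default lower bound: indices run 1..size(v).
  integer function lb_default(v)
    real(dp), intent(in) :: v(:)
    lb_default = lbound(v, 1)
  end function lb_default

  ! Assumed shape with explicit lower bound: indices run 0..size(v)-1.
  integer function lb_zero(v)
    real(dp), intent(in) :: v(0:)
    lb_zero = lbound(v, 1)
  end function lb_zero

  ! Explicit shape, as a routine taking nz would typically declare it.
  integer function lb_explicit(n, v)
    integer, intent(in)  :: n
    real(dp), intent(in) :: v(0:n)
    lb_explicit = lbound(v, 1)
  end function lb_explicit
end module bounds_demo

program check_bounds
  use bounds_demo
  implicit none
  real(dp) :: rhs(0:4)      ! 0-based in the caller

  rhs = 0.0_dp
  if (lb_default(rhs)     /= 1) stop 1
  if (lb_zero(rhs)        /= 0) stop 2
  if (lb_explicit(4, rhs) /= 0) stop 3
  print *, 'lower bounds inside the callee: 1, 0, 0'
end program check_bounds
```

If the solver declares its arguments like `lb_default` does but indexes them from 0, it reads and writes one element before the start of the array, so the dummy declarations are worth checking.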

Jim Dempsey

Hans_J_ · ‎09-30-2010

Hi Jim,

The OMP CRITICAL statements completely solved the problem! Many thanks.

So in your opinion, does this indicate a compiler bug (I would think it does), and how would I go about letting the Intel folks know?

Thanks, Hans

jimdempseyatthecove · ‎09-30-2010

You file this on premier.intel.com. In my opinion this is a bug. The stores to separate cells in the array by separate threads should not interfere with each other. My guess is the SSE code generated is not multi-thread safe (performing a read, merge, write, or performing streaming stores when it should not be).

Steve, have you read Hans's post and my workaround?

It looks like there is a bug in the SSE code regarding OpenMP and storing in arrays with stride .ne. 1.
Data is stored ok as long as there is no concurrency.
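For a bug report, a reduced test case along the following lines might help. This is only a sketch under assumptions: the array sizes and the fill formula are invented, the solver call is replaced by a trivial loop, and it is not known whether a program this small triggers the problem. It performs the same kind of strided row store, `tem(i,0:nz) = sol(0:nz)`, once serially and once under OpenMP, and compares the two results.

```fortran
program strided_store_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nx = 64, nz = 33      ! hypothetical sizes
  real(dp) :: tem_ser(0:nx-1,0:nz), tem_par(0:nx-1,0:nz)
  real(dp) :: sol(0:nz)
  integer :: i, k

  ! Serial reference. Fortran is column-major, so tem(i,0:nz) is a
  ! section with stride nx between consecutive elements.
  do i = 0, nx-1
     do k = 0, nz
        sol(k) = real(i, dp) + 0.001_dp*real(k, dp)
     end do
     tem_ser(i,0:nz) = sol(0:nz)
  end do

  ! Same stores, one row per loop iteration, spread over the threads.
  tem_par = -1.0_dp
  !$OMP PARALLEL DO DEFAULT(NONE) PRIVATE(i, k, sol) SHARED(tem_par)
  do i = 0, nx-1
     do k = 0, nz
        sol(k) = real(i, dp) + 0.001_dp*real(k, dp)
     end do
     tem_par(i,0:nz) = sol(0:nz)
  end do
  !$OMP END PARALLEL DO

  if (any(tem_par /= tem_ser)) then
     print *, 'MISMATCH, max abs diff =', maxval(abs(tem_par - tem_ser))
     stop 1
  end if
  print *, 'parallel and serial results agree'
end program strided_store_check
```

Each iteration writes a distinct row, so the two arrays should be identical for any thread count and optimization level. A mismatch that appears only with certain flags would be a compact thing to attach.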

Hans, can you attach the errant file (the one in which you inserted the !$OMP CRITICAL)?

This was a lucky guess on my part. I haven't seen this error before.

"...when you have eliminated the impossible, whatever remains, however improbable, must be the truth?"

Holmes to Watson

Jim Dempsey

Steven_L_Intel1 · ‎10-01-2010

Jim, this is not my area of expertise. I agree with the suggestion to file a report at Intel Premier Support.