ifort 11.1/089 OpenMP problem - OS X 10.6.4 - wrong numbers

Hans_J_
Beginner
1,025 Views
Hi Folks,
I am getting incorrect numbers when using more than 1 OpenMP thread with ifort 11.1/089
and a Fortran90 code. I am running OS X 10.6.4 and Xcode 3.2.3 is installed. I mention
the latter since I saw the sticky about Xcode, but as I have 11.1/089 I assume that issue
is not mine. I am also using FFTW which I built with the same compiler and the --enable-openmp
option was used to ensure compatibility with OpenMP.
For completeness, my Mac OS details are:
Darwin boussinesq.math.umass.edu 10.4.0 Darwin Kernel Version 10.4.0: Fri Apr 23 18:28:53 PDT 2010; root:xnu-1504.7.4~1/RELEASE_I386 i386
I was initially able to cure the issue by reducing the optimization all the way down to -O1, but after some
code modifications (which I need to have) it has come back.

In order to ensure that the code is correct I ported it to a Linux machine:
Linux cooper.math.lsa.umich.edu 2.6.18-164.11.1.el5 #1 SMP Wed Jan 6 13:26:04 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
The ifort compiler version on this machine is 11.1/056 and all runs fine on this machine! Unfortunately, I do not have constant access to this machine, as my Mac Pro is my main production machine.
I'd be happy to supply further info or the code itself and compile files in a tar file.
Thanks, Hans Johnston
10 Replies
mecej4
Honored Contributor III
Given the large size of the source code, it would help if you can rule out bugs such as array overruns and uninitialized variables being used.
Hans_J_
Beginner
I used the -C option previously to do so, and I'm quite sure that it is not an issue of uninitialized variables.
Is there a compiler flag to check for the latter, i.e., uninitialized variables? This could only happen in
a linear solver subroutine I'm using but did not write.
I did find this morning that if I do not specify ANY optimization OR use -O2 then it runs!
However, with -O1 only it DOES NOT run properly. This makes no sense to me.
Hans_J_
Beginner
UPDATE:
-CB and -CU do not produce any errors/warnings
jimdempseyatthecove
Honored Contributor III
When the program is built with -CB and -CU together with optimizations, do you still get wrong numbers?

Jim
jimdempseyatthecove
Honored Contributor III
Have you verified that the OpenMP code is correct?
Meaning no race conditions or improper use of shared/private/atomic.

If the number of !$OMP PARALLEL... regions is small, try inserting !$OMP CRITICAL, !$OMP END CRITICAL around the body of the parallel region. If that corrects the error, then start moving the ends of the critical section(s) towards the middle. Something may show up. Note, the error need not be inside the shrinking critical section, since a critical section holds the other threads back and so alters the execution sequence/phasing by thread for code following the critical section.

Jim
Hans_J_
Beginner
Hi Jim,
Thanks for the reply.
When compiling with:
ifort -CB -CU -O2 -parallel -openmp -openmp-report2 -assume byterecl -convert big_endian Inf_Pr_2D_AB3CN.f90 MY_D02UEF_D02UEFN.o -L/usr/local/lib -lfftw3 -lfftw3_threads -L/opt/intel/Compiler/11.1/089/Frameworks/mkl/lib -I/opt/intel/Compiler/11.1/089/Frameworks/mkl/include -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread
Everything runs fine, with or without the -CB and -CU. BTW, the object file above is compiled as:
ifort -CB -CU -O2 -c -r8 FOMY_D02UEF_D02UEFN.f90 -o MY_D02UEF_D02UEFN.o -L/usr/local/lib -lfftw3 -lfftw3_threads -L/opt/intel/Compiler/11.1/089/Frameworks/mkl/lib -I/opt/intel/Compiler/11.1/089/Frameworks/mkl/include -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -liomp5 -lpthread
Replacing -O2 with -O1 in both compiles again results in garbage, with or without the -CU and -CB. This is what I find puzzling.
There are only 2 OpenMP loops in the code, both similar, and here is one of them:
! set BCs vars for temperature eqn solve
bcx_2(1) = 1.0D+0
bcx_2(2) = -1.0D+0
bcar_2(1:2,1:3) = 0.0D+0
bcar_2(1,1) = 1.0D+0
bcar_2(2,1) = 1.0D+0
order = 2
! solve temperature equation for each Fourier mode i
!$OMP PARALLEL DO DEFAULT(PRIVATE) &
!$OMP SHARED(nx,tembc,fm2,dt,nz,order,mat1,bcar_2,bcx_2,tem,temz,tem_lap,invscale)
do i = 0,nx-1
   bcval_2(1) = tembc(i,1)
   bcval_2(2) = tembc(i,2)
   pdecoeff_2(1) = -dt*invscale/2.0D+0
   pdecoeff_2(2) = 0.0D+0
   pdecoeff_2(3) = 1.0D+0-fm2(i)*dt*invscale/2.0D+0
   ifail = 0
   res = 0.0D+0
   sol_2 = 0.0D+0
   rhs_2(0:nz) = mat1(i,0:nz)
   call MY_D02UEF(nz,order,rhs_2, &
                  bcar_2,bcx_2,bcval_2, &
                  pdecoeff_2,sol_2,res,ifail)
   tem(i,0:nz) = sol_2(0:nz,1)
   temz(i,0:nz) = sol_2(0:nz,2)
   tem_lap(i,0:nz) = sol_2(0:nz,3)
end do
!$OMP END PARALLEL DO
The routine MY_D02UEF solves a discretized boundary value problem. It calls LAPACK (in MKL),
which is why I use the mkl_sequential library. In fact, this is not truly needed, for the docs
explain that inside a parallel region only sequential library code will be used.
As I noted in my original post, EVERYTHING RUNS FINE on a Linux machine using ifort. I can
only conclude that there is an issue with the OS X compiler.
What do you think?
Thanks again, Hans
BTW - same results with or without the -parallel option
jimdempseyatthecove
Honored Contributor III
I do not see anything that jumps out at me saying "bug here"

Are any of your DEFAULT(PRIVATE) variables/arrays also SAVE?
Are any of your SHARED(...) variables used as if private by MY_D02UEF(...)?
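
On the SAVE question, one easy-to-miss case is a local variable that is initialized in its declaration: Fortran gives it the SAVE attribute implicitly, so a routine called from a parallel region has a single static copy shared by all threads, whatever DEFAULT(PRIVATE) says at the call site. A minimal single-threaded sketch of the persistence (next_count and counter are made-up names for illustration, not taken from the posted code):

```fortran
program save_demo
  implicit none
  integer :: k, n

  do k = 1, 3
     n = next_count()
     print *, 'call', k, 'returned', n
  end do

contains

  function next_count() result(c)
    integer :: c
    ! The initializer gives counter the SAVE attribute implicitly:
    ! one static copy, set to 0 once, not a fresh copy on every call.
    integer :: counter = 0
    counter = counter + 1
    c = counter
  end function next_count

end program save_demo
```

The three calls return 1, 2 and 3 rather than 1 each time. Called concurrently from several threads, the same variable would be updated by all of them without synchronization.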

If nothing shows up, try the following:

   !$OMP CRITICAL
   tem(i,0:nz) = sol_2(0:nz,1)
   temz(i,0:nz) = sol_2(0:nz,2)
   tem_lap(i,0:nz) = sol_2(0:nz,3)
   !$OMP END CRITICAL
end do

The code, as you have written it, is without error. However, the compiler, should it have a bug (and I am not saying it has one), may be merging the data in the output arrays in a non-thread-safe manner due to a stride issue.

The critical section, if it cures the problem, will indicate a compiler issue.

The other culprit could be your other loop.

BTW

Are you aware that only the caller of MY_D02UEF knows that sol_2 is a 0-based array? Unless the dummy argument is declared with the same lower bound inside MY_D02UEF, it has to be indexed from 1 there. (Same issue with rhs_2.)
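
A small sketch of the point about bounds, assuming nothing about how MY_D02UEF actually declares its arguments (assumed_shape, explicit_shape and the array sizes are hypothetical, chosen only for illustration): lower bounds are not passed along with the actual argument, so each routine sees whatever bounds its own dummy declaration gives.

```fortran
program bounds_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nz = 4
  real(dp) :: sol(0:nz, 3)        ! 0-based in the caller

  sol = 0.0_dp
  call assumed_shape(sol)
  call explicit_shape(nz, sol)

contains

  subroutine assumed_shape(a)
    ! Assumed shape with no lower bound given: indexing starts at 1 here.
    real(dp), intent(in) :: a(:, :)
    print *, 'assumed shape :', lbound(a, 1), ubound(a, 1)
  end subroutine assumed_shape

  subroutine explicit_shape(n, a)
    ! Bounds restated in the dummy declaration: indexing starts at 0 here.
    integer, intent(in) :: n
    real(dp), intent(in) :: a(0:n, 3)
    print *, 'explicit shape:', lbound(a, 1), ubound(a, 1)
  end subroutine explicit_shape

end program bounds_demo
```

The first routine reports bounds 1 and 5, the second 0 and 4, for the same actual argument. Writing the dummy as a(0:, :) would also keep the 0-based indexing with an assumed-shape argument.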

Jim Dempsey
Hans_J_
Beginner
Hi Jim,
The OMP CRITICAL statements completely solved the problem! Many thanks.
So in your opinion, does this indicate a compiler bug (I would think it does), and
how would I go about letting the Intel folks know?
Thanks, Hans
jimdempseyatthecove
Honored Contributor III
You can file this at premier.intel.com. In my opinion this is a bug. The stores to separate cells in the array by separate threads should not interfere with each other. My guess is that the SSE code generated is not multi-thread safe (performing a read, merge, write, or performing streaming stores when it should not be).

Steve, have you read Hans's post and my workaround?

It looks like there is a bug in the SSE code regarding OpenMP and storing in arrays with stride .ne. 1.
Data is stored ok as long as there is no concurrency.

Hans, can you attach the errant file (the one into which you inserted the !$OMP CRITICAL)?

This was a lucky guess on my part. I haven't seen this error before.

"...when you have eliminated the impossible, whatever remains, however improbable, must be the truth?"

Holmes to Watson

Jim Dempsey
Steven_L_Intel1
Employee
Jim, this is not my area of expertise. I agree with the suggestion to file a report at Intel Premier Support.