Solved: Re: Question - Page 2

rafadix08 · ‎12-09-2009

I compiled my code with the Intel Fortran compiler 11.1.048 on my own machine.
The code ran fine.

When I compiled and ran the same code on a UNIX cluster with Intel Fortran 11.1, something really weird happened.

First, the program was crashing at some point... Debugging it I found this out:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

This prints a bunch of zeros on the screen, as it should.

However if I do this:

XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

I get nonzero stuff printed.

More confusingly, if I code:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

Then I get zeros everywhere.

How is this happening?

Just a reminder: none of this happens on my own machine with Intel Fortran 11.1.048. It only happens when I move to the UNIX cluster with Intel Fortran 11.1.

Thanks,
Rafael

Martyn_C_Intel · ‎12-17-2009

Rafael,
The problem was related to a specific optimization involving both the square of a math function and the rerolling of statements involving consecutive array elements to reconstitute a loop. The loop index was getting incremented twice, which led to the pattern noted above where the second element, XR(15), contained the result that should have been in the third element, XR(16). This will be fixed in a future compiler update.
There are, therefore, additional ways in which you could work around this, without reducing the optimization level. You could rewrite the four assignment statements as a loop; then, the compiler would not need to recreate a loop. Or, as you already noted, you could separate the calculation of the logarithms from the calculation of the squares. The former is probably the most elegant: using array notation,
XR(14:17) = log(rsk(1:4))**2
but you'd need to check that you don't have a similar construct anywhere else in your code.
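
As a rough illustration, the workarounds described above might look like the sketch below. Only XR, rsk and the index ranges come from the thread; the program name, the array sizes, the DOUBLE kind parameter and the values in rsk are assumptions made up for the example.

```fortran
program workaround_sketch
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)      ! assumed kind parameter
  integer, parameter :: NREG = 20, NSECTORS = 4   ! assumed sizes, for illustration only
  real(kind=DOUBLE) :: XR(NREG), rsk(NSECTORS), logs(NSECTORS)
  integer :: i

  rsk = (/ 1.5d0, 2.5d0, 3.5d0, 4.5d0 /)          ! made-up test values
  XR = 0.0d0

  ! Workaround 1: write the loop yourself, so the compiler has
  ! no consecutive statements to reroll into a loop.
  do i = 1, NSECTORS
     XR(13 + i) = log(rsk(i))**2
  end do
  print *, 'explicit loop     :', XR(14:17)

  ! The same loop expressed in array notation.
  XR(14:17) = log(rsk(1:4))**2
  print *, 'array notation    :', XR(14:17)

  ! Workaround 2: compute the logarithms first, then square them
  ! in a separate step.
  logs = log(rsk)
  XR(14) = logs(1)*logs(1)
  XR(15) = logs(2)*logs(2)
  XR(16) = logs(3)*logs(3)
  XR(17) = logs(4)*logs(4)
  print *, 'logs, then squares:', XR(14:17)
end program workaround_sketch
```

All three variants should print the same four values.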

In reply to your last question, an optimizing compiler is a very large and complex piece of software. Bugs are rare, but they do happen. The Intel compiler is run through a very extensive test suite, so any problems are usually only for a very specific set of circumstances, for example, involving the interplay between different optimizations, as here. When a problem is found, a corresponding test is added to the test suite, to ensure that similar problems don't recur in the future.
So whilst you shouldn't expect problems with the rest of your code, provided you check for recurrences of the exact same construct, it is good practice to compare results compiled with optimization against results when compiling without optimization, just as you would check results for a problem with a known solution when testing your own code. It becomes even more important to test and compare to a validated set of results once you begin writing parallel code.
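
One possible way to automate such a comparison is a small self-check like the sketch below. It evaluates the construct from this thread and compares it with a reference computed in separate steps. The program name, the tolerance TOL and the input values are assumptions chosen for illustration, not part of the original code.

```fortran
program check_against_reference
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)        ! assumed kind parameter
  real(kind=DOUBLE), parameter :: TOL = 1.0d-12     ! assumed tolerance
  real(kind=DOUBLE) :: rsk(4), XR(17), ref(4), err
  integer :: i

  rsk = (/ 1.5d0, 2.5d0, 3.5d0, 4.5d0 /)            ! made-up test values
  XR = 0.0d0

  ! Construct under test: four consecutive assignments.
  XR(14) = log(rsk(1))**2
  XR(15) = log(rsk(2))**2
  XR(16) = log(rsk(3))**2
  XR(17) = log(rsk(4))**2

  ! Reference: logarithm and square computed in separate steps.
  do i = 1, 4
     ref(i) = log(rsk(i))
     ref(i) = ref(i)*ref(i)
  end do

  ! Largest absolute deviation between the two computations.
  err = maxval(abs(XR(14:17) - ref))
  if (err > TOL) then
     print *, 'MISMATCH: max abs difference =', err
  else
     print *, 'OK: max abs difference =', err
  end if
end program check_against_reference
```

Building this once with -O0 and once with -O2 and comparing the two outputs follows the practice described above. With constant inputs an optimizer may evaluate everything at compile time, so a real test would more likely read its inputs at run time.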

Martyn

rafadix08 · ‎12-13-2009

Ok, I was able to make it work now, but I would need some help to figure out what is going on.

Here is my call to the function Emax_hat (the one where the problem was):

Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)

Here is how I declared the variables in Emax_hat:

============================================================

function Emax_hat(PI, rsk, exper, lag)

USE Global_Data

implicit none

integer , intent(in) :: exper(NSECTORS), lag
real(KIND=DOUBLE), intent(in) :: PI(NREG), rsk(NSECTORS)

integer i, s

real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat

integer LagDummy(NSECTORS)

============================================================

NSECTORS and NREG (the sizes of many of the arrays above) are global constants declared in module Global_Data.

I checked, and for the call:
Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)

I have the following declarations in the caller:
real(KIND=DOUBLE) PI_COEF(NREG)
real(KIND=DOUBLE) rsk(NSECTORS,INTP)
integer ExperTomorrow(NSECTORS), lag

That is, all the types and sizes match. However, I was having the problem that I described extensively here.

I decided to try declaring the arguments of Emax_hat as assumed-shape, as follows:

============================================================

function Emax_hat(PI, rsk, exper, lag)

USE Global_Data

implicit none

integer , intent(in) :: exper(:), lag
real(KIND=DOUBLE), intent(in) :: PI(:), rsk(:)

integer i, s

real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat

integer LagDummy(NSECTORS)

============================================================

And that worked.

Why is that?

So apparently the problem was indeed related to size declaration of arrays.

jimdempseyatthecove · ‎12-14-2009

My guess is that when you configure to compile with errors, one or more of the callers assumes (or is told) that the call interface passes descriptors as opposed to the first cell in the array.

Try the -gen-interfaces and -warn interfaces options. This may point to the errant caller(s).

Jim Dempsey

rafadix08 · ‎12-14-2009

I have tried the -gen-interfaces and -warn interfaces options. No error was reported.

Steve, did you have a chance to execute my code? Am I doing something wrong? If yes, why didn't the compiler, with all these options, detect the problem? Why did my code run smoothly on my Windows machine and not on the Linux system?

I feel very insecure going on without the answers to these questions.

Steven_L_Intel1 · ‎12-14-2009

I ran your code but did not have a chance to investigate in detail. I won't be able to get back to it for a few days.

rafadix08 · ‎12-14-2009

What about my comments above? Could you please take a look at them and let me know what you think?

Does it make sense that it worked with assumed-shape arrays?

Why didn't Jim's suggestions (the options -warn interfaces and -gen-interfaces) detect any error?

Steven_L_Intel1 · ‎12-14-2009

It doesn't make a lot of sense to me just reading what's here. Using assumed-shape arrays for the arguments would require that the caller see an explicit interface specifying that. I will try to look closer but it will be later this week.

rafadix08 · ‎12-14-2009

The function that is called (Emax_hat) is in the same module as the caller. So, I guess I would not need an explicit interface. Am I wrong?

I have a general question that might solve all of this, without the need to go through my code.

Suppose I have a module containing a subroutine and a function.

Here is what I am doing (in general terms):

module my_module

contains

subroutine my_subroutine(vec)

use global_var

real(8) vec(N)
real(8) x

x = my_function(vec)

end subroutine

function my_function(vec)

use global_var

real(8) my_function
real(8), intent(in) :: vec(N)

... function commands ...

end function

end module my_module

Note that I am using another module, called global_var, where the constant N is defined.

module global_var

save

integer, parameter :: N = 25

end module global_var

My questions:
1) Is what I described here correct? If yes, then I would need you to look at the code, because I am having assignment errors, as I described. If it's not correct, why didn't the compiler detect any error?

2) What I have done is instead of having "real(8), intent(in) :: vec(N)" in my_function(vec) I have "real(8), intent(in) :: vec(:)", that is vec is declared as an assumed-shape array. That worked.

3) What I have also done is to pass the dimension of vec as an argument of my_function:

function my_function(vec,dim)

real(8) my_function
real(8), intent(in) :: vec(dim)

... function commands ...

end function

That has also worked.

Steven_L_Intel1 · ‎12-14-2009

If they are in the same module, then yes, that creates an explicit interface. My guess is that N is not the actual dimension of the array when passed in.
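
To make the three variants discussed here concrete, the sketch below puts an explicit-shape dummy, an assumed-shape dummy and a dummy whose size is passed as an argument into one module, so that every call has an explicit interface. The function names sum_explicit, sum_assumed and sum_sized, the argument name nvec and the program name are hypothetical; only global_var, my_module, vec and N come from the thread.

```fortran
module global_var
  implicit none
  integer, parameter :: N = 25
end module global_var

module my_module
  use global_var
  implicit none
contains

  ! Explicit-shape dummy: the function assumes exactly N elements.
  function sum_explicit(vec) result(total)
    real(8), intent(in) :: vec(N)
    real(8) :: total
    total = sum(vec)
  end function sum_explicit

  ! Assumed-shape dummy: the extent is taken from the actual argument.
  function sum_assumed(vec) result(total)
    real(8), intent(in) :: vec(:)
    real(8) :: total
    total = sum(vec)
  end function sum_assumed

  ! Explicit-shape dummy whose extent is passed by the caller.
  function sum_sized(vec, nvec) result(total)
    integer, intent(in) :: nvec
    real(8), intent(in) :: vec(nvec)
    real(8) :: total
    total = sum(vec)
  end function sum_sized

end module my_module

program interface_sketch
  use global_var
  use my_module
  implicit none
  real(8) :: a(N)

  a = 1.0d0
  ! If size(a) and N differ, the explicit-shape version is the suspect.
  print *, 'size of actual argument:', size(a), ' N =', N
  print *, 'explicit shape:', sum_explicit(a)
  print *, 'assumed shape :', sum_assumed(a)
  print *, 'size passed   :', sum_sized(a, size(a))
end program interface_sketch
```

When the actual argument really has N elements, all three sums should agree (here each one is 25).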

rafadix08 · ‎12-14-2009

I printed the dimension of the actual argument before calling the function and the dimension of the dummy argument, inside the function, and they match.

A new weird thing I discovered: if I compile with -debug full it works.

So it would be really helpful if you could go over the code and let me know what is going on. I understand your time constraints.

Please run the code using Linux.

Thank you,
Rafael

IanH · ‎12-14-2009

Quoting - rafadix08

Please run the code using Linux.

Note that you are posting in the forum for the windows flavour of the compiler.

Bit of speculation here - In the OpenMP 3.0 spec have a read of the Fortran specific bits in section 2.9.3.2 (data environment - shared clause), particularly the bit that starts "Under certain conditions...". There's also an elaborating example in appendix A29. Perhaps this is applicable to your code.

If so, you may have a race condition associated with the temporary copy of a variable that needs to be made to match the array section of an actual argument with an assumed-size dummy. The fact that you get "array temporary" warnings is a pointer to this. Making the dummy assumed-shape would avoid the need for the copy and hence avoid the race condition, which appears to be what you have found.

I'm not clear about how this potential race condition would result in the specific problem that you see, but obviously it would only apply if you had parallel execution. You claimed earlier that when you ran the code serially you still saw the problem. Are you really sure about that?

I had problems with an older version of the compiler (11.0?) when array dimensions were specified by parameters from a module and OpenMP was in use, but the symptoms were different and, if I recall correctly, it only applied to debug builds.
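
For readers unfamiliar with array temporaries, the sketch below shows the kind of call that can trigger one. It is a generic illustration, not the code from this thread: the module, subroutine and variable names are all made up, and whether a copy is actually made is up to the compiler.

```fortran
module temp_demo
  implicit none
  integer, parameter :: DP = kind(1.0d0)   ! assumed kind parameter
contains

  ! Explicit-shape dummy: expects n contiguous elements.
  subroutine explicit_dummy(v, n)
    integer, intent(in) :: n
    real(DP), intent(in) :: v(n)
    print *, 'explicit-shape dummy sees:', v
  end subroutine explicit_dummy

  ! Assumed-shape dummy: can describe a strided section directly.
  subroutine assumed_dummy(v)
    real(DP), intent(in) :: v(:)
    print *, 'assumed-shape dummy sees :', v
  end subroutine assumed_dummy

end module temp_demo

program array_temporary_sketch
  use temp_demo
  implicit none
  real(DP) :: a(3,4)
  integer :: i, j

  do j = 1, 4
     do i = 1, 3
        a(i,j) = 10*i + j
     end do
  end do

  ! a(:,2) is a column: contiguous in column-major storage,
  ! so normally no copy is needed.
  call explicit_dummy(a(:,2), 3)

  ! a(2,:) is a row: strided in memory, so a compiler will typically
  ! pass a contiguous temporary copy to an explicit-shape dummy.
  call explicit_dummy(a(2,:), 4)

  ! With an assumed-shape dummy the strided section can be passed as is.
  call assumed_dummy(a(2,:))
end program array_temporary_sketch
```

The second call is the pattern that would be expected to produce the "array temporary" warning mentioned in this thread.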

rafadix08 · ‎12-14-2009

Hi Ian,
Thanks for your post.

I removed all the parallelization from the code in order to isolate the problem. So, the problem I described is not due to parallelization issues.

I read somewhere on this forum that "array temporary" warnings can be caused by non-contiguous array sections. When I pass a contiguous array to the function where I am having the problem, the warning disappears, but the problem persists.

Yes, I noticed I am in the Windows section... Since I am hopeless now, I am considering posting at the Linux section too.

Steven_L_Intel1 · ‎12-14-2009

I could move the whole thread to the Linux section, but at this point I don't see that it's worthwhile. I doubt that it's actually "Linux" that makes a difference, but something different about the environment in which the program is run.

Andrew_Smith · ‎12-15-2009

In your code version that creates an array temporary, if you declare a local variable to point at the array section and then pass that instead does that fix the issue? If so then maybe the temporary array creation or rollback is going wrong.

! rsk must have the TARGET (or POINTER) attribute for the pointer assignment below
real(kind=DOUBLE), pointer :: arry(:)
...
...
arry => rsk(:,kk)
Emax(s) = Emax_hat(PI_COEF, arry, ExperTomorrow, s)

rafadix08 · ‎12-15-2009

Thank you for your post, Andrew.
The code I posted already solved the "array temporary thing"... The first argument was (in another version) a non contiguous array, but in this version of the code PI_COEF is contiguous. With this fix, the warning disappeared, but the original problem persisted.

rafadix08 · ‎12-15-2009

Hi again,

I was able to make the program work using the debug option. What other options does the debug option activate that might be explaining this difference in behavior?

Here is my compile file (without debug):

ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces

Here is my compile file with debug (the only difference is the -debug option at the end):
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces -debug

Just FYI, I have extensively checked consistency of all arguments (types and dimensions)...

Lorri_M_Intel · ‎12-15-2009

Well, setting "debug" turns off optimizations.

You could try your original compile line with -O0

You could try the -debug compile line with -O2

That might give interesting results.

rafadix08 · ‎12-15-2009

The original compile with -O0 makes the code work.
With -debug together with -O2 it doesn't.

Still looking for what could be causing the problem.

rafadix08 · ‎12-15-2009

New piece of information:

Compiling like this:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interface

produces the error I have been talking about.

However, adding -check all or just -check bounds "removes the error".
I am not sure how -check interacts with optimization (if there is any interaction; I didn't see it in the manual), but I added -check all -O2 just for peace of mind. The code still works.

So, to summarize: the code works with -O0 (as I said in the previous post) and does not work with -O2. However, the code WORKS with -O2 plus -check bounds and/or -check all.

Still working on it... Still lost!

Ah! Almost forgot...
If I compile with "-diag-enable sc" I get the following errors:
LinReg_MOD.f90(58): error #12171: dereference of NULL pointer "Y_HAT" set at (file:LinReg_MOD.f90 line:5)
LinReg_MOD.f90(68): error #12171: dereference of NULL pointer "DISP" set at (file:LinReg_MOD.f90 line:5)

However, I am not sure how to interpret that since both Y_hat and disp are local arrays with dimension given by one of the arguments of the function they belong to. The lines of the errors correspond to the initialization of Y_hat and disp.

I THINK I read somewhere that -diag-enable sc produces garbage sometimes... Just wondering whether I should pay attention to this or not...

Steven_L_Intel1 · ‎12-16-2009

I would ignore those messages.

rafadix08 · ‎12-16-2009

Hi Steve,

I just gave up looking for the bug in my program. I have spent 9 days on it now and was unable to find the problem. I am attaching the code I sent you earlier, but with some printouts that will tell you what to look at, followed by a pause. I would greatly appreciate it if you could look into it to see what is going on. Again, I understand your time constraints, so please do it at your convenience.

I am sorry I am a bit anxious, but this is crucial for my PhD thesis work. I can't go on without it.

Basically, for each line, XR must be equal to log(rsk)**2. I am not having that for some unknown reasons.

To help you in the process here is a summary of stuff I learnt debugging it:
1) The code works fine if I compile using IVF version 11.1.048 together with Microsoft Visual Studio 2008 on my windows machine. I tried to change many of the options to make the code fail (optimization, check bounds, etc...), but the code ALWAYS worked on my Windows machine.

2) Although one of the files is called Parallel_Emax_MOD.f90, there is nothing parallel in it. I removed all parallelizations in order to focus on the origin of the problem.

3) The code fails when I compile and execute it on a Linux machine. Here is my command line:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interface

4) The code works on Linux if I add -O0

5) The code does not work on Linux if I add -debug -O2

6) The code works if I add ONLY -check bounds or just -check to the line in 3)

UPDATE:
7) This is quite random, but if I insert a couple of "print*" statements around the code, especially after the assignments that go wrong, that is:
XR(14) = log(rsk(1))**2
print*
XR(15) = log(rsk(2))**2
print*
XR(16) = log(rsk(3))**2
print*
XR(17) = log(rsk(4))**2
print*

The code works... Could I be facing a compiler bug? This behavior seems very random to me...

The trouble is I need to run it on the Linux machine since I will need to perform some heavy parallelization.

Many, many thanks for all the help.

Rafael

Martyn_C_Intel · ‎12-16-2009

Hi Rafael,
I investigated your problem at Steve's suggestion, and it does indeed appear to be a compiler optimization bug on Linux only. I shall pass a small reproducer along to the compiler developers to investigate further.
In the meantime, the simplest, safest way for you to proceed would be to insert a compiler directive
!DIR$ NOOPTIMIZE immediately after the FUNCTION EMAX_HAT statement. That will prevent this function from being optimized, but other functions within the file will still get optimized.
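
As a sketch of where the directive would go, the example below places it directly after a FUNCTION statement. The module, the function names squared_logs and total, and the test values are made up for illustration; the actual EMAX_HAT function has different arguments and a different body.

```fortran
module noopt_sketch
  implicit none
  integer, parameter :: DOUBLE = kind(1.0d0)   ! assumed kind parameter
contains

  function squared_logs(rsk) result(XR)
!DIR$ NOOPTIMIZE
    ! The directive above asks the Intel compiler not to optimize this
    ! function; compilers that do not recognize it treat it as a comment.
    real(kind=DOUBLE), intent(in) :: rsk(4)
    real(kind=DOUBLE) :: XR(4)
    XR(1) = log(rsk(1))**2
    XR(2) = log(rsk(2))**2
    XR(3) = log(rsk(3))**2
    XR(4) = log(rsk(4))**2
  end function squared_logs

  ! A second function in the same file, left without the directive,
  ! so it is still optimized as usual.
  function total(x) result(s)
    real(kind=DOUBLE), intent(in) :: x(:)
    real(kind=DOUBLE) :: s
    s = sum(x)
  end function total

end module noopt_sketch

program noopt_demo
  use noopt_sketch
  implicit none
  real(kind=DOUBLE) :: r(4)

  r = (/ 1.5d0, 2.5d0, 3.5d0, 4.5d0 /)   ! made-up test values
  print *, squared_logs(r)
  print *, total(squared_logs(r))
end program noopt_demo
```

Compared with compiling the whole file at -O0, this keeps the loss of optimization limited to the one affected function.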

We'll let you know if we have further news or advice.

Martyn