The code just ran fine.
When I compiled and ran the code on a UNIX cluster with Intel Fortran 11.1, something really weird happened.
First, the program was crashing at some point... While debugging it, I found this:
XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2
This prints a bunch of zeros on the screen, as it should.
However if I do this:
XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2
I get nonzero stuff printed.
More confusingly, if I code:
XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2
Then I get zeros everywhere.
How is this happening?
Let me just point out that none of this happens on my own machine using Intel Fortran 11.1.048. It only happens when I move to the UNIX cluster with Intel Fortran 11.1.
Thanks,
Rafael
Rafael,
The problem was related to a specific optimization involving both the square of a math function and the rerolling of statements involving consecutive array elements to reconstitute a loop. The loop index was getting incremented twice, which led to the pattern noted above where the second element, XR(15), contained the result that should have been in the third element, XR(16). This will be fixed in a future compiler update.
There are, therefore, additional ways in which you could work around this, without reducing the optimization level. You could rewrite the four assignment statements as a loop; then, the compiler would not need to recreate a loop. Or, as you already noted, you could separate the calculation of the logarithms from the calculation of the squares. The former is probably the most elegant: using array notation,
XR(14:17) = log(rsk(1:4))**2
but you'd need to check that you don't have a similar construct anywhere else in your code.
In reply to your last question, an optimizing compiler is a very large and complex piece of software. Bugs are rare, but they do happen. The Intel compiler is run through a very extensive test suite, so any problems are usually only for a very specific set of circumstances, for example, involving the interplay between different optimizations, as here. When a problem is found, a corresponding test is added to the test suite, to ensure that similar problems don't recur in the future.
So whilst you shouldn't expect problems with the rest of your code, provided you check for recurrences of the exact same construct, it is good practice to compare results compiled with optimization against results when compiling without optimization, just as you would check results for a problem with a known solution when testing your own code. It becomes even more important to test and compare to a validated set of results once you begin writing parallel code.
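The workarounds mentioned above (an explicit loop, array notation, or separating the logarithm from the square) can be sketched as a small self-checking program. This is only an illustration under assumptions: the names `xr_ref`, `xr_loop`, `xr_arr`, `xr_split` and the values in `rsk` are made up for the example and are not taken from the original code. Building it once with -O0 and once with -O2 and comparing the output is one way to apply the advice about checking optimized against unoptimized results.

```fortran
program xr_workarounds
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: tol = 1.0e-12_dp
  real(dp) :: rsk(4), lg(4)
  real(dp) :: xr_ref(17), xr_loop(17), xr_arr(17), xr_split(17)
  integer :: i

  ! Made-up positive test values (assumption, not from the thread)
  rsk = (/ 1.5_dp, 2.0_dp, 3.25_dp, 7.0_dp /)
  xr_ref = 0.0_dp
  xr_loop = 0.0_dp
  xr_arr = 0.0_dp
  xr_split = 0.0_dp

  ! Form used in the thread: four consecutive element assignments
  xr_ref(14) = log(rsk(1))**2
  xr_ref(15) = log(rsk(2))**2
  xr_ref(16) = log(rsk(3))**2
  xr_ref(17) = log(rsk(4))**2

  ! Workaround 1: an explicit loop, so no loop has to be reconstituted
  do i = 1, 4
     xr_loop(13 + i) = log(rsk(i))**2
  end do

  ! Workaround 2: array notation
  xr_arr(14:17) = log(rsk(1:4))**2

  ! Workaround 3: compute the logarithms first, then square them
  lg = log(rsk)
  xr_split(14:17) = lg * lg

  ! All four forms should agree to within rounding
  if (any(abs(xr_loop - xr_ref) > tol) .or. &
      any(abs(xr_arr - xr_ref) > tol) .or. &
      any(abs(xr_split - xr_ref) > tol)) then
     print *, 'MISMATCH between variants'
     stop 1
  end if
  print *, 'all variants agree'
end program xr_workarounds
```

The sketch is not guaranteed to reproduce the optimization problem itself, which depended on the specific compiler version and surrounding code.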
Martyn
Ok, I was able to make it work now, but I would need some help to figure out what is going on.
Here is my call to function Emax_hat (the one where the problem was):
Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)
Here is how I declared the variables in Emax_hat:
============================================================
function Emax_hat(PI, rsk, exper, lag)
USE Global_Data
implicit none
integer , intent(in) :: exper(NSECTORS), lag
real(KIND=DOUBLE), intent(in) :: PI(NREG), rsk(NSECTORS)
integer i, s
real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat
integer LagDummy(NSECTORS)
============================================================
NSECTORS and NREG (the sizes of many of the arrays above) are global constants declared in module Global_Data.
I checked, and for the call:
Emax(s) = Emax_hat(PI_COEF, rsk(:,kk), ExperTomorrow, s)
I have the following declarations:
real(KIND=DOUBLE) PI_COEF(NREG)
real(KIND=DOUBLE) rsk(NSECTORS,INTP)
integer ExperTomorrow(NSECTORS), lag
That is, all the types and sizes match. However, I was having the problem that I described extensively here.
I decided to try declaring the arguments of Emax_hat as assumed-shape arrays, as follows:
============================================================
function Emax_hat(PI, rsk, exper, lag)
USE Global_Data
implicit none
integer , intent(in) :: exper(:), lag
real(KIND=DOUBLE), intent(in) :: PI(:), rsk(:)
integer i, s
real(KIND=DOUBLE) XR(NREG), log_Emax, Emax_hat
integer LagDummy(NSECTORS)
============================================================
And that worked.
Why is that?
So apparently the problem was indeed related to size declaration of arrays.
My guess is that, when you configure to compile with errors, one or more of the callers assumes (or is told) that the call interface passes descriptors as opposed to the first cell in the array.
Try the -gen-interfaces and -warn interfaces options. This may point to the errant caller(s).
Jim Dempsey
I have tried the -gen-interfaces and -warn interfaces options. No error was reported.
Steve, did you have the chance to execute my code? Am I doing something wrong? If yes, why didn't the compiler, with all these options, detect the problem? Why did my code run smoothly on my Windows machine and not on the Linux system?
I feel very insecure going on without the answers to these questions.
What about my comments above? Could you please take a look at them and let me know what you think?
Does it make sense that it worked with assumed-shape arrays?
Why didn't Jim's suggestions (the options -warn interfaces and -gen-interfaces) detect any error?
I have a general question that might solve all of this, without the need of going through my code.
Suppose I have a module containing a subroutine and a function.
Here is what I am doing (in general terms):
module my_module
contains
subroutine my_subroutine(vec)
use global_var
real(8) vec(N)
real(8) res
! a function is referenced in an expression, not invoked with CALL
res = my_function(vec)
end subroutine
function my_function(vec)
use global_var
real(8) my_function
real(8), intent(in) :: vec(N)
... function commands ...
end function
end module my_module
Note that I am using another module, called global_var, where the constant N is defined.
module global_var
save
integer, parameter :: N = 25
end module global_var
My questions:
1) Is what I described here correct? If yes, then I would need you to look at the code, because I am having assignment errors, as I described. If it's not correct, why didn't the compiler detect any error?
2) What I have done is instead of having "real(8), intent(in) :: vec(N)" in my_function(vec) I have "real(8), intent(in) :: vec(:)", that is vec is declared as an assumed-shape array. That worked.
3) What I have also done is to pass the dimension of vec as an argument of my_function:
function my_function(vec,dim)
real(8) my_function
integer, intent(in) :: dim
real(8), intent(in) :: vec(dim)
... function commands ...
end function
That has also worked.
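For reference, the three ways of declaring the dummy argument described above can be put side by side in one small program. This is a hedged sketch, not the original code: the module names `global_var_sketch` and `my_module_sketch`, the functions `sum_explicit`, `sum_assumed` and `sum_sized`, and the array `a` are all hypothetical, and the function bodies simply sum the vector. In standard-conforming code all three should return the same value for the same actual argument.

```fortran
module global_var_sketch
  implicit none
  integer, parameter :: N = 25
end module global_var_sketch

module my_module_sketch
  use global_var_sketch
  implicit none
contains
  ! Variant 1: explicit-shape dummy sized by a module constant
  function sum_explicit(vec) result(s)
    real(8), intent(in) :: vec(N)
    real(8) :: s
    s = sum(vec)
  end function sum_explicit

  ! Variant 2: assumed-shape dummy (needs an explicit interface,
  ! which a module procedure provides)
  function sum_assumed(vec) result(s)
    real(8), intent(in) :: vec(:)
    real(8) :: s
    s = sum(vec)
  end function sum_assumed

  ! Variant 3: explicit-shape dummy sized by an extra argument
  function sum_sized(vec, dim) result(s)
    integer, intent(in) :: dim
    real(8), intent(in) :: vec(dim)
    real(8) :: s
    s = sum(vec)
  end function sum_sized
end module my_module_sketch

program dummy_shapes
  use my_module_sketch
  implicit none
  real(8) :: a(N, 3)
  integer :: i, k

  ! Fill a 2-D array so that a column can be passed as an array section
  do k = 1, 3
     do i = 1, N
        a(i, k) = real(i + 100 * k, 8)
     end do
  end do

  print *, 'size of actual argument:', size(a(:, 2))
  print *, 'explicit shape :', sum_explicit(a(:, 2))
  print *, 'assumed shape  :', sum_assumed(a(:, 2))
  print *, 'passed size    :', sum_sized(a(:, 2), N)
end program dummy_shapes
```

If the three printed sums differ at one optimization level but not at another, that points away from the declarations themselves and towards the compiler or an error elsewhere in the code.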
If they are in the same module, then yes, that creates an explicit interface. My guess is that N is not the actual dimension of the array when passed in.
I printed the dimension of the actual argument before calling the function and the dimension of the dummy argument, inside the function, and they match.
A new weird thing I discovered: if I compile with -debug full it works.
So it would be really helpful if you could go over the code and let me know what is going on. I understand your time constraints.
Please run the code using Linux.
Thank you,
Rafael
Note that you are posting in the forum for the Windows flavour of the compiler.
Bit of speculation here - In the OpenMP 3.0 spec have a read of the Fortran specific bits in section 2.9.3.2 (data environment - shared clause), particularly the bit that starts "Under certain conditions...". There's also an elaborating example in appendix A29. Perhaps this is applicable to your code.
If so, you may have a race condition associated with the temporary copy of a variable that needs to be made to match the array section of an actual argument with an assumed-size dummy. The fact that you get "array temporary" warnings is a pointer to this. Making the dummy assumed-shape would avoid the need for the copy and hence avoid the race condition, which appears to be what you have found.
I'm not clear about how this potential race condition would result in the specific problem that you see, but obviously it would only apply if you had parallel execution. You claimed earlier that when you ran the code serially you still saw the problem. Are you really sure about that?
I had problems with an older version of the compiler (11.0?) when array dimensions were specified by parameters from a module and OpenMP was in use, but the symptoms were different and, if I recall correctly, it only applied to debug builds.
Hi Ian,
Thanks for your post.
I removed all the parallelization from the code in order to isolate the problem. So, the problem I described is not due to parallelization issues.
I read somewhere on this forum that "array temporary" warnings could be caused by non-contiguous array sections. Once I pass a contiguous array to the function where I am having the problem, the warning disappears, but the problem persists.
Yes, I noticed I am in the Windows section... Since I am hopeless now, I am considering posting at the Linux section too.
real(kind=DOUBLE), pointer :: arry(:)
...
...
arry => rsk(:,kk)
Emax(s) = Emax_hat(PI_COEF, arry, ExperTomorrow, s)
The code I posted already solved the "array temporary thing"... The first argument was (in another version) a non contiguous array, but in this version of the code PI_COEF is contiguous. With this fix, the warning disappeared, but the original problem persisted.
Hi again,
I was able to make the program work using the debug option. What other options does the debug option activate that might be explaining this difference in behavior?
Here is my compile file (without debug):
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces
Here is my compile file with debug (only difference is the -debug option in the end):
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces -debug
Just FYI, I have extensively checked consistency of all arguments (types and dimensions)...
Well, setting "debug" turns off optimizations.
You could try your original compile line with -O0
You could try the -debug compile line with -O2
That might give interesting results.
The original compile with -O0 makes the code work.
With -debug together with -O2 it doesn't.
Still looking for what could be causing the problem.
Compiling like that:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces
Produces the error I have been talking about.
However, adding -check all or just -check bounds "removes the error"
Not sure how -check interacts with optimization (if there is any interaction... didn't see it in the manual) but I added -check all -O2 just for peace of mind. The code still works.
So summarizing: the code works with -O0 (that's what I said in the previous post) and does not with -O2. However, the code WORKS with -O2 and -check bounds and/or -check all.
Still working on it... Still lost!
Ah! Almost forgot...
If I compile with "-diag-enable sc" I get the following errors:
LinReg_MOD.f90(58): error #12171: dereference of NULL pointer "Y_HAT" set at (file:LinReg_MOD.f90 line:5)
LinReg_MOD.f90(68): error #12171: dereference of NULL pointer "DISP" set at (file:LinReg_MOD.f90 line:5)
However, I am not sure how to interpret that since both Y_hat and disp are local arrays with dimension given by one of the arguments of the function they belong to. The lines of the errors correspond to the initialization of Y_hat and disp.
I THINK I read somewhere that -diag-enable sc produces garbage sometimes... Just wondering whether I should pay attention to this or not...
Hi Steve,
I just gave up looking for the bug in my program. I have spent 9 days on it now and was unable to find the problem. I am attaching the code I sent you earlier, but with some printouts that will tell you what to look at, followed by a pause. I would greatly appreciate it if you could look into it to see what is going on. Again, I understand your time constraints, so please do it at your convenience.
I am sorry I am a bit anxious, but this is crucial for my PhD thesis work. I can't go on without it.
Basically, for each line, XR must be equal to log(rsk)**2. I am not having that for some unknown reasons.
To help you in the process here is a summary of stuff I learnt debugging it:
1) The code works fine if I compile using IVF version 11.1.048 together with Microsoft Visual Studio 2008 on my Windows machine. I tried changing many of the options to make the code fail (optimization, check bounds, etc.), but the code ALWAYS worked on my Windows machine.
2) Although one of the files is called Parallel_Emax_MOD.f90, there is nothing parallel in it. I removed all parallelizations in order to focus on the origin of the problem.
3) The code fails when I compile and execute it on a Linux machine. Here is my command line:
ifort Global_Data.f90 LinReg_MOD.f90 Parallel_Emax_MOD.f90 Main.f90 -o estimation -L$LIBRARY_PATH -I$INCLUDE -Wl,--start-group -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -Wl,--end-group -liomp5 -lpthread -heap-arrays -warn interfaces -gen-interfaces
4) The code works on Linux if I add -O0
5) The code does not work on Linux if I add -debug -O2
6) The code works if I add ONLY -check bounds or just -check to the line in 3)
UPDATE:
7) This is quite random, but if I insert a couple of "print*" statements in the code, especially after the assignments that go wrong, that is:
XR(14) = log(rsk(1))**2
print*
XR(15) = log(rsk(2))**2
print*
XR(16) = log(rsk(3))**2
print*
XR(17) = log(rsk(4))**2
print*
The code works... Could I be facing a compiler bug? This behavior sounds very random to me...
The trouble is I need to run it on the Linux machine since I will need to perform some heavy parallelization.
Many, many thanks for all the help.
Rafael
I investigated your problem at Steve's suggestion, and it does indeed appear to be a compiler optimization bug on Linux only. I shall pass a small reproducer along to the compiler developers to investigate further.
In the meantime, the simplest, safest way for you to proceed would be to insert a compiler directive
!DIR$ NOOPTIMIZE immediately after the FUNCTION EMAX_HAT statement. That will prevent this function from being optimized, but other functions within the file will still get optimized.
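A minimal sketch of where that directive would go, following the placement described above. The module `emax_sketch`, the function `emax_hat_sketch`, its simplified argument list and the test values are hypothetical stand-ins for the real Emax_hat; the exact placement rules for the directive should be checked against the compiler documentation. Compilers that do not recognize `!DIR$` lines treat them as ordinary comments.

```fortran
module emax_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  function emax_hat_sketch(rsk) result(xr)
!DIR$ NOOPTIMIZE
    ! The directive above asks the Intel compiler not to optimize this
    ! function; other procedures in the file are unaffected.
    real(dp), intent(in) :: rsk(4)
    real(dp) :: xr(17)

    xr = 0.0_dp
    xr(14) = log(rsk(1))**2
    xr(15) = log(rsk(2))**2
    xr(16) = log(rsk(3))**2
    xr(17) = log(rsk(4))**2
  end function emax_hat_sketch
end module emax_sketch

program noopt_demo
  use emax_sketch
  implicit none
  real(dp) :: rsk(4), xr(17)
  integer :: i

  ! Made-up positive test values (assumption, not from the thread)
  rsk = (/ 1.5_dp, 2.0_dp, 3.25_dp, 7.0_dp /)
  xr = emax_hat_sketch(rsk)

  ! Each difference should be zero (or at most rounding-level noise)
  ! if the assignments landed in the right elements
  do i = 1, 4
     print *, 'XR(', 13 + i, ') - log(rsk(', i, '))**2 =', &
              xr(13 + i) - log(rsk(i))**2
  end do
end program noopt_demo
```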
We'll let you know if we have further news or advice.
Martyn
