Solved: Re: Question - Page 3

rafadix08 · ‎12-09-2009

I compiled my code with my Intel Fortran compiler 11.1.048, and it ran fine.

When I compiled and ran the same code on a UNIX cluster with Intel Fortran 11.1, something really weird happened.

First, the program crashed at some point. While debugging it, I found this:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

This prints a bunch of zeros on the screen, as it should.

However if I do this:

XR(14) = log(rsk(1))**2
XR(15) = log(rsk(2))**2
XR(16) = log(rsk(3))**2
XR(17) = log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

I get nonzero stuff printed.

More confusingly, if I code:

XR(14) = log(rsk(1))**2
print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
XR(15) = log(rsk(2))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
XR(16) = log(rsk(3))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
XR(17) = log(rsk(4))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

print*, 'XR(14) - log(rsk(1))**2 =', XR(14) - log(rsk(1))**2
print*, 'XR(15) - log(rsk(2))**2 =', XR(15) - log(rsk(2))**2
print*, 'XR(16) - log(rsk(3))**2 =', XR(16) - log(rsk(3))**2
print*, 'XR(17) - log(rsk(4))**2 =', XR(17) - log(rsk(4))**2

Then I get zeros everywhere.

How is this happening?

Let me repeat that none of this happens on my machine with Intel Fortran 11.1.048. It only happens when I move to the UNIX cluster with Intel Fortran 11.1.

Thanks,
Rafael

rafadix08 · ‎12-16-2009

Quoting - Martyn Corden (Intel)

Hi Rafael,
I investigated your problem at Steve's suggestion, and it does indeed appear to be a compiler optimization bug on Linux only. I shall pass a small reproducer along to the compiler developers to investigate further.
In the meantime, the simplest, safest way for you to proceed would be to insert a compiler directive
!DIR$ NOOPTIMIZE immediately after the FUNCTION EMAX_HAT statement. That will prevent this function from being optimized, but other functions within the file will still get optimized.
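A minimal sketch of where the directive would go. The module name and the body of emax_hat below are hypothetical stand-ins, since the real function is not shown in this thread; only the placement of !DIR$ NOOPTIMIZE directly after the FUNCTION statement follows the advice above. A compiler that does not recognize the directive simply reads the line as a comment.

```fortran
module emax_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Hypothetical stand-in for Emax_hat: the real interface and body
  ! are not shown in the thread.
  function emax_hat(rsk) result(total)
!DIR$ NOOPTIMIZE
    real(dp), intent(in) :: rsk(4)
    real(dp) :: total
    real(dp) :: xr(17)

    xr = 0.0_dp
    xr(14) = log(rsk(1))**2
    xr(15) = log(rsk(2))**2
    xr(16) = log(rsk(3))**2
    xr(17) = log(rsk(4))**2
    total = sum(xr(14:17))
  end function emax_hat
end module emax_sketch

program demo
  use emax_sketch
  implicit none
  ! Sample inputs are made up for illustration.
  print *, emax_hat([1.5_dp, 2.0_dp, 2.5_dp, 3.0_dp])
end program demo
```

Other procedures in the same file carry no directive, so they would still be compiled at the optimization level given on the command line.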

We'll let you know if we have further news or advice.

Martyn

Martyn,
Thank you very much for looking into it. I feel relieved that it was not a programming error.
Is there any chance I can be notified of the result of the investigation and/or Intel update (if any)?
Thank you also to Steve and to the other members who tried to help. Really appreciate it.
Rafael

rafadix08 · ‎12-16-2009

Last question: if there is a bug in the optimizer, why should I disable the optimizer only for Emax_hat? How can I be sure that the rest of the code is being correctly compiled?

rafadix08 · ‎12-17-2009

Dear Members of the Intel team,

I tried Martyn's suggestion of including !DIR$ NOOPTIMIZE after the Emax_hat function statement, which is where I was having trouble. The directive appears to avoid the problem. However, I use the Emax_hat function VERY intensively, and my code became slower by a factor of 4, which makes my program almost infeasible (it was already taking too long). Please note that the code I sent for analysis was a version reduced to the minimum needed to illustrate my problem.

I would like to ask some questions about this compiler optimization bug, since the answers will help me decide whether I have to switch compilers.

1) If there is indeed a compiler optimization bug, why should I disable optimization only for Emax_hat, which is an EXTREMELY SIMPLE function? How can I trust that compiler optimization is working properly for the other routines?

2) How long should I expect to wait for this bug to be corrected and a new version of the Intel compiler to be issued?

3) Are there other ways to improve performance of Emax_hat?

Also, please let me know if you happen to have any suggestions.

Thank you,
Rafael

Steven_L_Intel1 · ‎12-17-2009

Optimizer bugs are usually very specific to particular code and not something general. If it was general, we'd see it in the extensive testing we do and we'd hear about it from many customers. I would take Martyn's advice here.

It is too soon to know when the bug will be fixed. The next opportunity will be the mid-late January update.

Martyn_C_Intel · ‎12-17-2009

Rafael,
The problem was related to a specific optimization involving both the square of a math function and the rerolling of statements involving consecutive array elements to reconstitute a loop. The loop index was getting incremented twice, which led to the pattern noted above where the second element, XR(15), contained the result that should have been in the third element, XR(16). This will be fixed in a future compiler update.
There are, therefore, additional ways in which you could work around this, without reducing the optimization level. You could rewrite the four assignment statements as a loop; then, the compiler would not need to recreate a loop. Or, as you already noted, you could separate the calculation of the logarithms from the calculation of the squares. The former is probably the most elegant: using array notation,
XR(14:17) = log(rsk(1:4))**2
but you'd need to check that you don't have a similar construct anywhere else in your code.
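As a rough illustration of the workarounds just described, the sketch below writes the same four assignments as an explicit loop, in array notation, and with the logarithm and the square separated, then checks that all three agree. The program name, the sample values in rsk, and the tolerance are assumptions made for the example; they do not come from the original code.

```fortran
program reroll_workarounds
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: tol = 1.0e-12_dp   ! assumed tolerance
  real(dp) :: rsk(4)
  real(dp) :: xr_loop(17), xr_array(17), xr_split(17)
  integer :: i

  rsk = [1.5_dp, 2.0_dp, 2.5_dp, 3.0_dp]    ! hypothetical sample inputs
  xr_loop  = 0.0_dp
  xr_array = 0.0_dp
  xr_split = 0.0_dp

  ! Workaround 1: an explicit loop, so the compiler has nothing to reroll.
  do i = 1, 4
    xr_loop(13 + i) = log(rsk(i))**2
  end do

  ! Workaround 2: the same thing in array notation.
  xr_array(14:17) = log(rsk(1:4))**2

  ! Workaround 3: compute the logarithms first, then square them.
  xr_split(14:17) = log(rsk(1:4))
  xr_split(14:17) = xr_split(14:17)**2

  ! All three forms should give the same values.
  if (maxval(abs(xr_loop - xr_array)) > tol .or. &
      maxval(abs(xr_loop - xr_split)) > tol) then
    print *, 'workaround variants disagree'
    stop 1
  end if
  print *, 'all three variants agree'
end program reroll_workarounds
```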

In reply to your last question, an optimizing compiler is a very large and complex piece of software. Bugs are rare, but they do happen. The Intel compiler is run through a very extensive test suite, so any problems are usually only for a very specific set of circumstances, for example, involving the interplay between different optimizations, as here. When a problem is found, a corresponding test is added to the test suite, to ensure that similar problems don't recur in the future.
So whilst you shouldn't expect problems with the rest of your code, provided you check for recurrences of the exact same construct, it is good practice to compare results compiled with optimization against results when compiling without optimization, just as you would check results for a problem with a known solution when testing your own code. It becomes even more important to test and compare to a validated set of results once you begin writing parallel code.
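One way to act on this advice is a small check against a known solution, built both with and without optimization. The sketch below is an assumption-laden example, not part of the original code: it chooses inputs 1, e, e**2, e**3 so that log(rsk(i))**2 must come out as 0, 1, 4, 9, and it stops with a nonzero status if any element of XR(14:17) is off. A shifted result of the kind described above, with XR(15) holding the value meant for XR(16), would show up as 4 where 1 is expected.

```fortran
program known_solution_check
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: tol = 1.0e-12_dp   ! assumed tolerance
  real(dp) :: rsk(4), xr(17), expected(4)
  integer :: i

  ! Inputs chosen so that the exact answers are known in advance.
  do i = 1, 4
    rsk(i)      = exp(real(i - 1, dp))        ! 1, e, e**2, e**3
    expected(i) = real((i - 1)**2, dp)        ! 0, 1, 4, 9
  end do

  ! The construct under test, written as in the original post.
  xr = 0.0_dp
  xr(14) = log(rsk(1))**2
  xr(15) = log(rsk(2))**2
  xr(16) = log(rsk(3))**2
  xr(17) = log(rsk(4))**2

  ! Compare each element with its known value.
  do i = 1, 4
    if (abs(xr(13 + i) - expected(i)) > tol) then
      print *, 'mismatch at XR index', 13 + i, ':', xr(13 + i), &
               'expected', expected(i)
      stop 1
    end if
  end do
  print *, 'XR(14:17) matches the known solution'
end program known_solution_check
```

Building the same source once with optimization disabled and once at the usual optimization level, and running both, gives the comparison suggested above.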

Martyn

rafadix08 · ‎12-17-2009

Martyn,

Many thanks for the detailed reply.

I will try to follow your suggestion of working around this without reducing the optimization level. I really can't afford this deterioration in performance.

However, I still have some very specific questions related to this issue. Please let me know whether it is best to keep exchanging messages via the forum, by email, or via private messages. Is there a way to exchange private messages with the Intel group?

Many thanks again,
Rafael

Steven_L_Intel1 · ‎12-17-2009

Rafael,

As a free, non-commercial license customer, these forums are what are available to you. Customers who purchase a license with support have access to Intel Premier Support.

rafadix08 · ‎12-18-2009

Steve,

I actually have access to Intel Premier Support for my IVF w/ IMSL for Windows. I am registered as rafadix (not rafadix08). It's just that the process for submitting issues seems a bit complicated, so I preferred to use the forum.

Anyway, I would be grateful if Martyn could look at the following question:

I am a bit confused since in module Parallel_Emax_MOD.f90 there is a subroutine called Emax_Coef with assignments very similar to the ones inside Emax_hat:
X(kk,14) = (log(rsk(1)))**2
X(kk,15) = (log(rsk(2)))**2
X(kk,16) = (log(rsk(3)))**2
X(kk,17) = (log(rsk(4)))**2
However, no problem was detected in Emax_Coef... Why is that?

Also, there is an additional thing that confused me.
Remember that the assignments inside Emax_hat are:
XR(14) = (log(rsk(1)))**2
XR(15) = (log(rsk(2)))**2
XR(16) = (log(rsk(3)))**2
XR(17) = (log(rsk(4)))**2

That's when the error was detected. However, if I write it like this:
XR(14) = (log(rsk(1)))**2
print*
XR(15) = (log(rsk(2)))**2
print*
XR(16) = (log(rsk(3)))**2
print*
XR(17) = (log(rsk(4)))**2
print*

The code works... The assignments are correctly done... This is curious

Anyway, I did follow your suggestion and wrote something like that (avoiding the composition of log with **2):
XR(14:17) = log(rsk(1:4)) ; XR(14:17) = XR(14:17)**2
And that seems to be working - same results with and without optimization. So thank you very much for your help. I am just curious about why the above pieces of code worked. That may be useful for me in the future if I want to avoid this bug.

Many thanks,
Rafael

Martyn_C_Intel · ‎12-18-2009

Rafael,
The compiler was trying to reconstitute a loop, in order to optimize it.
In your example with the 2D array, it is the second subscript that is varying - the 4 elements of X are widely separated in memory. Optimization is most effective when the loop accesses contiguous memory locations. (That's why in Fortran you should try to write loops over the first array index, and the inner loop of a nest should normally be over the first index.) In this case, the compiler won't try to reconstitute a loop, since it's unlikely to be able to optimize it. Similarly, the print statements are effectively function calls, which would also prevent many loop optimizations.
It also depends on the context, but you might encounter the original problem if the 2D subscripts were in the opposite order (assignments to X(14,KK), X(15,KK), etc.), because these 2D array elements would then be adjacent in memory, so rerolling a loop might be worthwhile.
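The memory-layout point can be made concrete with a short sketch. The array shape (n1 = 20, n2 = 17), the fill values, and kk = 3 are arbitrary choices for illustration. The program fills a 2D array with the inner loop over the first index, flattens it in array element order with reshape, and confirms that X(14:17,kk) occupies consecutive positions while X(kk,14:17) sits n1 positions apart.

```fortran
program column_major_sketch
  implicit none
  integer, parameter :: n1 = 20, n2 = 17   ! assumed array shape
  integer :: x(n1, n2), flat(n1*n2)
  integer :: i, j, kk

  ! Inner loop over the first index, as recommended for Fortran.
  do j = 1, n2
    do i = 1, n1
      x(i, j) = 100*i + j
    end do
  end do

  ! reshape lists the elements in array element order, i.e. with the
  ! first subscript varying fastest (column-major).
  flat = reshape(x, [n1*n2])

  kk = 3

  ! First subscript varying: positions are consecutive (stride 1).
  do i = 14, 17
    if (flat((kk - 1)*n1 + i) /= x(i, kk)) stop 1
  end do

  ! Second subscript varying: positions are n1 apart (stride n1).
  do j = 14, 17
    if (flat((j - 1)*n1 + kk) /= x(kk, j)) stop 1
  end do

  print *, 'stride between x(14,kk) and x(15,kk):', 1
  print *, 'stride between x(kk,14) and x(kk,15):', n1
end program column_major_sketch
```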

Martyn

Martyn_C_Intel · ‎04-13-2010

Rafael,
This problem has been fixed in update 5 to the Intel compiler version 11.1: w_cprof_p_11.1.060 (Windows) or l_cprof_p_11.1.069 (Linux). You should be able to download these from the registration center.

Regards,
Martyn