
YertleTheTurtle · ‎11-26-2012

Hi: I have a program that doesn't work when you add an extraneous write statement. I've turned on every debugging test I can find, and no errors are reported. If the optimizer for one subroutine is disabled and the write statement is removed, the program still doesn't work. With the optimizer set to "Maximum Speed" the program works, but only if the write is removed. This is the exact reverse of what you'd expect from a compiler optimization bug. But in my experience, when a program's behavior changes with the addition of extraneous statements, it signifies a compiler bug. Question: how do I contact a human at Intel to discuss this, and do they really want to know, since I am running Intel Fortran "Explorer" version 11.067 and my subscription has expired?

netphilou31 · ‎11-26-2012

Hi, It looks like you have a memory overflow somewhere in the code. This behavior is typical of such a problem. You can even have a program that crashes in normal runs but works fine under the debugger! You didn't give the error message you got, but I recommend turning on (if not already done) compiler switches such as "check arrays and strings bounds". This can help you find the source of the problem. Best regards, Phil.

Steven_L_Intel1 · ‎11-27-2012

You use this forum to report the problem. Please provide a test program that demonstrates the problem. If the problem is fixed in a newer version, we'll tell you, and we'll also let you know if there's a workaround. While you don't have access to Intel Premier Support, with its 1:1 support and guaranteed response, or to product updates, we're still interested in any and all reported problems. Sometimes a bug found in an old version is still present in a newer one, and it needs to get fixed. However, I tend to agree here with Phil in that the symptom you describe is more likely to be an error in your program that is sensitive to data layout.

YertleTheTurtle · ‎11-27-2012

Maybe I wasn't clear - I've run with all debug flags turned on and there is no error message. Here is a portion of the command line:

/nologo /debug:full /QaxSSE2 /Qparallel /fpp ...... /Qopenmp /warn:noalignments /real_size:64 /Qtrapuv /Qfp-stack-check /module:"Debug\\" /object:"Debug\\" /traceback /check:pointer /check:bounds /check:uninit /libs:static /threads /dbglibs /Qmkl:parallel /c

When the program "doesn't work", what happens is that it gives the wrong answer because there are incorrect values in an array when the program returns from a subroutine. Although I am using OpenMP, the problem is occurring well before any OpenMP calls are executed. It would be nice to generate a small test program, but with 150 subroutines and several tens of thousands of lines of code, that is easier said than done. Can you suggest a debug flag I've missed? If I send you a copy of the offending routine, is that any use? Making an executable test program is a near impossibility.

Steven_L_Intel1 · ‎11-27-2012

Given what you say, I would start adding WRITE statements that dump the values in the array at various points in the calculations. Since you say you can reproduce the problem with WRITEs, this may work for you. One more option I can suggest is /warn:interface. But since this is an OpenMP program, you may have some thread-safety issues. Why not download a trial copy of Intel Parallel Studio XE and run this through Intel Inspector XE with its thread correctness checks? You might also try a build with Static Analysis enabled to see if it catches any errors.

YertleTheTurtle · ‎11-27-2012

Hi: After spending 5 days trying to track down this problem, I tried to take your advice and create a test program by writing all the input variables of the offending subroutine to a file. Then I was going to isolate the subroutine in a separate program and read in all the variables before calling the subroutine, hopefully to reproduce the problem. But, in doing so, I found what is wrong. That raises another question. The structure of the error-prone part of the program reads as follows:

    subroutine A(....)
    real(kind=kind_dp) :: ZeroD
    . . .
    call B(..., ZeroD,...)
    . . .
    subroutine B(...,V1,...)
    real(kind=kind_dp), intent(in) :: V1
    x=...
    a=V1*x + ...
    end subroutine B
    end subroutine A

As soon as I inserted a "write" statement prior to the call to subroutine B, I got a run-time error telling me that the variable "ZeroD" was undefined. Of course, the second statement should have read

    real(kind=kind_dp) :: ZeroD=0._kind_dp

and that explains my problem - the variable ZeroD was picking up garbage and passing it into subroutine B, and the write statements were affecting what garbage was being picked up. But why didn't the debug checking pick that up? Did I miss a debug flag? I had /check:uninit set as well as /Qtrapuv, which is supposed to find this kind of error. Why did it only give me an error when I tried to use the uninitialized variable inside subroutine A, and not find an error when it was passed to, and used, inside subroutine B? Do I owe you an apology for calling it a compiler bug?

Steven_L_Intel1 · ‎11-27-2012

The regular uninitialized variable checking would not detect this because, at the call, it doesn't know that ZeroD isn't given a value in subroutine B. This is the kind of thing that Static Analysis, a feature of our "Studio" suites, can catch. /Qtrapuv does not do anything useful - please pretend it doesn't exist. As for an apology - none is needed, unless it's for the implication in the first post that no humans at Intel read this forum! You've been here long enough to know that's not the case.
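To make the failure mode above concrete, here is a minimal sketch of the corrected pattern. It is not the poster's actual code: the program name, the definition of `kind_dp`, the `res` argument, and the arithmetic inside `b` are all assumptions made for illustration, and both routines are written as internal procedures of one program purely to keep the example self-contained.

```fortran
program zero_demo
  implicit none
  ! Assumed definition; the original thread does not show how kind_dp is set.
  integer, parameter :: kind_dp = kind(1.0d0)

  call a()

contains

  subroutine a()
    ! Declaring ZeroD as a named constant means it can never be left
    ! undefined before it is passed on to b.
    real(kind=kind_dp), parameter :: ZeroD = 0.0_kind_dp
    real(kind=kind_dp) :: res

    call b(ZeroD, res)
    print *, res          ! 0*2 + 1, i.e. one
  end subroutine a

  subroutine b(v1, res)
    real(kind=kind_dp), intent(in)  :: v1
    real(kind=kind_dp), intent(out) :: res
    real(kind=kind_dp) :: x

    x   = 2.0_kind_dp     ! placeholder value for illustration
    res = v1*x + 1.0_kind_dp
  end subroutine b

end program zero_demo
```

One design note: the fix quoted in the thread, `real(kind=kind_dp) :: ZeroD=0._kind_dp`, also works, but in Fortran an initializer in a declaration gives the variable the SAVE attribute, so the value is set once rather than on every call. For a value that is never modified, a `parameter` avoids that subtlety.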

YertleTheTurtle · ‎11-27-2012

Hello Steve: In my defense:

1. I tried to send the issue to a hypothetical human at a theoretical bug reporting centre, but couldn't find one - my thought being that the forum wouldn't really be interested in this. That is why it ended up on the forum since, as you say, I knew someone would read it.

2. I grew up alongside a CDC 6600 and the compiler that came with that machine, with its "uninitialized floating point variable" traps would have caught this kind of error (at runtime) on day 1. Has all the old technology been lost? Does one really have to buy a new (Studio) compiler to get back to where the world was 40 years ago?

Steven_L_Intel1 · ‎11-28-2012

The forum is where we want you to report issues if you don't want to use (or don't have access to) Intel Premier Support. Many times it's beneficial to have the problem description seen by the community - other users can offer advice and experience, and the solution, once found, is available for others to read. We do have a "bug reporting centre" - Intel Premier Support - but you need to have a current support license to use it. That's one of the benefits of buying ongoing support (the other is having access to product updates).

/Qtrapuv, theoretically, is what you want. The original specification was that it would store a signaling NaN. But as implemented, the value it stores is not one that triggers an exception - it's just an unusual but normal floating point value. This is a bit of a thorn in my side - I have requested that either the implementation be changed to do what it was supposed to do, or be removed, as it continues to trip up users such as yourself. I will point out that even if it did work, it would only solve part of the general problem, since non-REAL values have no corresponding NaN that can be used.

Static Analysis is a very useful feature that goes far beyond what "uninitialized floating point variable" can do, though it is not a complete solution by any means. It does find more kinds of errors than you saw in earlier compilers. Let me show you a quick example. I tried the following program:

[fortran]
call sub
end
subroutine sub
real xx
call sub2(xx)
print *, xx
end
subroutine sub2 (x)
real, intent(in) :: x
print *, x
end
[/fortran]

There are two opportunities to catch an error here - one is the use of the uninitialized xx in "sub" after the call, and one inside sub2. Static Analysis finds them both:

(The line numbers don't match because I took out some irrelevant lines when pasting above). Note that for the reference inside sub2, you get the call chain. I have seen this catch uninitialized integers four levels up a call chain - try that with something like /Qtrapuv!
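For readers without access to Static Analysis, the NaN idea described above can be approximated by hand for REAL variables. The following is only a sketch under stated assumptions, not what /Qtrapuv does: it uses the standard `ieee_arithmetic` intrinsic module (Fortran 2003) to store a quiet NaN as a "not yet set" marker and tests for it explicitly with `ieee_is_nan`. The program name and the message text are made up for illustration; `sub2` mirrors the routine in the example program.

```fortran
program nan_sentinel
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, ieee_is_nan
  implicit none
  real :: xx

  ! Manually mark xx as "not yet set". This stands in for what a working
  ! uninitialized-variable trap would arrange automatically.
  xx = ieee_value(xx, ieee_quiet_nan)

  ! The assignment that should have happened here is deliberately missing.
  call sub2(xx)

contains

  subroutine sub2(x)
    real, intent(in) :: x

    ! Explicit check on entry; a NaN here means the caller never set x.
    if (ieee_is_nan(x)) then
      print *, 'sub2: x was never assigned a value'
    else
      print *, x
    end if
  end subroutine sub2

end program nan_sentinel
```

A signaling NaN combined with halting on the invalid exception would be closer to the original specification, but whether merely copying a signaling NaN raises the exception varies by platform, so the explicit test is used here. As noted in the post, this approach does nothing for INTEGER, LOGICAL, or CHARACTER variables, which have no NaN equivalent.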

Bug reporting?