Optimising compiler problem

Intel_C_Intel · 01-15-2001

I have a crashing bug that only occurs in the release version of my code, and only when the optimisation level is 'Optimise for speed' or higher. The symptom is that an array gets written beyond its end. This in turn is caused by a block of code inside a loop, which should reset a counter to zero, failing to execute.

Unfortunately, putting in diagnostic write statements mostly makes the symptom disappear.

My guess is that either there is a problem with the optimising compiler itself, or my code is corrupting a byte or two of memory somewhere, and any change I make to the code (e.g. adding diagnostics) moves things around and hides the symptom.

The code itself is very big and I can't reproduce the problem in anything shorter. I can only give an outline here:

do i = 1, iNum_type_output
    write(*, *) '***** diagnostic 1'

    if (<condition>) then
        thisFunctionName = iRetrainReturnCode
        return
    end if
    write(*, *) '***** diagnostic 2'

    write(*, *) '***** diagnostic 3'
end do

Inserting diagnostic 2 does not alter the execution. If diagnostic 2 is the only write statement, the output shows that the loop (or at least that write statement) does not execute. The array index does not get reset, and I get the same crash as without any write statements. When I add either diagnostic 1 or diagnostic 3, the behaviour changes: the loop executes properly and the program doesn't crash.

From the function return value, I know that it never returns prematurely from the return statement in the loop (i.e. <condition> is never true). Confirming this, the code after the loop always gets executed. By examining the value of i, the loop counter, I can see that the do statement initialises it to 1, but it is never incremented. I also tried replacing the return statement above with a write statement. That write statement never executed, and the program appeared to run normally (without needing any of the other diagnostic write statements).

Putting in various write statements before the loop also makes the symptom disappear. In fact, almost any change to any of the code in the loop or before it makes the symptom disappear. Selecting array bounds checking makes the symptom disappear (but doesn't report any array bounds violations).

I am now at a complete loss. Every time I do something to try to get more diagnostic information, the symptom disappears. Can anyone suggest anything? In particular, how can I find out whether the problem is my code or the compiler optimisations? If it is my code, then reducing the level of compiler optimisations is probably just moving the error to somewhere where it is less obvious.

All suggestions gratefully received.

Ian

Steven_L_Intel1 · 01-15-2001

I suggest you send a ZIP file of your project, including all sources, data, and instructions for reproducing the problem, to us at vf-support@compaq.com. We're pretty good at figuring these things out. We'll also try it with 6.5 and tell you if the problem goes away with that version, or if there's a bug in your code.

MOST of the time, problems like this are caused by subtle coding errors that are revealed when the optimizer assumes the standard's rules are being followed. As an experiment, try checking the optimization box "Enable dummy arguments to share memory locations". This disables an optimization that assumes you haven't violated the rule against argument aliasing.
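To illustrate the aliasing rule, here is a minimal sketch; the module, the routine `add_twice_first` and the arrays are hypothetical and not taken from Ian's code. Because `y` is modified, the standard lets the compiler assume `x` and `y` do not overlap, so it may read `x(1)` once before the loop. Passing the same array as both arguments breaks that assumption, and the result can then differ between optimisation levels, which is the kind of behaviour the "share memory locations" option is meant to guard against.

```fortran
module alias_demo
  implicit none
contains
  ! Adds 2*x(1) to every element of y. Since y is modified, the
  ! compiler may assume that x and y do not overlap.
  subroutine add_twice_first(x, y, n)
    integer, intent(in)    :: n
    real,    intent(in)    :: x(n)
    real,    intent(inout) :: y(n)
    integer :: i
    do i = 1, n
      ! An optimiser may load x(1) once and keep it in a register
      y(i) = y(i) + 2.0 * x(1)
    end do
  end subroutine add_twice_first
end module alias_demo

program main
  use alias_demo
  implicit none
  real :: a(3), b(3)

  ! Conforming call: x and y are distinct arrays, so the result is defined
  a = 1.0
  b = 1.0
  call add_twice_first(b, a, 3)
  if (any(a /= 3.0)) stop 1
  print '(a, 3f6.1)', 'distinct arrays: ', a

  ! NON-CONFORMING call: the same array is associated with both x and y.
  ! Reloading x(1) every iteration gives 3 7 7; caching it gives 3 3 3.
  ! Neither result is guaranteed.
  a = 1.0
  call add_twice_first(a, a, 3)
  print '(a, 3f6.1)', 'aliased arrays:  ', a
end program main
```

Only the first call has a defined result; the second is shown precisely because its output may change with the optimisation settings.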

Steve

Intel_C_Intel · 01-16-2001

I've managed to pare the code down to just 10 source files for the main DLL, plus the calling .exe, and it still shows the problem, so I'll send you the files as suggested.

(Yes, there is some non-standard stuff in there, but that's because it uses the Windows API for threads and stuff and uses Cray pointers for compatibility with an alternative C harness program.)

Thanks.

Ian

Intel_C_Intel · 01-30-2001

Well, just to close this thread: Steve Healey at Compaq had a look at this one for me, and in the end we found that the problem only appeared in version 6.1A, not in version 6.1 or 6.5.

We concluded that it is probably the compiler and not my code, although that is not proven...

As a work-around, I changed the declaration of one of my global arrays from pointer to allocatable, and this (for whatever reason) made the problem go away.
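For background on why such a declaration change can affect optimised code, here is a minimal sketch with hypothetical names (`global_arrays`, `work_ptr`, `work_alloc`, `shared`); it does not reproduce Ian's code and does not show what the 6.1A problem actually was. An array with the POINTER attribute may be associated with any TARGET, so the compiler generally has to assume it can overlap other data, whereas an ALLOCATABLE array always has its own contiguous storage. It is only an assumption that this difference is why the workaround helped.

```fortran
module global_arrays
  implicit none
  ! POINTER version: may be associated with any TARGET, including
  ! storage that is also visible under another name
  real, pointer     :: work_ptr(:) => null()
  ! ALLOCATABLE version: always its own, contiguous storage
  real, allocatable :: work_alloc(:)
  real, target      :: shared(10)
end module global_arrays

program main
  use global_arrays
  implicit none

  allocate(work_alloc(10))
  work_alloc = 0.0
  shared = 0.0

  ! The pointer can alias SHARED: a write through one name
  ! changes what the other name sees
  work_ptr => shared
  work_ptr(1) = 5.0
  if (shared(1) /= 5.0) stop 1

  ! No such association is possible for the allocatable array
  work_alloc(1) = 7.0
  if (shared(1) /= 5.0) stop 2

  print '(a, f5.1)', 'shared(1) = ', shared(1)
  deallocate(work_alloc)
end program main
```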

Ian