stability of ia32 versus intel64 code

Greynolds__Alan · ‎08-31-2010

I just upgraded to Windows-7 64-bit so I recompiled my applications with 11.1.067 and intel64 (had no problems under ia32). One application failed to link because the name of one of my routines appeared to conflict with one in a Microsoft library. Another was able to work properly with a single 12GB array but crashed on simpler calculations. In both cases, everything works fine with ia32 or a debug/non-optimized compile.

Before I report these problems, I'd be curious to know if others have seen problems with compiling for intel64 but not ia32 on the same code.

Al Greynolds

www.ruda-cardinal.com

jimdempseyatthecove · ‎08-31-2010

I've had no problems in portability between 32-bit and 64-bit (Intel64), however I do not use many 3rd party libraries nor QuickWin.

If your code works in x64/Debug but not in x64/Release, then this sounds like an optimization problem.
*** However ***
A Debug build preinitializes data where a Release build does not, so code that reads a variable before setting it may appear to work in Debug (this would be a programming problem of yours).
Build a Debug version with runtime checks for use of uninitialized variables (as well as array bounds and argument/interface mismatches). If any errors show up, fix those first.
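
As a minimal sketch of the kind of defect these checks are meant to expose, consider an accumulator that is summed into without first being set. The program and variable names below are hypothetical, and the code shows the corrected form with the bug described in a comment.

```fortran
program uninit_demo
  implicit none
  real :: values(4)
  real :: total

  values = [1.0, 2.0, 3.0, 4.0]

  ! The bug being illustrated would be to omit the next line. "total" would
  ! then start with whatever happens to be in memory, which may be zero in
  ! one build configuration and garbage in another.
  total = 0.0

  total = total + sum(values)
  print *, 'total =', total
end program uninit_demo
```

If removing the initialization changes the result only in the Release build, the problem is in the source rather than in the optimizer.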

Jim Dempsey

Greynolds__Alan · ‎08-31-2010

BTW, I have had no such problems using intel64 with 11.1.088 on MacOSX SnowLeopard.

Al

mecej4 · ‎08-31-2010

I routinely test codes on IA32 and Intel64 and, occasionally, on IA64. In the few cases where a code worked on one target but not the other, there was an implicit assumption of default size integer (or, when C source was used, pointer) that did not match the size for the compiler/target CPU combination.

Once these errors were corrected, there would be hardly any difference in the results, ruling out differences caused by excess precision if X87 floating point had been used.

The same kind of mismatches occurred when the PC world went from 16-bit CPUs to 32-bit CPUs.
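
The integer-size mismatch described above can be illustrated with a short sketch. It assumes a compiler that provides the standard ISO_C_BINDING module; the program and variable names are hypothetical. It compares the width of a default INTEGER with the width of an integer kind intended to hold a C pointer.

```fortran
program address_size_demo
  use, intrinsic :: iso_c_binding, only: c_intptr_t, c_loc, c_ptr
  implicit none
  integer             :: default_int   ! default INTEGER kind
  integer(c_intptr_t) :: address       ! integer kind sized to hold a C pointer
  real, target        :: x
  type(c_ptr)         :: px

  x  = 1.0
  px = c_loc(x)

  ! Copy the bit pattern of the C pointer into the address-sized integer.
  address = transfer(px, address)

  print *, 'bits in default INTEGER     :', bit_size(default_int)
  print *, 'bits in INTEGER(C_INTPTR_T) :', bit_size(address)
  if (bit_size(default_int) < bit_size(address)) then
     print *, 'a default INTEGER cannot hold an address on this target'
  end if
end program address_size_demo
```

Code that stores addresses or handles in a default INTEGER may work on a 32-bit target and silently truncate them on a 64-bit one, which is one way the same source can behave differently on ia32 and intel64.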

Greynolds__Alan · ‎09-01-2010

UPDATE:

1. The name conflict is definitely a bug which I was able to report with a small test case to the Premier site. My workaround was to change the routine name slightly.
2. The crash is definitely an optimizer bug (not with ia32 or on MacOSX) but I have not been able to reproduce it with anything smaller than my full application. It appears that when IPO inlined one of my smaller routines, an array allocation ended up occurring after its first access. The insertion of one "extra" line of source made it go away.

Al

jimdempseyatthecove · ‎09-01-2010

Quoting AlGreynolds

It appears that when IPO inlined one of my smaller routines, an array allocation ended up occurring after its first access.

Al,

Could you rephrase the statement?

There is one other note that may be of interest.

A few years ago I had a case of "Gremlins" in an application. These looked somewhat like allocation errors and/or other mysterious errors. I was all set to blame the compiler optimizations because, like you, the Debug build worked fine and the Release build had errors. Furthermore, adding diagnostic statements caused the point of error to move in the code. (Does this sound familiar?)

To further complicate the issue, I could set up a reproducer that reliably failed, and I could step through the code up to the error. When stepping through the error point, the program would GP fault. The source code as viewed in the debugger looked correct (and was correct), and all the referenced variables looked correct too. This left me to (mis)conclude that some errant reference blew a hole in the code. Looking at the disassembly, the code was correct too! So what was wrong? Everything looked OK and a GP fault could not be explained. The next thought was "Was the processor or memory flaking out?". This was not the case either.

What I did to corner the Gremlin was to write a diagnostic helper routine to detect code modification. This was tricky to write because the modification occurred before the program started, disappeared at break, came back at continue, etc. Of course, inserting the diagnostic routine moved the problem somewhere else. To my relief, the diagnostic helper code detected modification of the (binary) code. After seeing what the code change was, it appeared that a break point was being placed inside an instruction but not at the beginning of the instruction. The intended register/argument address was being modified but not the first byte(s) of the instruction (IOW the break would not occur). Looking at the break points, there were no break points set in the area ???

What I did next "fixed" the problem.

Debug | Break Points | Delete all break points

Apparently there was a bug in VS that presented itself as if it were an invisible break point.
You might give this "fix" a try after all else fails.

Jim Dempsey

mecej4 · ‎09-02-2010

Al: It is possible that you have run into a Heisenbug related to the optimizer, which may become active with or without IPO. Such compiler bugs are very tough to narrow down and report with a small enough example that the compiler vendor could work on. In fact, the very act of reducing the number of lines of source code can make the bug disappear. That is partly the reason for the name 'Heisenbug'.

A few months ago, I found one such bug. It took quite a bit of effort to make up a small example. However, the effort was worthwhile because the bug had existed in Versions 8 to 11.1.065 of the Intel compiler. Once Intel had the small example, they fixed the optimizer bug very promptly, and the recent Update-7 does not have this bug anymore.

I urge you to do us all a favor and send in a small "reproducer". Here is what I did to reduce the size of the example code (you may already have done something similar, or may have a better approach):

1. Identify the subroutine where the optimized code produced incorrect results. In my case, this happened in a 1400 line subroutine of a 34,000 line program. It is by no means trivial to put one's finger on the culprit, since the point of detection and the point of initiation of the error can be quite far off.

2. Write a small driver program to call the subroutine identified in Step 1 with the same arguments (and variables in modules and in COMMON, if any) as in the full application. This brought down the size of the "reproducer" to 1500 lines.

3. Work with the reproducer, and whittle away as many lines as possible without making the bug go away. Many iterations are needed, to be sure, but the size is manageable now. This effort brought down the reproducer to 130 lines.

4. If, as you have observed, throwing in an extra line or a WRITE statement, or moving some lines of code around changes the results, noting those actions provides more useful information to the compiler developers.
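
As a rough illustration of Step 2, a driver can be as small as the sketch below. The routine name `suspect`, its arguments, and the test values are all hypothetical; in a real reproducer the routine would be the actual failing subroutine, unchanged, and the arguments would be the values the full application passes at the point of failure.

```fortran
! Hypothetical stand-in for the routine that misbehaves when optimized.
! In a real reproducer this would be the actual routine, unchanged.
subroutine suspect(n, x, total)
  implicit none
  integer, intent(in)  :: n
  real,    intent(in)  :: x(n)
  real,    intent(out) :: total
  integer :: i
  total = 0.0
  do i = 1, n
     total = total + x(i)
  end do
end subroutine suspect

program driver
  implicit none
  integer, parameter :: n = 5
  real    :: x(n), total
  integer :: i

  ! Recreate the argument values seen in the full application
  ! (hard-coded here; they could also be read from a dump file).
  do i = 1, n
     x(i) = real(i)
  end do

  call suspect(n, x, total)

  ! Print enough of the result to tell a good run from a bad one.
  print *, 'total =', total
end program driver
```

Building the driver with and without optimization and comparing the printed results gives a quick pass/fail check while lines are whittled away in Step 3.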

Jim: About "It appears that when IPO inlined one of my smaller routines, an array allocation ended up occurring after its first access." I think Al meant that the optimizer/IPO had the effect of rearranging

[bash]ALLOCATE(A(N))
A(2:3) = 5.0
[/bash]

into

[bash]A(2:3) = 5.0
ALLOCATE(A(N))
[/bash]

And Jim, the war story that you related is quite interesting. However, you were debugging the debugger rather than the optimizer, and I still don't understand how a bug in the debugger could cause the standalone, optimized EXE to malfunction.

With my optimizer bug I found that it was next to impossible to debug optimized code at the Fortran level, since the debugger's idea of the current line is often not well-correlated with the instruction sequence that the optimizer produced. Even the reproducer was too big for me to debug at the assembly level, given the rather large number of local arrays and subroutine arguments.

jimdempseyatthecove · ‎09-02-2010

Quoting mecej4

And Jim, the war story that you related is quite interesting. However, you were debugging the debugger rather than the optimizer, and I still don't understand how a bug in the debugger could cause the standalone, optimized EXE to malfunction.

With my optimizer bug I found that it was next to impossible to debug optimized code at the Fortran level, since the debugger's idea of the current line is often not well-correlated with the instruction sequence that the optimizer produced. Even the reproducer was too big for me to debug at the assembly level, given the rather large number of local arrays and subroutine arguments.

The bug in the VS debugger caused a break point to be thrown into the executable image at an arbitrary (but fixed) location. This is similar to when you write via a bunged-up or uninitialized pointer or reference, i.e. code gets cratered. When the cratering occurs in a little-used or unused function, your test runs will not encounter the error (but your customers may, after hours of run time or when executing some functionality you did not exercise during testing). The peculiar quirk in my case was that when the error occurred and I could trap back into the debugger, the debugger removed the break point and thus removed the evidence of the source of the problem. Up until I figured out the problem, this looked identical to an optimization problem.

I agree, debugging optimized code is very difficult. And debugging IPO'd and optimized code is much harder yet.
A reproducer certainly helps the Intel development/support team to fix the problem. The time spent creating a reproducer is often recovered when the same bug doesn't bite you later.

Jim

Wendy_Doerner__Intel · ‎09-02-2010

If you cannot narrow down the test case, we can try working with a larger piece of code (it will take us longer to narrow down).

------

Wendy

Yaqi_Wang · ‎09-03-2010

Just to tell my story: I ran into something similar before. The debug version worked fine but the optimized version (with IPO, btw) did not. I tried everything in debugging mode to find the possible cause and failed.
After spending a lot of effort adding screen prints, I found that the size of a pointer right after its association with another pointer was different from the size of the associated pointer. Adding some dummy lines before and after the association operation removed the error, so I used this remedy. As time went on, I kept updating the problematic module. One day I removed those dummy lines, and the code still worked properly with full optimization. The compiler version did not change during this period. So it could be some coding issue, because I use pointers extensively, or maybe this break-point bug caused the problem. This still stays in my mind, but I guess I will try to forget it and hope everything will be all right in the future. Thanks for letting me know there are some other possibilities, although I do not know if I should trust them or not.

Yaqi

jimdempseyatthecove · ‎09-03-2010

Yaqi,

How are you declaring your pointers?

Jim Dempsey

Yaqi_Wang · ‎09-03-2010

I usually declare them as
integer, pointer :: ptr(:) => NULL()
(btw, I do not know why we need three states for a pointer. Two should be enough.)

The pointer could be inside a type definition.

I use the following features:
a => b
allocate(b(10))
and
a => b
c => a
nullify(a)
c = zero
assuming that a Fortran pointer behaves like a C pointer.

jimdempseyatthecove · ‎09-03-2010

c = zero is not the same as nullify(c) when zero is

integer, pointer :: zero => NULL()

With c being a pointer to integer and currently pointing to valid integer and with zero declared and used above

c = zero

means use variable zero as a reference to an integer, dereference it (boink), with the intention to copy it into the location referenced by c.
(you do not get to the copy part of the above statement due to the GP fault)

c = IntegerVariableOrLiteral

where prior state of c was nullify(c)

means use variable IntegerVariableOrLiteral as a reference to an integer, dereference it, and copy into the location referenced by c (boink).
(the copy part of the above statement fails due to the GP fault)
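
The difference between pointer assignment (`=>`) and ordinary assignment (`=`) can be seen in a small sketch. It is a simplification of the pattern discussed here: `b` is assumed to be a plain target array rather than a pointer, and the program name is hypothetical.

```fortran
program pointer_assignment_demo
  implicit none
  integer, target  :: b(3)
  integer, pointer :: a(:) => null()
  integer, pointer :: c(:) => null()

  b = [1, 2, 3]

  a => b        ! pointer assignment: a becomes an alias for b
  c => a        ! c is associated with a's target (b), not with a itself
  nullify(a)    ! disassociates a only; c still refers to b

  print *, 'a associated:', associated(a)
  print *, 'c associated:', associated(c)

  c = 0         ! ordinary assignment: writes 0 into b through c
  print *, 'b =', b

  ! By contrast, "a = 0" at this point would be an error, because a is
  ! disassociated and there is no target to write into.
end program pointer_assignment_demo
```

Here `c = 0` is valid only because `c` is still associated with a target; the same statement applied to a nullified pointer is the failing case described above.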

jimdempseyatthecove · ‎09-03-2010

A different way to look at this is the Fortran statement

c = zero

is equivalent to the C/C++ statement

*c = *zero;

and

Fortran:
subroutine foo(i,j,k)
integer :: i,j,k

is equivalent to C/C++

void foo(int* i, int* j, int* k)
.or.
void foo(int& i, int& j, int& k) // in C++

Jim Dempsey