AVX Optimization Changes Results?

Nick2 · ‎03-06-2012

I recently noticed two things using /QaxAVX and ifort 2011.4.196. One is that my run speed significantly improved on the Core i7-2600 CPU, great job there! The other is that I get digit-for-digit identical results on everything from a Xeon Pentium III to a Xeon X5690, 32-bit or 64-bit architecture, but the i7-2600 breaks the digit-for-digit-ness.

It's the kind of thing where you first see

3568462.66381760
vs
3568462.66381759

and then it eventually grows.

This kind of thing is not too easy to explain to customers running V&V on my software. My question is, is this a feature or a bug? What are any specific major changes in the AVX architecture that would cause this? Thanks!

Steven_L_Intel1 · ‎03-06-2012

Vectorization often creates small differences due to different order of operations. AVX instructions can vectorize more operations than SSE2 can. The way you have built the program, you get AVX instructions only on "Sandy Bridge" and newer CPUs, and SSE2 on everything else. (Pentium III? Seems unlikely to me - SSE2 code won't run there.)

This is a fact of life with computational floating point.
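
To illustrate the order-of-operations point, here is a minimal Python sketch (not Fortran, and not what the compiler actually emits). The helper `lane_sum` is hypothetical: it mimics a vectorized reduction by keeping several independent partial sums and combining them at the end. How a real compiler arranges and combines its vector lanes is an assumption here; the point is only that regrouping the same additions can change the last digit.

```python
def lane_sum(values, lanes):
    """Sum values using `lanes` independent partial sums, then combine them.

    lanes=1 behaves like a plain left-to-right scalar loop; larger values
    imitate a SIMD reduction (hypothetical model, not real compiler output).
    """
    partial = [0.0] * lanes
    for i, x in enumerate(values):
        partial[i % lanes] += x   # element i goes to lane i mod lanes
    total = 0.0
    for p in partial:             # combine the per-lane sums at the end
        total += p
    return total


data = [0.1] * 10
scalar = lane_sum(data, 1)     # one accumulator, strict left-to-right order
two_wide = lane_sum(data, 2)   # two accumulators, same numbers regrouped

print(repr(scalar))    # 0.9999999999999999
print(repr(two_wide))  # 1.0
```

Both results are correctly rounded at every step; they differ only because floating-point addition is not associative. In a long-running computation such last-digit differences can then propagate and grow, as described in the original post.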

mecej4 · ‎03-06-2012

> This kind of thing is not too easy to explain to customers running
> V&V on my software. My question is, is this a feature or a bug?

It is your decision as to whether it is a feature or a bug of your V & V procedure, given that floating point consistency is involved. Unreasonably tight consistency expectations raise the probability of disappointment.
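
As a rough sketch of what a looser consistency check could look like, the Python snippet below compares the two values quoted in the original post with a relative tolerance instead of digit for digit. The tolerance of `1e-12` is purely an assumption for illustration; a real V&V procedure would have to choose one that suits the algorithm and its conditioning.

```python
import math

# The two printed results from the original post.
a = 3568462.66381760
b = 3568462.66381759

# Digit-for-digit comparison fails on the last printed digit.
print(a == b)  # False

# Relative-tolerance comparison; rel_tol=1e-12 is an assumed, illustrative value.
print(math.isclose(a, b, rel_tol=1e-12, abs_tol=0.0))  # True
```

The relative difference between these two numbers is on the order of 1e-15, close to the resolution of double precision itself.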

CPUs are optimized more for speed than for consistency. Compilers may provide options to let you compile with more emphasis on consistency than the defaults give you, but if you use such options you should then be willing to sacrifice some speed.

Consistency by itself is no virtue, either. It is possible for an algorithm to give a consistent but incorrect result.