Intel® Fortran Compiler

Vectorization and Accuracy

bijilash_babu
Beginner
Hi Guys,
Has any of you noticed the error accumulation due to the -xW
(vectorization) flag? My application is giving errors even at the third decimal place. Also, the -mp flag increases the run time?
Cheers,

Message Edited by bijilash@imsc.res.in on 11-08-2005 11:06 PM

7 Replies
Intel_C_Intel
Employee

Dear Bijilash,

Numerical differences due to vectorization are not uncommon, especially when sum-reductions are vectorized by means of the partial-sums method (which relies on the mathematical associativity of operators that does not necessarily hold in finite-precision floating-point arithmetic). If the differences become too large, this may indicate that, for instance, your input data is not scaled properly (e.g. huge and tiny numbers are added together, which makes the accumulation very sensitive to the actual execution order). If you suspect a compiler bug causes the problem, however, please send me the test case at aart.bik@intel.com.
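For illustration, here is a minimal Fortran sketch (array size, names, and data are illustrative, not from your application) of how a 4-wide partial-sums reduction reorders the additions relative to the sequential loop:

    ! Sequential vs. partial-sums accumulation in single precision.
    ! A 4-wide vectorizer effectively keeps four partial sums and
    ! combines them at the end, changing the order in which
    ! roundoff accumulates.
    program partial_sums
      implicit none
      integer, parameter :: n = 100000   ! divisible by 4
      real :: a(n), s_seq, p(4)
      integer :: i
      call random_number(a)
      s_seq = 0.0
      do i = 1, n                        ! sequential order
         s_seq = s_seq + a(i)
      end do
      p = 0.0
      do i = 1, n, 4                     ! order a vectorizer would use
         p(1) = p(1) + a(i)
         p(2) = p(2) + a(i+1)
         p(3) = p(3) + a(i+2)
         p(4) = p(4) + a(i+3)
      end do
      print *, 'sequential  :', s_seq
      print *, 'partial sums:', sum(p)   ! may differ in the low digits
    end program partial_sums

The two results agree mathematically but can differ in the last digits, because floating-point addition is not associative.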

Aart Bik
http://www.aartbik.com/

Message Edited by abik on 10-20-2005 09:45 AM

bijilash_babu
Beginner
Dear Dr. Bik,



Thanks for your comments. I did discuss this with the compiler engineers at Intel Bangalore and during the IDF session. They all work with C or more advanced languages, and they say the case can't be this bad. My query is about the possibility of getting some improvement with a flag like -mp. Of course I can send a test case; I will do it in a day or two, and from that you might
get an idea of the issue.

Thanks a lot for your comments,

Message Edited by bijilash@imsc.res.in on 10-20-2005 10:04 AM

Intel_C_Intel
Employee

Dear Bijilash,

Resorting to -mp is somewhat of a sledgehammer approach to avoiding the numerical differences. If vectorization of only one accumulation loop (or a few) causes the numerical differences, perhaps simply placing #pragma novector (in C) or !DIR$ NOVECTOR (in Fortran) before the culprit loop(s) will help you to get acceptable performance and numerical accuracy?
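For example (a sketch; the loop body and names are illustrative), applied to a single sensitive reduction:

    ! Keep this one reduction scalar; the rest of the file
    ! remains eligible for vectorization under -xW.
    !DIR$ NOVECTOR
    do i = 1, n
       s = s + a(i)
    end do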

Aart

bijilash_babu
Beginner
Hi Aart,



Thanks a lot, I will try putting in !DIR$ NOVECTOR.

In fact, to my surprise, the run time increases with -mp. The vectorization gives a 15% improvement, and anything more than 5% is valuable for us; I will send some test code to you soon.

Message Edited by bijilash@imsc.res.in on 10-22-2005 11:36 PM

Intel_C_Intel
Employee

Dear Bijilash,

Thanks for the test case, which was helpful. In this case, the numerical differences are simply caused by going from -O2 to -xW, which changes the code generated by the compiler. In the former case, floating-point operations are performed on the x87 FPU (with 80-bit internal precision), whereas in the latter case, floating-point operations (both scalar and vector) are performed using SSE (with 32-bit precision for single precision and 64-bit precision for double precision). In fact, the results using pure scalar SSE (-xW with vectorization disabled) and the mix of scalar and vector SSE (-xW as is) are identical.
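To see the difference yourself (command lines shown for illustration; prog.f90 is a placeholder file name):

    ifort -O2 prog.f90   # x87 FPU code, 80-bit internal precision
    ifort -xW prog.f90   # SSE code, 32-bit/64-bit precision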

However, for all switch settings, I observed differences only in the sixth decimal position (which seems reasonable), not in the third position as you reported. Can you give some specifics on the compiler version you are using?

Aart Bik
http://www.aartbik.com/

bijilash_babu
Beginner
Hi Aart,



Thanks for your comments!

We use ifort 8.1. The accuracy goes down as the number of iterations (in this case, the third parameter in the input) goes up, and the error reaches the third decimal position.

Could you suggest some way to keep the benefit of vectorization? Making everything double precision would create memory issues; the test case I sent you has a tiny lattice, which does not require much memory.



Thanks

Message Edited by bijilash@imsc.res.in on 10-25-2005 10:43 AM

TimP
Honored Contributor III
It appears that your application requires extended precision for the sum reduction. You can accomplish this simply by declaring double precision for the sum variable, and possibly for intermediate operations, without having to promote any arrays. It is better to do so explicitly in the source, rather than relying on compiler options which switch to pre-1989 C mode.
I would expect you to have the same problems in C, if you asked a compiler to use pure float data operations, vectorized or not. The vectorized code usually produces somewhat better accuracy than scalar accumulation without extra precision, but not nearly as much accuracy as careful promotion to extra precision.
As you have not shown a source code example, I am speculating, but I expect mixed precision to inhibit SSE vectorization. If so, it may be interesting to try the !DIR$ VECTOR ALWAYS directive, to see whether it will persuade the vectorizer, and whether that improves performance over the -mp scalar version.
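For instance, a sketch (names are illustrative) of promoting only the accumulator:

    real :: a(n)            ! data arrays stay single precision
    double precision :: s   ! only the accumulator is promoted
    s = 0.0d0
    do i = 1, n
       s = s + dble(a(i))   ! widen each term before the add
    end do
    ! If the mixed precision inhibits vectorization, try placing
    ! !DIR$ VECTOR ALWAYS before the loop to see whether the
    ! vectorizer accepts it anyway.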