I used the -fma option with the compiler, but it did not make any significant difference. My CPU is an Intel Core 2 Duo, which is not that new, so I am not sure whether FMA is supported on it. However, it also did not slow the program down much; at least the timings are still fine. My complete compiler options are:
icpc -Wall -g -DNDEBUG -O3 -xhost -fp-model extended -prec-div -prec-sqrt -fp-speculation safe -fma -no-fp-port -mp1 -pc80
Also, I am not sure I understood you completely: the algorithm you gave multiplies two numbers, right? So are you proposing to use it to multiply each element of v by alpha?
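If the algorithm in question is an error-free product transformation (often called TwoProduct), then applying it to v means computing, for each element, both the rounded product and the exact rounding error of that product. The sketch below assumes that. The names two_product and scale_with_error and the hi/lo layout are hypothetical, not part of MKL. It also assumes std::fma rounds only once, which holds for a hardware FMA and for a correctly rounded software fma.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Error-free product (TwoProduct): hi = fl(a*b) and lo = a*b - hi exactly,
// so hi + lo equals the exact product (barring underflow/overflow).
struct Split {
    double hi;
    double lo;
};

inline Split two_product(double a, double b) {
    double hi = a * b;
    double lo = std::fma(a, b, -hi);  // exact rounding error of a*b
    return {hi, lo};
}

// Hypothetical helper: scale every element of v by alpha, keeping the
// rounding error of each product in a separate correction vector lo.
void scale_with_error(double alpha, const std::vector<double>& v,
                      std::vector<double>& hi, std::vector<double>& lo) {
    hi.resize(v.size());
    lo.resize(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        Split p = two_product(alpha, v[i]);
        hi[i] = p.hi;
        lo[i] = p.lo;
    }
}
```

Keeping lo only pays off if later steps, such as the vector subtraction, actually use it; if it is dropped, the result is the same as plain scaling.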
Thanks for your interest. I have been stuck on this for a long time, but at least I now know that the source of the problem is vector subtraction.
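One thing worth keeping in mind: subtracting two nearly equal doubles is itself exact (Sterbenz lemma), so cancellation mostly exposes error already present in the operands. When the operands are not that close, the rounding error of each subtraction can be captured with Knuth's TwoSum applied to a - b. This is a minimal sketch of that standard technique, not necessarily what was suggested earlier in the thread; two_diff and sub_with_error are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Knuth's TwoSum applied to a - b: s = fl(a - b) and e is the exact rounding
// error, so a - b == s + e holds exactly (barring overflow). Value-unsafe
// optimizations (-ffast-math, or -fp-model fast on icpc) may reduce e to zero.
inline void two_diff(double a, double b, double& s, double& e) {
    s = a - b;
    double bv = s - a;                // the part of -b that made it into s
    e = (a - (s - bv)) + (-b - bv);   // what was lost in the rounding
}

// Hypothetical helper: elementwise z = x - y, with rounding errors in err.
void sub_with_error(const std::vector<double>& x, const std::vector<double>& y,
                    std::vector<double>& z, std::vector<double>& err) {
    assert(x.size() == y.size());
    z.resize(x.size());
    err.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        two_diff(x[i], y[i], z[i], err[i]);
}
```

For example, two_diff(1.0, 1e-17, s, e) gives s == 1.0 and e == -1e-17: the small term is lost from s but kept in e.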
Some extra information on the accuracy loss: I looked through my algorithm and found some other points, in addition to the above, where accuracy could be lost:
a.) I am scaling vectors by reciprocals, using dscal in MKL with '1.e0/scalar'. Can this be a problem? I am guessing that you would scale the vector with the above multiplication instead. For instance, I have a C++ template, mkl_scale( vec, 1.e0/scalar ), which calls dscal in MKL.
b.) I am computing some dot products and taking the square roots of their results as the 'scalar' in (a).
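To see what (a) and (b) cost in accuracy, here is a small sketch comparing scaling by a precomputed reciprocal with dividing each element directly, with the scalar taken as sqrt of a dot product. Plain loops stand in for the MKL calls, and the function names are hypothetical; this illustrates the rounding behaviour, not the MKL code path.

```cpp
#include <cmath>
#include <vector>

// Euclidean norm as sqrt(dot(v, v)), as in point (b).
double norm_via_dot(const std::vector<double>& v) {
    double dot = 0.0;
    for (double x : v) dot += x * x;
    return std::sqrt(dot);
}

// Scaling by a precomputed reciprocal: two roundings per element (one for
// 1.0/s, one for the product), as when passing 1.e0/scalar to dscal.
void scale_by_reciprocal(std::vector<double>& v, double s) {
    const double r = 1.0 / s;
    for (double& x : v) x *= r;
}

// Dividing each element directly: one correctly rounded operation per
// element, at the cost of a slower division.
void scale_by_division(std::vector<double>& v, double s) {
    for (double& x : v) x /= s;
}
```

With s = 49, dividing 49.0 gives exactly 1.0, while multiplying by the reciprocal gives 0.9999999999999999. The relative error per element grows from at most u to roughly 2u (u = 2^-53), which is usually small next to what cancellation in a subtraction can do.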
Any ideas are appreciated. Thanks in advance.
Description: The fma functions return (x*y)+z, computed as if to infinite precision and rounded only once.
double fma(double x, double y, double z);
long double fmal(long double x, long double y, long double z);
float fmaf(float x, float y, float z);

I should say that using this software fma() equivalent will certainly have a negative performance impact, since your system doesn't have hardware support for FMA.
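The single rounding of fma is what makes error-free products cheap, and it also helps with the dot products in (b). A minimal sketch, assuming a compensated dot product is acceptable here: the Dot2 algorithm of Ogita, Rump and Oishi, which gives roughly the accuracy of a dot product evaluated in twice the working precision. The name dot2 is hypothetical, and on a CPU without hardware FMA each std::fma call goes through the slow software path.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Compensated dot product (Dot2): accumulates the exact error of every
// product (via fma) and of every addition (via TwoSum), then adds the
// accumulated error back at the end.
double dot2(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size());
    if (x.empty()) return 0.0;
    double p = x[0] * y[0];
    double s = std::fma(x[0], y[0], -p);      // error of the first product
    for (std::size_t i = 1; i < x.size(); ++i) {
        double h = x[i] * y[i];
        double r = std::fma(x[i], y[i], -h);  // error of this product
        double t = p + h;                     // TwoSum(p, h) begins
        double z = t - p;
        double q = (p - (t - z)) + (h - z);   // error of this addition
        p = t;
        s += q + r;                           // collect both error terms
    }
    return p + s;
}
```

For x = {1e16, 1.0, -1e16} and y = {1.0, 1.0, 1.0}, a plain loop returns 0.0, while dot2 returns the exact 1.0. The standard macro FP_FAST_FMA, when defined, indicates that fma is about as fast as a multiply plus an add; on a Core 2 Duo one would not expect it to be defined.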