Tim P

Terry · ‎11-04-2017

How would I vectorise (say, for AVX2) a doubly indexed loop containing the following code?

double f (double lhs, double rhs)
{
int index;
std::frexp(rhs, &index);   // only the exponent is needed: rhs = m * 2^index, m in [0.5, 1)
auto twopwr = std::ldexp(double(.5), index);   // 2^(index - 1)
return lhs * twopwr + (rhs - twopwr);
}

So optimise/vectorise the following:

for (std::ptrdiff_t i = 0; i < end(X) - begin(X); ++i)
   for (std::ptrdiff_t j = 0; j < end(Y) - begin(Y); ++j)
      ANS.emplace_back( f(*(begin(X)+i), *(begin(Y)+j)) );

Here ANS, X and Y are appropriately aligned vectors of doubles. One may reorder the loops. The order in ANS is not important here and can be dealt with elsewhere in the code. Actually, f should be a template, and I need code that works for floats, doubles, extended doubles, ...

The values lhs, rhs and f(lhs, rhs) are constrained: they are always exactly representable positive integers stored as doubles. That is, they are strictly positive, index is always less than 53, and the integer part of lhs and of rhs equals lhs and rhs respectively. ANS could be sized up front so that no memory allocation happens inside the loops.

Suggestions appreciated.

TimP · ‎11-04-2017

You might check that your SVML library includes the corresponding frexp and ldexp. If the compiler has difficulty seeing that the loop count is invariant, you could store the count in a local variable. There is no vectorization for extended double. You would likely need an ivdep or simd pragma.
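As a minimal sketch of the points above (not code from the thread): since the loops may be reordered and twopwr depends only on rhs, one option is to make Y the outer loop, so that frexp and ldexp run once per element of Y and the inner loop is a plain multiply-add over X. The name fill_hoisted is hypothetical, ANS is resized up front instead of using emplace_back, the output order becomes Y-major, and the OpenMP simd pragma is only a hint that takes effect when the compiler is told to honour it. Whether a given compiler actually vectorises the inner loop should be checked in its vectorisation report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical name; works for float, double and long double because
// std::frexp and std::ldexp are overloaded for all three.
template <typename T>
void fill_hoisted(const std::vector<T>& X, const std::vector<T>& Y,
                  std::vector<T>& ANS)
{
    // Loop counts held in locals so the compiler sees them as invariant.
    const std::size_t nx = X.size();
    const std::size_t ny = Y.size();
    ANS.resize(nx * ny);            // one allocation, none inside the loops

    const T* x = X.data();
    T* out = ANS.data();

    for (std::size_t j = 0; j < ny; ++j) {
        assert(Y[j] > T(0));        // precondition stated in the question
        int index = 0;
        std::frexp(Y[j], &index);   // Y[j] = m * 2^index, m in [0.5, 1)
        const T twopwr = std::ldexp(T(0.5), index);  // 2^(index - 1)
        const T offset = Y[j] - twopwr;
        T* row = out + j * nx;

        // Inner loop has no library calls left: one multiply and one add.
        #pragma omp simd
        for (std::size_t i = 0; i < nx; ++i)
            row[i] = x[i] * twopwr + offset;
    }
}
```

With this arrangement the cost of frexp and ldexp is paid once per element of Y rather than once per pair, so vectorising those two calls may not matter much.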

Terry · ‎11-04-2017

Tim P

Thanks for this. I have not found a link to any Intel documentation that suggests the compiler will auto-vectorise a loop with ldexp or frexp in it. Nor can I find any intrinsics that will do it, but I am no expert on what is available. boost::simd seems to be the main place for what I need, but it would introduce a big dependency.
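One possible way around the missing intrinsics, sketched here under the constraints given in the question (rhs strictly positive and a normal IEEE-754 double): ldexp(0.5, index) with index taken from frexp(rhs) is just the largest power of two not exceeding rhs, which is rhs with its 52 mantissa bits cleared. That is a single bitwise AND, which has a direct vector counterpart (for example _mm256_and_pd on AVX). The names top_power_of_two and f_bits are hypothetical, and the mask is specific to 64-bit doubles; float or 128-bit types would need a different mask.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the thread): largest power of two <= x.
// Assumes x is a positive, normal IEEE-754 binary64 value.
inline double top_power_of_two(double x)
{
    assert(x > 0.0);                      // compiled out under NDEBUG
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined type pun
    bits &= 0xFFF0000000000000ULL;        // keep sign + exponent, zero mantissa
    double r;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Same arithmetic as f in the question, with frexp/ldexp replaced by the mask.
inline double f_bits(double lhs, double rhs)
{
    const double twopwr = top_power_of_two(rhs);
    return lhs * twopwr + (rhs - twopwr);
}
```

For example, rhs = 5.0 gives twopwr = 4.0, so f_bits(3.0, 5.0) evaluates 3 * 4 + (5 - 4).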

My guess is that the begin/end expressions should be OK so long as the vectors are constructed locally, but I agree they are worth removing. One can then check whether the loop still fails to vectorise once the frexp etc. have been sorted out.

I meant 128-bit doubles, not extended doubles; my lack of clarity. And yes, I would need a simd pragma, and to template the function or embed it so that it is inlined and unrolls. Still, the challenge seems to remain.

vectorization of operations involving frexp ldexp modf etc.