Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

Vectorization of operations involving frexp, ldexp, modf, etc.

Terry
Beginner

How would I vectorise (say for AVX2) a doubly indexed loop that calls the following function?

#include <cmath>

double f(double lhs, double rhs)
{
    int index;
    std::frexp(rhs, &index);                // rhs = m * 2^index, with m in [0.5, 1)
    auto twopwr = std::ldexp(0.5, index);   // 2^(index - 1)
    return lhs * twopwr + (rhs - twopwr);
}

So optimise/vectorise the following:

for (std::ptrdiff_t i = 0; i < end(X) - begin(X); ++i)
   for (std::ptrdiff_t j = 0; j < end(Y) - begin(Y); ++j)
      ANS.emplace_back( f(*(begin(X)+i), *(begin(Y)+j)) );

Here ANS, X and Y are appropriately aligned vectors of doubles. The loops may be reordered; the order of the results in ANS is not important and can be dealt with elsewhere in the code. In practice f should be a template, and I need code that works for floats, doubles, extended doubles, ...

The values lhs, rhs and f(lhs, rhs) are constrained: each is always a strictly positive integer that is exactly representable as a double. That is, index is always less than 53, and the integer part of lhs (or rhs) always equals lhs (or rhs) itself. ANS could also be sized in advance so that no memory allocation occurs during the loops.

Suggestions appreciated.

TimP
Honored Contributor III
You might check that your SVML library includes the corresponding frexp and ldexp. If the compiler has difficulty seeing that the loop count is invariant, you could store the count in a local variable. There is no vectorization for extended double. You would likely need an ivdep or simd pragma.
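As a rough illustration of the suggestions above, the sketch below stores both loop counts in locals, sizes the output once, and puts a simd pragma on the inner loop. It also uses the loop reordering the question allows: twopwr depends only on rhs, so frexp and ldexp are called once per outer iteration and the inner loop is a plain multiply-add. The function name `combine` is hypothetical, `#pragma omp simd` is one possible choice of simd pragma (it is ignored unless OpenMP SIMD support is enabled in the compiler), and whether a given compiler actually vectorises the loop would need to be checked in its optimisation report.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical name; a sketch, not a tuned implementation.
std::vector<double> combine(const std::vector<double>& X,
                            const std::vector<double>& Y)
{
    // Loop counts held in locals so the compiler can see they are invariant.
    const std::ptrdiff_t nx = static_cast<std::ptrdiff_t>(X.size());
    const std::ptrdiff_t ny = static_cast<std::ptrdiff_t>(Y.size());

    // Sized once up front: no allocation inside the loops.
    std::vector<double> ANS(X.size() * Y.size());

    const double* x = X.data();
    const double* y = Y.data();
    double* out = ANS.data();

    // Y is the outer loop, so frexp/ldexp run once per rhs value.
    for (std::ptrdiff_t j = 0; j < ny; ++j) {
        int index;
        std::frexp(y[j], &index);
        const double twopwr = std::ldexp(0.5, index);
        const double offset = y[j] - twopwr;
        double* row = out + j * nx;

        // Inner loop contains no library calls, only a multiply and an add.
        #pragma omp simd
        for (std::ptrdiff_t i = 0; i < nx; ++i)
            row[i] = x[i] * twopwr + offset;
    }
    return ANS;
}
```

With this layout the results for a given rhs are stored contiguously, which differs from the emplace_back order in the question; the question states that the order in ANS does not matter.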
0 Kudos
Terry
Beginner

Tim P

Thanks for this. I have not found any Intel documentation suggesting that the compiler will auto-vectorise a loop containing ldexp or frexp, nor can I find any intrinsics that will do it, though I am no expert on what is available. boost::simd seems to be the main place for what I need, but it would introduce a big dependency.

My guess is that the begin/end expressions should be OK so long as the vectors are constructed locally, but I agree that it is something to remove; one can then check whether the loop still fails to vectorise once the frexp etc. has been sorted out.

I meant 128-bit doubles, not extended doubles; my lack of clarity. And yes, I would need a simd pragma, and to template the function or embed it so that it is inlined and unrolls. Still, the challenge seems to remain.
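One possible way around the frexp/ldexp obstacle, given the constraints in the original question (rhs strictly positive and finite): frexp returns index such that rhs = m * 2^index with m in [0.5, 1), so 0.5 * 2^index is simply the largest power of two not exceeding rhs. For a positive normal IEEE 754 double, that value can be obtained by clearing the 52 mantissa bits, which is a plain bitwise AND with no library call. The sketch below assumes IEEE 754 binary64 doubles; the names `f_ref`, `top_power_of_two` and `f_bits` are hypothetical and not from the thread.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Reference version using frexp/ldexp, as in the original question.
inline double f_ref(double lhs, double rhs)
{
    int index;
    std::frexp(rhs, &index);
    const double twopwr = std::ldexp(0.5, index);
    return lhs * twopwr + (rhs - twopwr);
}

// Hypothetical helper: largest power of two <= x.
// Assumes x is a positive, finite, normal IEEE 754 binary64 value.
inline double top_power_of_two(double x)
{
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined type punning
    bits &= 0xFFF0000000000000ULL;         // keep sign and exponent, clear mantissa
    double r;
    std::memcpy(&r, &bits, sizeof r);
    return r;
}

// Same arithmetic as f_ref, with the frexp/ldexp pair replaced by the mask.
inline double f_bits(double lhs, double rhs)
{
    const double twopwr = top_power_of_two(rhs);
    return lhs * twopwr + (rhs - twopwr);
}
```

For example, with rhs = 5 both versions give twopwr = 4, so f(3, 5) = 3 * 4 + (5 - 4) = 13. A float version would need a 32-bit integer and the mask 0xFF800000; the mask approach does not handle zero, subnormals, infinities or NaN, which the stated constraints exclude.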
