How would I vectorise (say for AVX2) a doubly indexed loop containing the following code:
double f(double lhs, double rhs) {
    int index;
    std::frexp(rhs, &index);
    auto twopwr = std::ldexp(double(.5), index);
    return lhs * twopwr + (rhs - twopwr);
}
So optimise/vectorise the following:
for (std::ptrdiff_t i = 0; i < end(X) - begin(X); ++i)
    for (std::ptrdiff_t j = 0; j < end(Y) - begin(Y); ++j)
        ANS.emplace_back(f(*(begin(X) + i), *(begin(Y) + j)));
where ANS, X and Y are appropriately aligned vectors of doubles. The loops may be reordered; the order of the results in ANS is not important here and can be dealt with elsewhere in the code. In practice f should be a template, and I need code that works for floats, doubles, extended doubles, ...
The values lhs, rhs and f(lhs, rhs) are constrained: they will always be exactly representable positive integer-valued doubles, in the sense that they are strictly positive, index is always less than 53, and the integer part of lhs (or rhs) always equals lhs (or rhs) itself. ANS could be sized in advance so that no memory allocation happens inside the loops.
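To make the reordering and pre-sizing ideas concrete, here is a minimal sketch, not a tested AVX2 solution. It assumes the Y loop can be made the outer one, so that frexp/ldexp run once per rhs and the inner loop is plain multiply-add arithmetic that a compiler has a much better chance of auto-vectorising. The name combine_all is hypothetical, and the output order (all X values for Y[0], then for Y[1], ...) is an arbitrary choice.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// The function from the question, templated on the floating-point type.
template <typename T>
T f(T lhs, T rhs) {
    int index;
    std::frexp(rhs, &index);
    const T twopwr = std::ldexp(T(0.5), index);
    return lhs * twopwr + (rhs - twopwr);
}

// Hypothetical helper: Y is the outer loop, so frexp/ldexp are hoisted
// out of the inner loop and evaluated once per element of Y.
template <typename T>
std::vector<T> combine_all(const std::vector<T>& X, const std::vector<T>& Y) {
    std::vector<T> ans(X.size() * Y.size());  // sized once, no reallocation
    const T* x = X.data();
    const std::size_t nx = X.size();
    for (std::size_t j = 0; j < Y.size(); ++j) {
        const T rhs = Y[j];
        int index;
        std::frexp(rhs, &index);
        const T twopwr = std::ldexp(T(0.5), index);
        const T offset = rhs - twopwr;
        T* out = ans.data() + j * nx;
        // Inner loop contains no library calls, only a multiply and an add.
        for (std::size_t i = 0; i < nx; ++i)
            out[i] = x[i] * twopwr + offset;
    }
    return ans;
}
```

Whether the inner loop actually vectorises still has to be checked in the compiler's vectorisation report; the sketch only removes the frexp/ldexp calls from it.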
Suggestions appreciated.
- Tags:
- CC++
- Development Tools
- Intel® C++ Compiler
- Intel® Parallel Studio XE
- Intel® System Studio
- Optimization
- Parallel Computing
- Vectorization
Tim P,
Thanks for this. I have not found any Intel documentation suggesting that the compiler will auto-vectorise a loop with ldexp or frexp in it. Nor can I find any intrinsics that will do it, but I am no expert on what is available. boost::simd seems to be the main place for what I need, but it would introduce a big dependency.
My guess is that the begin/end calls should be fine as long as the vectors are constructed locally, but I agree they are worth removing; one can then check whether the loop still fails to vectorise once the frexp/ldexp issue has been sorted out.
I meant 128-bit doubles, not extended doubles; apologies for the lack of clarity. And yes, I would need a simd pragma, and to template the function or embed it so that it is inlined and unrolls. Still, the challenge seems to remain.
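One possible way around frexp/ldexp, offered as a sketch rather than a confirmed solution: for a strictly positive, normal IEEE-754 double, std::ldexp(0.5, index) with index taken from std::frexp(rhs, &index) is the largest power of two not exceeding rhs, which is rhs with its mantissa bits cleared. The sketch below assumes the binary64 layout and is therefore for double only; the names pow2_floor and f_bits are hypothetical, and a templated version would need a mask per type.

```cpp
#include <cstdint>
#include <cstring>

// Largest power of two <= x, assuming x is a strictly positive, normal,
// finite IEEE-754 binary64 value (the constraint stated in the question).
inline double pow2_floor(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // well-defined type pun
    bits &= 0x7FF0000000000000ULL;         // keep exponent, clear the mantissa
    double p;
    std::memcpy(&p, &bits, sizeof p);
    return p;
}

// Same arithmetic as f in the question, without any library calls.
inline double f_bits(double lhs, double rhs) {
    const double p = pow2_floor(rhs);
    return lhs * p + (rhs - p);
}
```

Because the body is now just a bitwise AND, a multiply, a subtract and an add, it has a direct packed equivalent: with AVX the masking step corresponds to a bitwise AND on packed doubles (for instance _mm256_and_pd against a broadcast mask). Zero, denormal, negative, infinite and NaN inputs are not handled and would need separate treatment.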