An example of something I often have to do - I have a vector with (say) 1024 floating point values and I want the maximum of every 8 values, i.e.
output[0] = max (input, 0 to 7)
output[1] = max (input, 8 to 15)
output[2] = max (input, 16 to 23)
So what's the correct / most efficient way to do this? I don't think there's a function...
There may be a better answer, but based on this description I think you're correct that this is not covered by a single function. It may be possible to work in a loop across this vector where you compute the maximum for every 8 values separately. You may want to check whether it is faster to use an IPP function on these very short vectors or write your own max function.
Depending on future use of this data you may be able to combine steps after computing the max inside this loop for greater cache efficiency. That is an advantage you will get with looping over shorter vectors that would not be available by calling a single IPP function to go over the entire array.
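To make the loop idea concrete, here is a minimal sketch in plain C, without IPP. The names `block_max` and `BLOCK` are made up for illustration, and handling a trailing partial group is my assumption, not something from the original question. The inner loop is where you could swap in an IPP call such as `ippsMax_32f` on each 8-element slice and time both versions.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 8  /* group size from the question (hypothetical name) */

/* Hypothetical helper: writes the maximum of each consecutive group of
   BLOCK floats from src into dst and returns the number of values written.
   dst must have room for (n + BLOCK - 1) / BLOCK floats.
   Assumption: a trailing partial group is reduced over the elements it has. */
size_t block_max(const float *src, size_t n, float *dst)
{
    assert(src != NULL && dst != NULL);

    size_t out = 0;
    for (size_t i = 0; i < n; i += BLOCK) {
        /* End of the current group, clamped to the end of the input. */
        size_t end = (i + BLOCK < n) ? i + BLOCK : n;

        float m = src[i];
        for (size_t j = i + 1; j < end; ++j) {
            if (src[j] > m) {
                m = src[j];
            }
        }

        dst[out++] = m;
        /* Any follow-up work on m could go here, while the
           8 input values are still in cache. */
    }
    return out;
}
```

For the 1024-value case you would call it as `block_max(input, 1024, output)` with `output` sized for 128 floats.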
Just to add one more point: if you have the Intel Composer product, you can try array notation, which lets you operate on whole arrays with a simple syntax.
For example, C[:]=A[:]+B[:].
for (i=0; i<length; i+=8)