I managed to move to ippiConvFull_32f_C1, converting the data from 16u to 32f.
Indeed, ippiConvFull is faster than ippiFilter32f for larger kernels. It even seems to run at a constant speed regardless of the size of the filtering kernel (I varied it from 7x7 to 31x31).
I do have problems getting the border and anchor right when using ippiConvFull.
Is there an example that describes how to move from ippiFilter to ippiConvFull?
ippiFilter does require borders and it also uses an anchor, but ippiConvFull does not.
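To make the size and anchor question concrete, here is a minimal sketch in plain C (no IPP calls; `conv_full_ref` and `crop_same` are hypothetical helper names). It assumes that a full convolution of a W x H image with a kw x kh kernel yields a (W+kw-1) x (H+kh-1) result, computed as if the source were surrounded by zeros, and that a filter-style result of the original size is the window of that result starting at an offset (ax, ay). How that offset maps to the anchor convention of ippiFilter is an assumption to check against the manual.

```c
#include <assert.h>

/* Reference full 2D convolution in plain C (not an IPP call).
   src is sw x sh, ker is kw x kh, dst must hold (sw+kw-1) x (sh+kh-1)
   floats. All buffers are row-major with no padding between rows.
   Source pixels outside the image are treated as zero. */
void conv_full_ref(const float *src, int sw, int sh,
                   const float *ker, int kw, int kh,
                   float *dst)
{
    assert(sw > 0 && sh > 0 && kw > 0 && kh > 0);
    const int dw = sw + kw - 1;
    const int dh = sh + kh - 1;
    for (int y = 0; y < dh; ++y) {
        for (int x = 0; x < dw; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j) {
                for (int i = 0; i < kw; ++i) {
                    const int sy = y - j;
                    const int sx = x - i;
                    if (sy >= 0 && sy < sh && sx >= 0 && sx < sw)
                        acc += ker[j * kw + i] * src[sy * sw + sx];
                }
            }
            dst[y * dw + x] = acc;
        }
    }
}

/* Copy the sw x sh window that starts at (ax, ay) out of the full result.
   fw is the width of the full result, i.e. sw + kw - 1.
   With this offset, kernel cell (ax, ay) lines up with the source pixel
   that has the same coordinates as the output pixel. */
void crop_same(const float *full, int fw, int ax, int ay,
               float *dst, int sw, int sh)
{
    for (int y = 0; y < sh; ++y)
        for (int x = 0; x < sw; ++x)
            dst[y * sw + x] = full[(y + ay) * fw + (x + ax)];
}
```

With a 3x3 kernel of ones on a 3x3 image of ones and offset (1, 1), the cropped centre pixel is 9 but the corner is only 4, because the zeros outside the image are summed in. That is the border effect I am asking about.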
A good example of using the convolution functions is in the IPP manual.
I cannot say that that example is good, because it does not describe handling borders, and that was the problem I was asking about.
The ippiFilterXXX functions all describe borders properly, but the ippiConvXXX functions do not. You still need to consider borders when using ippiConvXXX, both technically (to prevent an AV) and visually (because the resulting image should look good at the borders).
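One approach I can think of, sketched below in plain C with no IPP calls, is to replicate the edge pixels into a padded copy first and then keep only the part of the convolution where the kernel fully overlaps the padded image. `replicate_pad` and `conv_valid_ref` are hypothetical names, and replication is just one possible border policy. The padded copy is (W+kw-1) x (H+kh-1); if it were passed through a full convolution instead, the same W x H result should sit at offset (kw-1, kh-1) in an output of (W+2*kw-2) x (H+2*kh-2), which is the size the destination buffer would need to avoid the AV.

```c
#include <assert.h>

static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Build a replicated-border copy of src (sw x sh, row-major).
   padded must hold (sw+kw-1) x (sh+kh-1) floats.
   (ax, ay) is the kernel cell that should line up with the output pixel;
   it decides how much padding goes on the left/top versus right/bottom. */
void replicate_pad(const float *src, int sw, int sh,
                   int kw, int kh, int ax, int ay,
                   float *padded)
{
    assert(ax >= 0 && ax < kw && ay >= 0 && ay < kh);
    const int pw = sw + kw - 1;
    const int ph = sh + kh - 1;
    const int left = kw - 1 - ax;
    const int top = kh - 1 - ay;
    for (int py = 0; py < ph; ++py) {
        for (int px = 0; px < pw; ++px) {
            const int sy = clampi(py - top, 0, sh - 1);
            const int sx = clampi(px - left, 0, sw - 1);
            padded[py * pw + px] = src[sy * sw + sx];
        }
    }
}

/* Convolve the padded image only where the kernel fully overlaps it.
   pw is the padded width (sw+kw-1); dst is sw x sh.
   This equals the window at offset (kw-1, kh-1) of a full convolution
   of the padded image, so no zeros leak into the border pixels. */
void conv_valid_ref(const float *padded, int pw,
                    const float *ker, int kw, int kh,
                    float *dst, int sw, int sh)
{
    for (int y = 0; y < sh; ++y) {
        for (int x = 0; x < sw; ++x) {
            float acc = 0.0f;
            for (int j = 0; j < kh; ++j)
                for (int i = 0; i < kw; ++i)
                    acc += ker[j * kw + i]
                         * padded[(y + kh - 1 - j) * pw + (x + kw - 1 - i)];
            dst[y * sw + x] = acc;
        }
    }
}
```

With this padding a constant image stays constant right up to the edge, so the border no longer darkens. The cost is one extra copy of the image.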
I'm very happy with the performance of the function, but I could use some help/example with handling borders for the above reasons.