Currently the available options for floats/doubles are:
- ippsNorm_L2_64f, whose result then has to be squared: this loses precision and is slower than computing L2Sqr directly
- ippsDotProd_64f with the same vector passed as both pSrc1 and pSrc2: this works, but I suspect a more efficient implementation is possible given that the vector is being multiplied with itself.
It would be great to instead have a precise and as-efficient-as-possible ippsNorm_L2Sqr_64f.
Intel IPP provides standard DSP operations; what you requested is a non-standard operation. Could you provide the performance and accuracy impact of your suggested operation? Thanks!