Hi,
If I want to square an Ipp32f image, I find that calling ippiMul_32f_C1R with the image as both sources is many times (~7x) faster than calling ippiSqr_32f_C1R.
I am evaluating a trial version: ippIP AVX (e9), version 7.1.0 (r36264).
I use:
Intel(R) Xeon(R) CPU E31235 @ 3.20GHz
KMP_AFFINITY=verbose,granularity=core,compact,0,0
1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5)
I also observe that ippiSqr_32f_C1R uses more cores, even with the affinity configuration above.
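For reference, the comparison described above could be set up roughly as follows. This is only a sketch, not the actual test program: the plain C loops stand in for the two IPP calls so that it compiles without the library, and the names ref_sqr_32f_c1r, ref_mul_32f_c1r, square_via_mul and time_square are hypothetical. In a real test the loop bodies would be replaced by ippiSqr_32f_C1R(pSrc, srcStep, pDst, dstStep, roiSize) and ippiMul_32f_C1R(pSrc, srcStep, pSrc, srcStep, pDst, dstStep, roiSize). Wall-clock time is measured rather than CPU time, because one of the two functions appears to run on several cores.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Stand-in for ippiSqr_32f_C1R: dst = src * src over a width x height ROI.
   Steps are row strides in bytes, following the IPP image convention. */
void ref_sqr_32f_c1r(const float *src, int src_step,
                     float *dst, int dst_step, int width, int height)
{
    assert(width > 0 && height > 0);
    assert((size_t)src_step >= (size_t)width * sizeof(float));
    assert((size_t)dst_step >= (size_t)width * sizeof(float));
    for (int y = 0; y < height; ++y) {
        const float *s = (const float *)((const char *)src + (size_t)y * src_step);
        float *d = (float *)((char *)dst + (size_t)y * dst_step);
        for (int x = 0; x < width; ++x)
            d[x] = s[x] * s[x];
    }
}

/* Stand-in for ippiMul_32f_C1R: dst = src1 * src2, element by element. */
void ref_mul_32f_c1r(const float *src1, int src1_step,
                     const float *src2, int src2_step,
                     float *dst, int dst_step, int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; ++y) {
        const float *a = (const float *)((const char *)src1 + (size_t)y * src1_step);
        const float *b = (const float *)((const char *)src2 + (size_t)y * src2_step);
        float *d = (float *)((char *)dst + (size_t)y * dst_step);
        for (int x = 0; x < width; ++x)
            d[x] = a[x] * b[x];
    }
}

/* Squaring expressed as a multiply: the same image is passed as both sources. */
void square_via_mul(const float *src, int src_step,
                    float *dst, int dst_step, int width, int height)
{
    ref_mul_32f_c1r(src, src_step, src, src_step, dst, dst_step, width, height);
}

typedef void (*square_fn)(const float *, int, float *, int, int, int);

/* Returns the elapsed wall-clock seconds for `reps` calls of `fn`. */
double time_square(square_fn fn, const float *src, int src_step,
                   float *dst, int dst_step, int width, int height, int reps)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int r = 0; r < reps; ++r)
        fn(src, src_step, dst, dst_step, width, height);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}
```

Calling time_square once with ref_sqr_32f_c1r and once with square_via_mul on the same buffers gives the two numbers to compare. If the library in use is the threaded variant, it may also be worth calling ippSetNumThreads(1) before timing, so that both functions are measured on a single thread.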
Thanks,
Pablo