ippiSAD4x4_ is a very simple, low-level primitive, so threading inside the function is not efficient. A better choice is to thread at a higher level, not inside the function itself.
As shown in this article, the threading is applied to the high-level for loop:
#pragma omp parallel for
for (int i=0; i
Likewise, the IPP video sample code is threaded at the high level of the sample, not in the low-level functions.
Threading SADs at the primitive level doesn't make sense: a SAD primitive runs for ~30-150 CPU clocks (depending on flavor and architecture), while the OpenMP threading overhead is ~2000-3000 clocks. That means an encoder can be threaded efficiently ONLY at the application level. The same is true for all other low-complexity, computationally light, "short" functions. A list of threaded IPP functions is available in the package; it contains ~2400 APIs (out of ~12000), so don't expect every function you use to be threaded.
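To illustrate the point about application-level threading, here is a minimal sketch in C. It assumes a frame whose width and height are multiples of 4 and uses a plain C function, sad4x4, as a hypothetical stand-in for the IPP primitive (it is not the IPP signature, and frame_sad4x4 is an illustrative name as well). The OpenMP region is opened once per frame, so the ~2000-3000 clock overhead is spread over all block rows instead of being paid on every 30-150 clock SAD call.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a 4x4 SAD primitive such as ippiSAD4x4_*.
   This is NOT the IPP signature; swap in the real call in your code. */
static int sad4x4(const uint8_t *src, int srcStep,
                  const uint8_t *ref, int refStep)
{
    int sad = 0;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            sad += abs((int)src[y * srcStep + x] - (int)ref[y * refStep + x]);
    return sad;
}

/* Computes one SAD per co-located 4x4 block of two frames.
   width and height are assumed to be multiples of 4; step is the row
   pitch in bytes; sads receives (width/4)*(height/4) values in raster order. */
void frame_sad4x4(const uint8_t *cur, const uint8_t *ref,
                  int width, int height, int step, int *sads)
{
    int blocksX = width / 4;
    int blocksY = height / 4;

    /* One parallel loop over block rows: each thread gets many SAD calls,
       and each iteration writes a disjoint part of sads, so no locking. */
    #pragma omp parallel for
    for (int by = 0; by < blocksY; ++by) {
        for (int bx = 0; bx < blocksX; ++bx) {
            const uint8_t *c = cur + (by * 4) * step + bx * 4;
            const uint8_t *r = ref + (by * 4) * step + bx * 4;
            sads[by * blocksX + bx] = sad4x4(c, step, r, step);
        }
    }
}
```

Build with your compiler's OpenMP switch (for example -fopenmp with GCC); without it the pragma is ignored and the loop simply runs serially. In a real encoder the parallel loop would more likely sit even higher, for example over slices or macroblock rows of the whole motion search, but the idea is the same.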
OK. I got it. Thank you.
I thought IPP primitive threading was automatic, so that primitives called inside those higher-level loops would run in parallel internally. Now I know the threading must be done manually at the higher-level loop, and that the best and easiest way is to use OMP for the parallelism. But I'm still confused that there is no threaded video application in the samples.
Regarding the above comments,
Some of the video codec samples are threaded; you can refer to ipp-samples\audio-video-codecs\doc\UMC reference Manual, where every section has an explanation of Threading Capabilities. For example, the DV decoder can create a number of threads.