IPP Threading: Possible to use different num threads for calls in same process?

gbraytx · 11-13-2009

Is it possible to change the number of threads that IPP uses dynamically (on a call by call basis)?

Background:
In the same process, we are making lots of calls to IPP arithmetic functions (add, subtract), as well as a call to ippiConvValid_32f_C1R.

In each iteration, the add/subtract functions are called a LOT on relatively small data sizes (say 8,000 values or fewer), and the convolution routine is called once on a very large data size (say 8,000,000 values).

These operations occur separately and there should be no thread contention in between them.

With ippSetNumThreads set to 1, the adds and subtracts run very fast, but the convolution is slow.
With ippSetNumThreads set to the number of cores, the adds and subtracts run very slowly, but the convolution is fast.

Our guess at this point is that when there are existing threads, IPP is threading out these small adds and subtracts, which is not effective for their sizes. Additionally, we would prefer to do the threading logic ourselves for these calls. However, we do not want to do the threading logic for the convolution call.

Vladimir_Dudnik · 11-13-2009

Hello,

IPP does not prevent you from calling ippSetNumThreads several times with different values, so it should be possible to change the number of threads on a call-by-call basis.

Regards,
Vladimir

Gennady_F_Intel · 11-15-2009

// To get the most out of the CPU, try something like the following:
ippSetNumThreads(1);
ippiAdd_8u_C1RSfs(..);        // small arrays: keep IPP single-threaded

int numCore = ippGetNumCoresOnDie();
ippSetNumThreads(numCore);
ippiConvValid_32f_C1R(..);    // large convolution: let IPP use all cores

// and back again
ippSetNumThreads(1);
ippiAdd_8u_C1RSfs(..);
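
If the switch between 1 and numCore ends up scattered over many call sites, the decision could be moved into one small helper that picks the thread count from the data size. The following is only a sketch: pick_num_threads is a hypothetical name, not an IPP function, and the cutoff you pass in is an assumption that would have to be measured on your own hardware. The IPP calls appear only in the closing comment so that the snippet compiles without the IPP headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of IPP): choose how many threads a call
 * should use, based on the number of elements it will process.
 * Calls below the cutoff stay single-threaded; larger ones get all cores. */
int pick_num_threads(size_t n_elems, size_t threshold, int num_cores)
{
    assert(threshold > 0);      /* a zero cutoff would thread every call */
    if (num_cores < 1) {
        num_cores = 1;          /* defensive clamp for a bad core count */
    }
    return (n_elems >= threshold) ? num_cores : 1;
}

/* Intended use with the threaded IPP build discussed in this thread
 * (the cutoff of 100000 is an illustrative assumption, not a tuned value):
 *
 *   int cores = ippGetNumCoresOnDie();
 *   ippSetNumThreads(pick_num_threads(n, 100000, cores));
 *   ... IPP call on n elements ...
 */
```

With the sizes from the original post, an 8,000-element add would get 1 thread and the 8,000,000-element convolution would get numCore threads. This assumes the IPP calls are issued one at a time, as described in the question.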