Intel® Integrated Performance Primitives

IPP Threading: Possible to use different num threads for calls in same process?

gbraytx
Beginner
Is it possible to change the number of threads that IPP uses dynamically (on a call by call basis)?

Background:
In the same process, we make many calls to IPP arithmetic functions (add, subtract), as well as one call to ippiConvValid_32f_C1R.

In each iteration, the add/subtract functions are called a LOT on relatively small data sizes (say 8,000 values or fewer), and the convolution routine is called once on a very large data size (say 8,000,000 values).

These operations occur separately and there should be no thread contention between them.

With ippSetNumThreads set to 1, the adds and subtracts run very fast, but the convolution is slow.
With ippSetNumThreads set to the number of cores, the adds and subtracts run very slowly, but the convolution is fast.

Our guess at this point is that when multiple threads are available, IPP splits these small adds and subtracts across threads, which is not effective at their sizes. Additionally, we would prefer to do the threading logic ourselves for these calls. However, we do not want to do the threading logic for the convolution call.

Vladimir_Dudnik
Employee
Hello,

IPP does not prevent you from calling ippSetNumThreads several times with different values, so it should be possible to change the number of threads on a call-by-call basis.

Regards,
Vladimir
Gennady_F_Intel
Moderator
// To make full use of the CPU, try something like the following:
ippSetNumThreads(1);                  // small arrays: keep IPP single-threaded
ippiAdd_8u_C1RSfs(..);

int numCore = ippGetNumCoresOnDie();  // large convolution: one thread per core
ippSetNumThreads(numCore);
ippiConvValid_32f_C1R(..);

// and back again for the next round of small operations
ippSetNumThreads(1);
ippiAdd_8u_C1RSfs(..);
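
The set/call/reset sequence above is easy to get wrong if a reset is forgotten, so it may help to wrap it in one helper. The following is only a minimal sketch of that idea, not IPP code: set_num_threads and get_num_threads are stand-ins that just record a number so the example compiles without IPP, and run_with_threads, work_fn and record_threads are hypothetical names introduced for illustration. In a real build the stand-ins would presumably be replaced by ippSetNumThreads and ippGetNumThreads, with the status checks adapted to IppStatus.

```c
/* Stand-in thread-count setting. This only records a number; it does not
   create threads. In a real build, ippSetNumThreads / ippGetNumThreads
   would take the place of the two functions below (assumption). */
static int g_num_threads = 1;

/* Returns 0 on success, -1 if the requested count is not positive. */
int set_num_threads(int n)
{
    if (n < 1) {
        return -1;
    }
    g_num_threads = n;
    return 0;
}

int get_num_threads(void)
{
    return g_num_threads;
}

/* Hypothetical callback type: one unit of work, e.g. the large convolution. */
typedef void (*work_fn)(void *ctx);

/* Run fn(ctx) with the thread count set to n, then restore the previous
   count so that later small add/subtract calls are unaffected.
   Returns 0 on success, -1 if n was rejected (fn is not called then). */
int run_with_threads(int n, work_fn fn, void *ctx)
{
    int saved = get_num_threads();
    if (set_num_threads(n) != 0) {
        return -1;
    }
    fn(ctx);
    return set_num_threads(saved);
}

/* Hypothetical workload used for illustration: it stores the thread count
   that was in effect while it ran. */
void record_threads(void *ctx)
{
    *(int *)ctx = get_num_threads();
}
```

With this shape, the default stays at 1 for the many small adds and subtracts, and only the convolution call goes through run_with_threads with the core count. Note that the setting is global, so this only makes sense when, as in the question, the two kinds of calls never overlap in time.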