Intel® Moderncode for Parallel Architectures
Support for developing parallel programming applications on Intel® Architecture.

Questions on using TBB and IPP together

caosun
New Contributor I
701 Views
I am trying to use TBB and IPP together to improve performance.
I use TBB to do the filtering with the IPP function "ippsFIR_32fc"; each thread works on a portion of the data. But the results are quite strange: I can see a lot of glitches (very large values) in the output data.


The code is as follows:

parallel_for(tbb::blocked_range<Int>(0, inV.Length, inV.Length/1.5), tbb_parallel_fir_task((Ipp32fc *)inV.Data, filterCoefCP, filterVP->Length, (Ipp32fc *)outVP->Data, m_stateP));

void operator() (const blocked_range<Int>& r) const
{
    Int begin = r.begin();
    Int end = r.end();
    Int nIters = end - begin;
    ippsFIR_32fc(m_inP + begin, m_outP + begin, nIters, m_stateP);
}

If I replace the IPP function "ippsFIR_32fc" with "ippsCopy_32f", the multithreaded copy works fine.
Could you please help me?

TimP
Honored Contributor III
IPP still uses OpenMP for parallelism, invoking strange behaviors when inside parallel_for, as there is no coordination between the respective run-time libraries. If you can link a non-threaded IPP library, or specify that IPP should use just 1 thread, you may have better luck. If that works, you could experiment with cases where the product of the number of TBB threads and number of OpenMP threads fits your platform, while ensuring that the various TBB threads don't cause IPP to write more than once in each output data region. If you want expert answers, you might try the TBB forum.
caosun
New Contributor I
Thank you for your reply, TimP.
I have already specified that IPP should use only 1 thread. And according to Intel's documentation, it should be thread safe to use Intel TBB with IPP. I could understand some performance degradation, but it should not produce wrong results.
Could you please explain your comment further: 'various TBB threads don't cause IPP to write more than once in each output data region'? How can I make sure that the various TBB threads write each region only once?
Another question: among the floating-point functions, I did not see this type of FIR: complex input data with real filter coefficients. I only see complex input data with complex filter coefficients, or real input data with real filter coefficients.
SergeyKostrov
Valued Contributor II
Quoting TimP (Intel)
IPP still uses OpenMP for parallelism, invoking strange behaviors when inside parallel_for, as there is no coordination between the respective run-time libraries. If you can link a non-threaded IPP library, or specify that IPP should use just 1 thread, you may have better luck...

There is a post in one of the recent threads on the IPP forum saying that the IPP team is considering / planning a release of a non-threaded version of the IPP library. All threading will be the responsibility of the users / developers of the IPP library.
TimP
Honored Contributor III
If you're asking questions about how to choose between MKL and IPP, one of those forums would get you more expert advice. MKL has a wide variety of float and double functions, while IPP supports additional data types.
It may be entirely reasonable to use MKL sequential (non-threaded) functions in a tbb parallel region. The usual reason would be that you get better parallelism by running simultaneous independent analyses than by threading one analysis at a time. I don't know about the IPP equivalent.
You can't get consistent results with threading if multiple threads are trying to write the same array sections. That's called a race condition, where you don't know which thread will have the final say, and the thread which gets there last will be delayed waiting for earlier threads to complete their access.
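
To make the race-condition point concrete, here is a minimal sketch in plain C++ (standard library only, no IPP or TBB) of a FIR filter split across threads so that each thread writes only its own slice of the output. The names `fir_range` and `fir_chunked` are hypothetical helpers written for this illustration, not IPP or TBB functions. The sketch keeps no filter state between calls: each output sample is computed directly from the input history. The posted code instead passes a single `m_stateP` to every invocation; whether that shared state object is what produces the glitches is an assumption here, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <thread>
#include <vector>

using cfloat = std::complex<float>;

// Direct-form FIR: out[n] = sum_k taps[k] * in[n - k], with in[m] = 0 for m < 0.
// Computes only the output samples in [begin, end). It reads `in` and `taps`
// and writes out[begin..end), so calls on disjoint ranges never touch the
// same output element and share no mutable state.
void fir_range(const std::vector<cfloat>& in, const std::vector<cfloat>& taps,
               std::vector<cfloat>& out, std::size_t begin, std::size_t end) {
    for (std::size_t n = begin; n < end; ++n) {
        cfloat acc(0.0f, 0.0f);
        const std::size_t kmax = std::min(taps.size(), n + 1);
        for (std::size_t k = 0; k < kmax; ++k) {
            acc += taps[k] * in[n - k];  // may read input before `begin`: the chunk's history
        }
        out[n] = acc;
    }
}

// Splits the output into `num_chunks` contiguous, non-overlapping ranges and
// filters each range on its own thread.
std::vector<cfloat> fir_chunked(const std::vector<cfloat>& in,
                                const std::vector<cfloat>& taps,
                                std::size_t num_chunks) {
    assert(num_chunks > 0);
    std::vector<cfloat> out(in.size());
    std::vector<std::thread> workers;
    const std::size_t chunk = (in.size() + num_chunks - 1) / num_chunks;
    for (std::size_t c = 0; c < num_chunks; ++c) {
        const std::size_t begin = std::min(in.size(), c * chunk);
        const std::size_t end = std::min(in.size(), begin + chunk);
        if (begin == end) continue;  // nothing left for this chunk
        workers.emplace_back([&in, &taps, &out, begin, end] {
            fir_range(in, taps, out, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    return out;
}
```

Because every output sample depends only on the read-only input and taps, the chunked result should match a single call to `fir_range` over the whole signal regardless of how the threads are scheduled. A stateful filter would need either one state object per chunk, primed with the samples that precede the chunk, or a stateless formulation like this one.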
caosun
New Contributor I
Thank you, TimP. I will post them to both the TBB and IPP forums for more information.
SergeyKostrov
Valued Contributor II
Quoting TimP (Intel)
IPP still uses OpenMP for parallelism, invoking strange behaviors when inside parallel_for, as there is no coordination between the respective run-time libraries. If you can link a non-threaded IPP library, or specify that IPP should use just 1 thread, you may have better luck...

There is a post in one of the recent threads on the IPP forum saying that the IPP team is considering / planning a release of a non-threaded version of the IPP library. All threading will be the responsibility of the users / developers of the IPP library.

Here is a link to a post:

http://software.intel.com/en-us/forums/showpost.php?p=189800

...Another reason is that as there are more and more multi-threading methods used by our users, we
plan to remove the internal threading of IPP functions so that user can use suitable multi-threads according
to their requirements...

sureshgupta22
Beginner
You should use serial IPP functions with TBB, though I don't know the details. A side note: "inV.Length/1.5" is a strange grainsize! Do you want to parallelize across only two threads?
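
The two-thread remark can be checked with a small model of how a range gets subdivided. The sketch below assumes, as I understand the documented behavior of `tbb::blocked_range`, that a range is divisible only while its size is greater than the grainsize and that a split cuts it roughly in half. The function `count_leaf_chunks` is a hypothetical helper for this illustration, not part of TBB, and it models the case where splitting continues until no range is divisible. A partitioner may stop earlier, so the result is an upper bound on the number of chunks.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of recursive range splitting (an assumption-based sketch,
// not TBB code): a range of `size` elements is split in half while
// size > grainsize; otherwise it becomes one chunk of work.
std::size_t count_leaf_chunks(std::size_t size, std::size_t grainsize) {
    assert(grainsize > 0);
    if (size <= grainsize) {
        return 1;  // not divisible: processed as a single chunk
    }
    const std::size_t left = size / 2;
    return count_leaf_chunks(left, grainsize) +
           count_leaf_chunks(size - left, grainsize);
}
```

For a length of 1000, a grainsize of 1000/1.5 (666 after truncation) gives one split into two halves of 500, and neither half is divisible any further. Under this model at most two chunks exist, however many cores are available, while a grainsize of 100 would allow up to 16.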
SergeyKostrov
Valued Contributor II
Some additional technical details are provided in a thread with the same 'Forum topic' on TBB forum. Please take a look.
caosun
New Contributor I
Sergey Kostrov wrote:

Some additional technical details are provided in a thread with the same 'Forum topic' on TBB forum. Please take a look.

Thank you, Sergey. Could you please provide a link?
SergeyKostrov
Valued Contributor II
>>...Could you please provide a link?

Please take a look at:

software.intel.com/en-us/forums/topic/329664
software.intel.com/en-us/forums/topic/277266