Using Intel IPP with TBB

caosun · ‎07-24-2012

I am trying to use TBB and IPP together to gain speed performance.
I use TBB to do filtering with the IPP function "ippsFIR_32fc"; each thread works on a portion of the data. But the results are quite strange: I see a lot of glitches (very large values) in the output data.

The code is as follows:

parallel_for(tbb::blocked_range<Int>(0, inV.Length, inV.Length/1.5), tbb_parallel_fir_task((Ipp32fc *)inV.Data, filterCoefCP, filterVP->Length, (Ipp32fc *)outVP->Data, m_stateP));

void operator() (const blocked_range<Int>& r) const
{
    Int begin = r.begin();
    Int end = r.end();
    Int nIters = end - begin;
    ippsFIR_32fc(m_inP + begin, m_outP + begin, nIters, m_stateP);
}

If I replace the IPP function "ippsFIR_32fc" with "ippsCopy_32f", the multithreaded copy works fine.

Another question: among the floating-point functions, I did not see this type of FIR: complex input data with real filter coefficients. I only see complex input data with complex filter coefficients, or real input data with real filter coefficients.

Note: I have already used the function 'ippSetNumThreads(1)' to set the number of IPP internal OpenMP threads to 1.
Could you please help me?

pvonkaenel · ‎07-24-2012

I haven't used the ippsFIR routines before, but I think you may need to consider boundary conditions. Can you really break the sample space up evenly, or do you need to include some overlap due to the filtering?

Also, if you're using IPP with TBB, I'd recommend linking with the unthreaded static libs instead of the DLLs. I don't think there's a way to completely disable OpenMP when linking with the DLLs, and this is why you need to call ippSetNumThreads(1).

Peter
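To make the boundary question concrete, here is a minimal sketch in plain C++ (no IPP or TBB). It assumes a direct-form FIR, y[n] = sum of taps[k] * x[n-k], and all function names are hypothetical. A chunk that may read the samples just before its start reproduces the single-pass result; a chunk treated as an isolated signal does not.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// y[n] = sum_k taps[k] * x[n - k] for n in [begin, end).
// Samples before x[0] count as zero; the inner loop reaches back across 'begin'.
void fir_range(const std::vector<float>& x, const std::vector<float>& taps,
               std::size_t begin, std::size_t end, std::vector<float>& y) {
    assert(begin <= end && end <= x.size() && y.size() == x.size());
    for (std::size_t n = begin; n < end; ++n) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += taps[k] * x[n - k];
        y[n] = acc;
    }
}

// Reference: filter the whole signal in one pass.
std::vector<float> fir_whole(const std::vector<float>& x,
                             const std::vector<float>& taps) {
    std::vector<float> y(x.size());
    fir_range(x, taps, 0, x.size(), y);
    return y;
}

// Chunked, with overlap: every chunk may read the inputs just before it.
std::vector<float> fir_chunked(const std::vector<float>& x,
                               const std::vector<float>& taps,
                               std::size_t chunk) {
    assert(chunk > 0);
    std::vector<float> y(x.size());
    for (std::size_t b = 0; b < x.size(); b += chunk)
        fir_range(x, taps, b, std::min(b + chunk, x.size()), y);
    return y;
}

// Chunked, no overlap (deliberately wrong): each chunk starts from silence.
std::vector<float> fir_chunked_isolated(const std::vector<float>& x,
                                        const std::vector<float>& taps,
                                        std::size_t chunk) {
    assert(chunk > 0);
    std::vector<float> y;
    for (std::size_t b = 0; b < x.size(); b += chunk) {
        std::vector<float> piece(x.begin() + b,
                                 x.begin() + std::min(b + chunk, x.size()));
        std::vector<float> out = fir_whole(piece, taps);
        y.insert(y.end(), out.begin(), out.end());
    }
    return y;
}

// Largest element-wise difference between two equally sized signals.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

With a stateful filter the same idea shows up as priming each chunk's delay line with the last tapsLen-1 input samples that precede the chunk.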

caosun · ‎07-24-2012

I ran an experiment:

1. I set the grain size so that only two TBB threads are used; each TBB thread calculates half of the data.

2. If only the first half of the data (first TBB thread) is filtered and the second TBB thread does nothing, the result is fine (the first half of the output is correct; the second half is not calculated).

3. If only the second half of the data is filtered, the second half of the output is correct.

4. If the two TBB threads work together, the results are completely wrong with a lot of glitches; some values are as large as 1e34 (the correct values should be less than 1).

Bob_Davies · ‎07-25-2012

I don't see anything wrong with the IPP call. It looks correct, and you are using a version of the API that can be multi-threaded (any of the ippsFIR APIs with SrcDst parameters cannot be multi-threaded). But what is that tile size in the parallel_for? inV.Length/1.5? Try taking the default tile size and report back with the results.
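For reference, a small model of how a grain size of inV.Length/1.5 plays out. It assumes that a blocked_range stays divisible while its size is greater than the grain size and that a split cuts at the midpoint; the real chunking also depends on the partitioner, so treat this as a sketch. The helper names are hypothetical and nothing here calls TBB.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [first, second)

// Hypothetical model of recursive range splitting: keep halving at the
// midpoint while the range is larger than the grain size.
void split_range(std::size_t begin, std::size_t end, std::size_t grain,
                 std::vector<Range>& leaves) {
    assert(grain > 0 && begin <= end);
    if (end - begin <= grain) {          // no longer divisible: this is a leaf
        leaves.emplace_back(begin, end);
        return;
    }
    const std::size_t mid = begin + (end - begin) / 2;
    split_range(begin, mid, grain, leaves);
    split_range(mid, end, grain, leaves);
}

// All leaf ranges produced for [0, n) with the given grain size.
std::vector<Range> leaf_ranges(std::size_t n, std::size_t grain) {
    std::vector<Range> leaves;
    split_range(0, n, grain, leaves);
    return leaves;
}
```

Under that model, leaf_ranges(1200, 800) returns [0, 600) and [600, 1200): the full range is larger than the grain so it splits once, and each half is already below the grain. That would explain why exactly two tasks run with Length/1.5.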

SergeyKostrov · ‎07-25-2012

Quoting caosun

I am trying to use TBB and IPP together to gain speed performance.
...
Could you please help me?

Hi, I could look at the problem and here are two questions:

Could you post a small test-case?
What is your TBB version?

Best regards,
Sergey

Chao_Y_Intel · ‎07-25-2012

Hello,

IPP "state structures" are described here, under "state structures that are modified during operation":
http://software.intel.com/sites/products/documentation/hpc/ipp/ippi/ippi_ch2/ch2_function_context_structures.html

So each thread should have its own state structure. From the code you posted here, it looks like "m_stateP" is shared by multiple tasks, which may produce incorrect results.

Thanks,
chao
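The effect of a shared state can be reproduced without IPP or threads. The sketch below is plain C++ with a hypothetical FirState (taps plus a delay line that every call reads and overwrites), standing in for the IPP state structure. Two "tasks" each own half of the signal and their calls are interleaved by hand, so the outcome is deterministic: with one shared state the second half is filtered against the wrong history, with one state per task the output matches the serial result.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a FIR state: every call mutates 'delay'.
struct FirState {
    std::vector<float> taps;   // assumed non-empty
    std::vector<float> delay;  // last taps.size()-1 inputs, oldest first
    explicit FirState(std::vector<float> t)
        : taps(std::move(t)), delay(taps.empty() ? 0 : taps.size() - 1, 0.0f) {}
};

// Filter n samples and update the delay line in 's'.
void fir_block(const float* src, float* dst, std::size_t n, FirState& s) {
    const std::size_t k = s.taps.size();
    assert(k > 0);
    for (std::size_t i = 0; i < n; ++i) {
        float acc = s.taps[0] * src[i];
        for (std::size_t j = 1; j < k; ++j)
            acc += s.taps[j] * s.delay[k - 1 - j];  // delay[k-2] is the newest
        if (k > 1) {                                // push src[i] into history
            s.delay.erase(s.delay.begin());
            s.delay.push_back(src[i]);
        }
        dst[i] = acc;
    }
}

// Reference: one state, one call over the whole signal.
std::vector<float> filter_serial(const std::vector<float>& x,
                                 const std::vector<float>& taps) {
    FirState s(taps);
    std::vector<float> y(x.size());
    fir_block(x.data(), y.data(), x.size(), s);
    return y;
}

// Two tasks, one per half, with their calls interleaved A, B, A, B.
std::vector<float> filter_two_tasks(const std::vector<float>& x,
                                    const std::vector<float>& taps,
                                    bool share_state) {
    const std::size_t half = x.size() / 2, q = half / 2;
    assert(x.size() % 4 == 0 && half + 1 >= taps.size());
    FirState a(taps), b(taps);
    // Prime task B's private delay line with the samples just before its half.
    for (std::size_t j = 0; j + 1 < taps.size(); ++j)
        b.delay[j] = x[half - (taps.size() - 1) + j];
    FirState& sa = a;
    FirState& sb = share_state ? a : b;  // shared: B reuses A's state
    std::vector<float> y(x.size());
    fir_block(x.data(),            y.data(),            q, sa);
    fir_block(x.data() + half,     y.data() + half,     q, sb);
    fir_block(x.data() + q,        y.data() + q,        q, sa);
    fir_block(x.data() + half + q, y.data() + half + q, q, sb);
    return y;
}

// Largest element-wise difference between two equally sized signals.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    return worst;
}
```

With real threads the interleaving is arbitrary and the writes to the shared delay line also race, which is consistent with the wild values reported above. Allocating one state per task (or per chunk) removes the sharing.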

SergeyKostrov · ‎07-25-2012

Quoting Chao Y (Intel)

...So each thread should have its own state structure. From the code you posted here, it looks like "m_stateP" is shared by multiple tasks, which may produce incorrect results.

Hi Chao, it means that an array of filter states has to be used instead, like:

...
IppsFIRState_32f *pState[ <number of threads> ];
...

Since for every TBB thread you can get a thread ID, or a similar unique ID, it is possible to map the pState variables to the proper processing thread.

Best regards,
Sergey

caosun · ‎07-27-2012

Hi chao:

Thank you for your answer. You are right!

Another question: among the floating-point functions, I did not see this type of FIR: complex input data with real filter coefficients. This case is very common in many applications.

Best Regards,

Sun Cao

Chao_Y_Intel · ‎08-02-2012

Thanks for letting us know. For the FIR function, I think you can just use the complex function as a workaround.

Thanks,
Chao
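A sketch of that workaround in plain C++, using std::complex<float> as a stand-in for Ipp32fc and a hypothetical direct-form fir() in place of the IPP call: the real taps are promoted to complex taps with a zero imaginary part, and the complex-by-complex filter then gives the same output as applying the real taps to the real and imaginary parts of the input.

```cpp
#include <algorithm>
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Promote real taps to complex taps with zero imaginary part.
std::vector<cfloat> promote_taps(const std::vector<float>& taps) {
    std::vector<cfloat> out;
    out.reserve(taps.size());
    for (float t : taps) out.emplace_back(t, 0.0f);
    return out;
}

// Direct-form FIR on complex input; Tap is float or cfloat.
// Samples before x[0] count as zero.
template <typename Tap>
std::vector<cfloat> fir(const std::vector<cfloat>& x,
                        const std::vector<Tap>& taps) {
    std::vector<cfloat> y(x.size());
    for (std::size_t n = 0; n < x.size(); ++n) {
        cfloat acc(0.0f, 0.0f);
        for (std::size_t k = 0; k < taps.size() && k <= n; ++k)
            acc += taps[k] * x[n - k];
        y[n] = acc;
    }
    return y;
}

// Largest element-wise difference between two equally sized signals.
float max_abs_diff(const std::vector<cfloat>& a, const std::vector<cfloat>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

The price is extra arithmetic: a complex-by-complex multiply takes four real multiplications where a complex-by-real multiply needs two, so a dedicated real-tap variant would be cheaper if one existed.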