I am using IPP 7.0 to write code that performs a series of 2D image filtering operations on images of various sizes (mostly 128 by 128). I have two versions of the code: one that leaves the threading up to IPP (internal to the DFT call, for example), and another in which I handle the threading myself at the application level. I have found that the IPP threading does not provide any significant speedup for small DFT operations (like 128 by 128), so I really want to use application-level threading. When I do this, however, the final images are not correct. If I run the two threads serially, things work fine, but when I run them in parallel the result is wrong. I suspect that I need to provide more information than I have so far, but I'm not sure what type of information would be most useful. The IPP functions I am using are as follows:
ippiDFTFwd_RToPack_32f_C1IR
ippsRealToCplx_32f
ippiCplxExtendToPack_32fc32f_C1R
ippiMulPack_32f_C1IR
ippiDFTInv_PackToR_32f_C1IR
I tried simplifying the code by just doing the forward DFT, zeroing out the DC component, and then doing the inverse DFT. I still got the same result (two threads running in parallel give the wrong answer, while two threads running serially give the correct answer). In this simple case, the only IPP functions I use are as follows:
ippiDFTFwd_RToPack_32f_C1IR
ippiDFTInv_PackToR_32f_C1IR
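To make the test setup concrete, here is a minimal sketch of the serial-versus-parallel comparison described above. It is not the actual code: it assumes POSIX threads, and the IPP calls are replaced by a hypothetical stand-in, `remove_dc`, which subtracts the image mean (mathematically what forward DFT, zeroing the DC term, and inverse DFT should produce). The names `job_t`, `run_two_jobs`, and `max_abs_diff` are likewise made up for illustration. Each job owns its own image buffer, so nothing is shared between the two threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for DFT -> zero DC -> inverse DFT:
   removing the DC component is equivalent to subtracting the mean. */
void remove_dc(float *img, int width, int height)
{
    assert(img != NULL && width > 0 && height > 0);
    size_t n = (size_t)width * (size_t)height;
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += img[i];
    float mean = (float)(sum / (double)n);
    for (size_t i = 0; i < n; ++i)
        img[i] -= mean;
}

/* One unit of work; each job points at its own private image. */
typedef struct {
    float *img;
    int width;
    int height;
} job_t;

static void *worker(void *arg)
{
    job_t *job = (job_t *)arg;
    remove_dc(job->img, job->width, job->height);
    return NULL;
}

/* Runs two jobs on two threads. With parallel == 0 the first thread is
   joined before the second is started (the "serial" case); otherwise
   both run concurrently. Returns 0 on success, -1 if a thread could
   not be created. */
int run_two_jobs(job_t *a, job_t *b, int parallel)
{
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, worker, a) != 0)
        return -1;
    if (!parallel)
        pthread_join(ta, NULL);
    if (pthread_create(&tb, NULL, worker, b) != 0) {
        if (parallel)
            pthread_join(ta, NULL);
        return -1;
    }
    if (parallel)
        pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}

/* Largest absolute element-wise difference between two images. */
float max_abs_diff(const float *x, const float *y, size_t n)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - y[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Running `run_two_jobs` once with `parallel` set to 0 and once with it set to 1 on copies of the same inputs, then comparing the outputs with `max_abs_diff`, mirrors the comparison in which the parallel IPP version gives a different answer from the serial one.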
Please let me know what other information I can provide that might help in diagnosing this.
Thanks for your help!