Any idea why the function runs at only about 1/4 of the speed in a multithreaded program compared with running it in a single thread?
I am benchmarking the net performance of the function (the measurement doesn't include any waits, locks, or anything else).
Can we see your test code?
If multiple threads work on a resource that should be accessed by only one thread at a time, the concurrency control needed to serialize that access can cost performance.
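To make that concrete, here is a minimal sketch in Python (not IPP code; the function names and the summing workload are made up purely for illustration). The first variant takes one shared lock for every element, so the threads mostly wait on each other. The second gives each thread a private accumulator and merges the results once at the end.

```python
import threading

def sum_with_shared_lock(chunks):
    """Every addition takes the same lock, so the threads serialize on it."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for x in chunk:
            with lock:  # contended on every single element
                total += x

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def sum_with_private_partials(chunks):
    """Each thread writes only to its own slot; results are merged once."""
    partials = [0] * len(chunks)

    def worker(i, chunk):
        s = 0
        for x in chunk:
            s += x
        partials[i] = s  # one write per thread, no lock needed

    threads = [threading.Thread(target=worker, args=(i, c))
               for i, c in enumerate(chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)

# Four chunks of 1000 integers each, one chunk per thread.
chunks = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]
```

Both variants return the same sum; only the amount of time spent waiting on the lock differs. If your benchmark shares any state between threads (an output buffer, a filter state structure, a counter), the first pattern may be what you are actually measuring.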
Could you provide a reproducer? Each IPP release is distributed with IPP PS (the perf system), so you can run the two executables below (with the -B -r -fFIRMR switches) and compare the results in the corresponding CSV files:
ps_ipps_mrg_compl_mt.exe (mt==multithreaded) or
ps_ipps_mrg_compl_st.exe (st == singlethreaded)
or run ps_ipps_pcs.exe once with the multi-threaded DLLs/shared objects and once with the single-threaded ones.
I see good scalability for numIters >= 2K and tapsLen >= 32. FIRMR applies internal criteria to its input parameters, and the mt code is used only for rather large filter orders and numbers of iterations (the criteria also depend on the architecture).
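As a rough illustration of that kind of size-based dispatch, here is a hypothetical sketch in Python. It is not IPP's actual logic: the function name is invented, and the thresholds are simply the values at which I observed good scaling (with "2K" read as 2048, which is an assumption), not the library's internal criteria.

```python
# Hypothetical thresholds taken from the observation above,
# NOT from IPP internals. "2K" is assumed to mean 2048.
MIN_ITERS_FOR_MT = 2048
MIN_TAPS_FOR_MT = 32

def use_multithreaded_path(num_iters: int, taps_len: int) -> bool:
    """Return True only when the workload is large enough that threading
    overhead is likely to be amortized (illustrative rule only)."""
    return num_iters >= MIN_ITERS_FOR_MT and taps_len >= MIN_TAPS_FOR_MT
```

With a rule like this, a call with a small numIters or a short filter stays on the single-threaded path even when the multithreaded library is linked, so benchmarking with small parameters may show no speedup from threading.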