Can you share a test case to investigate this?
According to ThreadedFunctionsList.txt, ippmQRDecomp and ippmQRBackSubst are threaded. So I would like to know how you are threading, and what value the OMP_NUM_THREADS environment variable is set to.
Did you check for threading issues such as deadlocks or poor concurrency using Intel Parallel Studio XE 2011?
The specific functions I'm calling are: ippmQRBackSubst_mv_32f, ippmQRBackSubst_mva_32f, ippmQRDecomp_m_32f. If I remember correctly, only one of them is listed in ThreadedFunctionsList.txt.
I do not set the OMP_NUM_THREADS value. In my program, I see 4 threads working, so that is what OMP is using.
I did check deadlock and concurrency using Parallel Studio XE 2011. There are no deadlock issues, and concurrency is good: 4 cores are running 58% of the time, 3 cores 34%, 2 cores 4%, 1 core 3%, and the remaining 1% is idle.
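As a rough cross-check of those percentages, the average number of busy cores can be computed as a weighted sum. This is only a sketch: the histogram values are the ones quoted above, and the function name is made up for illustration.

```python
# Concurrency histogram quoted above: number of busy cores -> percent of time.
histogram = {4: 58, 3: 34, 2: 4, 1: 3, 0: 1}


def average_busy_cores(hist):
    """Weighted mean of busy cores; hist maps core count -> percent of time."""
    total_percent = sum(hist.values())
    weighted = sum(cores * pct for cores, pct in hist.items())
    return weighted / total_percent


print(f"average busy cores: {average_busy_cores(histogram):.2f}")
```

With these numbers the average is about 3.45 of 4 cores, which fits the description of the concurrency as good.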
Thanks for the info.
I am not aware of any issues related to these two functions, ippmQRDecomp and ippmQRBackSubst.
A test case would help to verify it.
I call the program with these arguments:
"c:/temp/from.bin" "c:/temp/to.bin" 4
where the first two are the data files I sent and the last one is the number of threads; I use either 1 or 4.
What I find is that when I have 1 thread I spend about 12 CPU seconds in the ipp routines, and when I have 4 threads (my laptop has four cores) I spend about 28 CPU seconds in the ipp routines. I would expect both runs to spend a similar amount of time in the ipp routines. The CPU times are as reported by Amplifier XE Hotspots.
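To put the two measurements side by side, here is a small sketch of the arithmetic. It uses only the CPU times quoted above; the function names are hypothetical, and the "evenly spread" figure assumes the CPU time is divided perfectly across the 4 threads, which the measurements do not confirm.

```python
# CPU seconds spent in the ipp routines, as reported in the post above.
cpu_seconds_1_thread = 12.0
cpu_seconds_4_threads = 28.0
num_threads = 4


def cpu_time_ratio(single, multi):
    """How many times more CPU time the multi-threaded run consumed."""
    return multi / single


def evenly_spread_seconds(cpu_seconds, threads):
    """Elapsed time IF the CPU time were split perfectly across the threads
    (an idealizing assumption, not a measured value)."""
    return cpu_seconds / threads


ratio = cpu_time_ratio(cpu_seconds_1_thread, cpu_seconds_4_threads)
spread = evenly_spread_seconds(cpu_seconds_4_threads, num_threads)
print(f"CPU time ratio (4 threads vs 1): {ratio:.2f}")
print(f"Elapsed seconds if evenly spread over {num_threads} threads: {spread:.1f}")
```

CPU time is summed over all threads, so the 4-thread run uses about 2.33 times the CPU time of the 1-thread run. What the extra 16 CPU seconds consist of is not something this arithmetic can show.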