Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

ippmQRDecomp and ippmQRBackSubst take more CPU cycles multi-threaded on 4 cores than single-threaded

Chris_Vogt
Beginner
I'm running Windows 7 64-bit on a quad-core Core 2 Extreme laptop and using ippmQRDecomp and ippmQRBackSubst. When I run single threaded, my test case uses 44 seconds of CPU time, 15 of them in those two routines. When I run the OpenMP-threaded code with four threads, my test case uses 91 seconds of CPU time, 52 of them in those two routines. Why do those routines require more CPU cycles when running on four threads? Regardless of my configuration, I'd expect the CPU time spent in those routines to be about the same. The timings are from VTune Amplifier XE 2011, and I'm using the Intel compiler.
Naveen_G_Intel
Employee

Hi,

Can you share a test case to investigate this?

According to ThreadedFunctionsList.txt, ippmQRDecomp and ippmQRBackSubst are threaded, so I would like to know how you are threading your code and what value the OMP_NUM_THREADS environment variable is set to.

Did you check for threading issues such as deadlocks or poor concurrency using Intel Parallel Studio XE 2011?

Regards,

Naveen Gv

Chris_Vogt
Beginner
I will try to put together a test case. As I'm sure you can appreciate, that is not always an easy task. I was hoping there was some known issue.

The specific functions I'm calling are: ippmQRBackSubst_mv_32f, ippmQRBackSubst_mva_32f, ippmQRDecomp_m_32f. If I remember correctly, only one of them is on the ThreadedFunctionsList.txt.

I do not set the OMP_NUM_THREADS value. In my program, I see 4 threads working, so that is what OMP is using.

I did check for deadlock and concurrency issues using Parallel Studio XE 2011. There are no deadlock issues, and concurrency is good: 4 cores are running 58% of the time, 3 cores 34%, 2 cores 4%, 1 core 3%, and the machine is idle 1% of the time.

Thanks.
Naveen_G_Intel
Employee

Hi Chris,

Thanks for the info.

I am not aware of any issues related to these two functions, ippmQRDecomp and ippmQRBackSubst.

A test case would help to verify it.

Regards,

Naveen Gv

Chris_Vogt
Beginner
I'm not sure I got the files uploaded properly. There should be two zip files, one with data, one with the code. Let me know if either one didn't get through.

I call the program with these arguments:
"c:/temp/from.bin" "c:/temp/to.bin" 4
where the first two are the data files I sent and the last one is the number of threads; I use either 1 or 4.

What I find is that when I have 1 thread I spend about 12 CPU seconds in the ipp routines, and when I have 4 threads (my laptop has four cores) I spend about 28 CPU seconds in the ipp routines. I would expect both runs to spend a similar amount of time in the ipp routines. The CPU times are as reported by Amplifier XE Hotspots.
Chris_Vogt
Beginner
OK, it looks like the code did not get uploaded, so I'm trying again.