Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Massive slowdown (x100) when using one of two cores: ippiCrossCorrValid_NormLevel

Felix_K_
Beginner
Hello,

We are using the function ippiCrossCorrValid_NormLevel_8u32f_C1R() from IPP V5.1 successfully on single-core Pentium M and P4 machines. With the current setup, a call takes about 4 ms.

As expected, on our dual-core systems ("Core Duo" and "Core 2 Duo") it runs more than twice as fast (~1 ms). However, when the process is restricted to one of the two cores (via the Win32 API call SetProcessAffinityMask), the function call takes more than 300 ms to finish! We expected the speed to be approximately that of the Pentium M.
We have already tried V5.2, but the behaviour does not change.

We need to run our application on one of the two cores so that two instances of our application get independent real-time behaviour.

In comparison, the function ippiResize_8u_C1R seems to work fine when using just one of the two cores.






Vladimir_Dudnik
Employee

Hello,

Thanks for reporting this issue. I've submitted an issue report for you on Intel Premier Support; you will be notified about its progress.

By the way, to disable threading (or to limit the number of internal threads in IPP) you can use the ippSetNumThreads function instead of the Win API. Remember, the Win API is not aware of the OpenMP threading used in IPP.

Regards,
Vladimir
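
The suggestion above could look roughly like the following. This is only a minimal sketch, not the code used in the thread. It assumes the application links against the threaded IPP libraries and calls this once at startup, before any ippi/ipps function. The `HAVE_IPP` macro, the stub in the `#else` branch (there only so the sketch compiles without IPP installed) and the helper name `limit_ipp_to_one_thread` are hypothetical; `ippSetNumThreads` and `ippGetNumThreads` are the ippcore functions.

```c
#ifdef HAVE_IPP                 /* hypothetical macro: define when building against real IPP */
#include <ippcore.h>
#else
/* Stand-in stub (assumption, NOT the real library) so the sketch compiles alone.
 * It mimics the IPP convention: 0 = no error, negative = error. */
typedef int IppStatus;
#define ippStsNoErr 0
static int stub_num_threads = 2;            /* pretend we start with 2 threads */
static IppStatus ippSetNumThreads(int n)
{
    if (n < 1) return -1;
    stub_num_threads = n;
    return ippStsNoErr;
}
static IppStatus ippGetNumThreads(int *pNumThr)
{
    if (!pNumThr) return -1;
    *pNumThr = stub_num_threads;
    return ippStsNoErr;
}
#endif

/* Ask IPP to use a single internal thread and verify the setting.
 * Returns 0 on success, -1 on failure. Negative status values are treated
 * as errors; positive values (warnings) are tolerated. */
int limit_ipp_to_one_thread(void)
{
    int n = 0;

    if (ippSetNumThreads(1) < 0)
        return -1;
    if (ippGetNumThreads(&n) < 0 || n != 1)
        return -1;
    return 0;
}
```

Each process sets this for itself, so every instance of the application would call it independently of the affinity mask it was given.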

Felix_K_
Beginner
Hello Vladimir,

thanks for your fast reply!

As mentioned, our application requires that each instance runs on its own core to avoid crosstalk. Each instance is itself multithreaded and may even have several threads invoking IPP calls. E.g.:

instance 1 has 4 threads:
a) mainthread
b) template matching thread (using ippiCorr, processing image A)
c) template matching thread (using ippiCorr, processing image B)
d) 1D signal processing thread (using ippsWhatever)

instance 2 has 2 threads:
a) mainthread
b) computer vision thread (using ippiWhatever)

We want all the work of instance 1 to be computed on core 1, and instance 2 with its threads on core 2. Naturally, the threads within an instance affect each other because they share one core, but the computations of instance 2 must not have any impact on instance 1.

We expect the SetProcessAffinityMask call to bind instance 1 and all of its threads to core 1. I am not sure that calling ippSetNumThreads(1) from each instance helps us in this respect.

If I am right, static linking would solve this issue, but we would like to run our app on different architectures (Pentium M, Core Duo, Core 2 Duo and later maybe quad core).

I hope this makes our problem clearer to you.
Greetings,

Felix

Note: The timing behaviour described previously applies when running just one single instance (using one core).

Vladimir_Dudnik
Employee

Hi Felix,

We would like to understand this issue in more depth. Could you please provide a simple (or perhaps not so simple) test case that reproduces it? If it is not convenient for you to share the code here, you can use the Intel Premier Support channel: submit your issue there and provide the test case once the communication starts. Is that possible?

Regards,
Vladimir

levicki
Valued Contributor I
I fail to understand why you make the effort to thread the code and then execute it on a single core.
Felix_K_
Beginner
Hello Igor,

We are developing real-time applications for industrial environments (as far as this is possible with WinXP). That means we have strict timing requirements. In the past we were happy with single-core CPUs: one CPU --> one RT application.

With the new generation of multicore CPUs, it is possible to sell one machine with an n-core CPU running n RT apps independently (that is, without crosstalk between the different apps). As you can see, we don't use the cores to speed up our (single) app but to run multiple apps without sharing cores.

We would also like to use multicore CPUs like this:
n-core CPU running n/2 RT apps (e.g. quad core running 2 RT apps)

If possible, this should work using the Win32 SetProcessAffinityMask API call.
Regards,
Felix
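
The core layout described above (n apps on n cores, or n/2 apps on n cores) comes down to picking one affinity mask per instance. A minimal sketch follows, assuming cores are numbered from 0 and each instance gets a block of consecutive cores. The helper names `app_affinity_mask` and `pin_current_process` are hypothetical. `SetProcessAffinityMask` and `GetCurrentProcess` are the Win32 calls; on other platforms the pinning helper only validates its input and does nothing else.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Affinity mask covering `cores_per_app` consecutive cores for the
 * zero-based instance index `instance`. Returns 0 for invalid input. */
uint64_t app_affinity_mask(unsigned instance, unsigned cores_per_app)
{
    unsigned shift;
    uint64_t block;

    if (cores_per_app == 0 || cores_per_app > 64 || instance >= 64)
        return 0;
    shift = instance * cores_per_app;
    if (shift + cores_per_app > 64)
        return 0;
    /* Avoid an undefined 64-bit shift when one app takes all 64 cores. */
    block = (cores_per_app == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << cores_per_app) - 1);
    return block << shift;
}

/* Pin the current process to `mask`. Returns 0 on success, -1 on failure.
 * Outside Windows this is a no-op that only rejects an empty mask. */
int pin_current_process(uint64_t mask)
{
    if (mask == 0)
        return -1;
#ifdef _WIN32
    /* On 32-bit Windows, DWORD_PTR holds only the low 32 bits of the mask. */
    if (!SetProcessAffinityMask(GetCurrentProcess(), (DWORD_PTR)mask))
        return -1;
#endif
    return 0;
}
```

For example, on a quad core running 2 RT apps, instance 0 would use `app_affinity_mask(0, 2)` (cores 0 and 1) and instance 1 would use `app_affinity_mask(1, 2)` (cores 2 and 3). Limiting the IPP-internal thread count is a separate setting and is not changed by the mask.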
Felix_K_
Beginner
Hello Vladimir,

ippSetNumThreads(1) apparently solved our problem, but we are still validating that. At the moment I can't provide you with a test app... I will keep you informed!

Regards,
Felix
Ying_S_Intel
Employee

A few more comments:

You may check the Intel IPP FAQ on threading at:
http://support.intel.com/support/performancetools/libraries/ipp/sb/CS-010662.htm

In addition, this page provides downloads of the list of threaded Intel IPP APIs since version 5.0:
http://support.intel.com/support/performancetools/libraries/ipp/sb/CS-026584.htm

By the way, can you also confirm whether Intel IPP version 5.2 still has this issue?

Thanks,

Ying

Felix_K_
Beginner
Hello!

We already tried 5.2 beta, with the same results (as described in the first post).
Regards,

Felix
Vladimir_Dudnik
Employee

Hi Felix,

For now, you can set the number of threads launched from the IPP DLLs to 1. Another option is to link your application with the IPP static libraries. We are working to fix this issue in the next version of IPP.

Regards,
Vladimir
