Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Trying to achieve parallel execution in using IPP function ippiAlphaCompC

mmdev
Beginner
528 Views
Hello:

I have written an application that creates and runs multiple threads on a multi-core Intel Core i7 system. My system has eight cores, and my application uses four threads.

Each of the threads, which are not synchronized with each other, calls into the ippiAlphaCompC IPP function. The threads are all created and started from the same application main thread.

Through some analysis, it appears that ippiAlphaCompC blocks completely: when thread1 on core1 has invoked ippiAlphaCompC, a call to ippiAlphaCompC from thread2 running on core2 must wait until thread1's call has completely finished. What I'm trying to achieve is parallelism: processing multiple sets of data at once across multiple threads/cores using common IPP functions.

Does this make sense? Would one expect this function to block? Is there a way to achieve parallelism with this function across multiple threads/cores (without, e.g., loading the IPP DLLs into multiple separate processes)?

I'm using dynamic linking and dispatching, and I have set the number of threads in IPP to 1 (although leaving it at the default value of 8 on my system does not seem to make a difference).

Thanks.
10 Replies
Vladimir_Dudnik
Employee
IPP functions are not blocking. What led you to the conclusion that you are seeing a blocking effect?

Regards,
Vladimir
mmdev
Beginner
Hi Vladimir,

I have a simple test application with some timing benchmarks.

If I call this function several times within one thread, serializing the calls, the timing I observe for each call is what I would expect: approximately 3 msec.

Thread1()
{
    call functionx() // 3 msec
    call functionx() // 3 msec
}

If I separate the calls into two separate threads, run on two separate cores, wouldn't one still expect the result to be the same? They should run in parallel. Yet the timing returned in this second case is quite a bit higher.

Thread1() // on core1
{
    call functionx()
}

Thread2() // on core2
{
    call functionx()
}

Thanks.


Vladimir_Dudnik
Employee
I would expect roughly half the total time for the parallel run. Note, though, that the total execution time is the sum of the time required to start/end a thread and the time for the actual processing of the data. If the amount of data is small enough, the threading overhead may outweigh the benefit of parallel execution.

Vladimir

mmdev
Beginner
Vladimir,

My timing measurements do not include the threading overhead. My code measures only the time spent inside the IPP function call. The threading overhead is important, but for now I'm focused just on the work done in the IPP function (and the time to do that work).

If the two calls occur on two separate threads on two separate cores (on two completely separate sets of image data), would you expect the timing result for each call to be about the same as in the serialized, single-thread case?

Thanks.

-Mike

Vladimir_Dudnik
Employee
If your serial code calls the IPP function twice, giving you 3 + 3 = 6 msec of running time, then running those calls in parallel I would expect a total time of about 3 msec (exactly 3 msec being the ideal case, not achievable in practice).

Vladimir
mmdev
Beginner
Okay. That's what I'm expecting to see as well: each call in the serialized case is 3 msec, so I would expect each single call in the multithreaded/multi-core case to be 3 msec too. Since I'm not observing this, and the numbers can sometimes be quite a bit higher (sometimes close to twice as high), I've drawn the conclusion (perhaps incorrectly) that the IPP call is blocking. In this simple test case, with no other processing occurring, I'm having a difficult time thinking of another cause.

Thanks.

-Mike
Vladimir_Dudnik
Employee
Yeah, that is unexpected. Could you please try linking with the non-threaded static libraries to see if there is any difference? Also, what version of IPP are you using? Perhaps the latest, IPP 6.1 update 1?

Vladimir

mmdev
Beginner
Okay. I'll try that and report back my findings. I'm using IPP 6.1, but perhaps not the latest update (update 1?).

Thanks.

Mike
Vladimir_Dudnik
Employee
If you can upload your test case here, we would also like to investigate what the problem might be.

Vladimir
Mikael_Grev
Beginner

Guys,

3 msec is too short a time to test this, since residual processing from the loading of the test app can saturate a core during those 3 msec. Put a for loop around every call and try again.

Also, depending on which timer you use, timer resolution can be a problem as well. Never benchmark intervals much below 1 sec.

Cheers,
Mikael

