Intel® Integrated Performance Primitives

Not seeing linear speedup

gauss2561
Beginner

We have an application that uses the IPP FFT functions. We want to parallelize several calls to a function that uses the FFT. We use OpenMP at the application level and link to the single-threaded version of the IPP libraries. We are using Visual C++ 2008 (Visual Studio 2008). A skeleton of the code looks like this:

```cpp
#pragma omp parallel for
for (int i = 0; i < numLoops; i++)
{
    doSomething(); // this function calls the FFT
}
```

On a quad-core Core i7 with Hyper-Threading, we see a speedup of about 2.5 over the non-threaded version. When we run the same test using FFTW instead of IPP, we see a speedup of 4, i.e., a linear speedup proportional to the number of physical (non-hyperthreaded) cores.
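As a rough reading of these numbers, Amdahl's law relates the speedup S on p cores to the fraction s of the single-threaded run time that stays serial. This is an idealization: it treats all lost speedup as serialized work and ignores memory bandwidth and Hyper-Threading effects. Solving for s with p = 4:

$$
S = \frac{1}{s + \frac{1-s}{p}}
\quad\Longrightarrow\quad
s = \frac{p/S - 1}{p - 1} = \frac{4/2.5 - 1}{4 - 1} = \frac{0.6}{3} = 0.2
$$

So the 2.5× IPP result behaves as if roughly 20% of the work were serial, while the 4× FFTW result corresponds to s ≈ 0.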

My question is, if we see a linear speedup with FFTW, wouldn't we expect to see the same with IPP?

Bruce

Naveen_G_Intel
Employee
Hi Bruce,

What is the value of OMP_NUM_THREADS when you run the IPP FFT, and what is the FFT size?

For additional information on improving IPP FFT performance, refer to another reply on the IPP forum here.

Thanks,

Naveen Gv

gauss2561
Beginner
Hi Naveen,

We don't set the environment variable OMP_NUM_THREADS. My understanding is that the number of threads is then set automatically to the number of logical processors the operating system sees, which is 8 in our case (4 cores with Hyper-Threading).
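A minimal modern C++ sketch for checking this at run time instead of relying on the default. When OMP_NUM_THREADS is unset, the default team size is implementation-defined, but it is typically the number of logical processors, so on this machine the loop would likely run 8 threads on 4 physical cores. The helper names defaultTeamSize, logicalProcessors, and reportThreading are hypothetical; omp_get_max_threads and omp_get_num_procs are standard OpenMP runtime calls.

```cpp
#include <cstdio>
#include <thread>
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads an unqualified "#pragma omp parallel" would request.
int defaultTeamSize()
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; // built without OpenMP: everything runs serially
#endif
}

// Logical processors visible to the program (hyper-threads count separately).
int logicalProcessors()
{
#ifdef _OPENMP
    return omp_get_num_procs();
#else
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n); // 0 means "unknown"
#endif
}

// Print both values once, e.g. at program startup.
void reportThreading()
{
    std::printf("logical processors: %d, default OpenMP team size: %d\n",
                logicalProcessors(), defaultTeamSize());
}
```

To compare against the four physical cores only, one could set OMP_NUM_THREADS=4 or call omp_set_num_threads(4) before the loop and re-measure.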

The size of the FFTs is 128K.
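To put 128K in concrete terms, assuming it means 131072 = 2^17 points (the post does not say whether the data are real or complex, or single or double precision): IPP's power-of-two FFT functions describe the length as an order, here 17, and one buffer of single-precision complex samples of that length takes 1 MiB. The sketch below computes both; fftOrder and complexFloatBufferBytes are hypothetical helpers, not IPP functions.

```cpp
#include <cstddef>
#include <cstdint>

// Returns log2(n) when n is a power of two, or -1 otherwise.
// IPP's power-of-two FFT setup functions take the length as 2^order.
int fftOrder(std::uint64_t n)
{
    if (n == 0 || (n & (n - 1)) != 0)
        return -1; // not a power of two
    int order = 0;
    while (n > 1) {
        n >>= 1;
        ++order;
    }
    return order;
}

// Bytes for one buffer of n single-precision complex samples
// (two 4-byte floats per sample, as in a 32fc layout).
std::size_t complexFloatBufferBytes(std::size_t n)
{
    return n * 2 * sizeof(float);
}
```

For n = 128 * 1024 this gives an order of 17 and 1048576 bytes per buffer, which is not a trivial amount to allocate on every loop iteration.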
Thanks for the link about improving IPP FFT performance. We are following all the recommendations there.
Bruce
Chao_Y_Intel
Moderator
Bruce,

I recall a similar performance issue from before. Before the FFT computation, the code calls a memory allocation function (ippsFFTInitAlloc or similar) to allocate memory for the FFT. Memory allocation is serialized, so your OpenMP code cannot run in parallel during that part. To resolve the problem, allocate the memory before the for loop.
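A minimal sketch of this suggestion in modern C++, assuming the work memory can be sized up front: each thread gets its own buffers, created once before the `#pragma omp parallel for`, and the loop body only reuses them. The naive DFT `naiveDft`, the function `runAllLoops`, and the per-thread buffer layout are hypothetical stand-ins, not IPP calls; with IPP, the same idea would presumably mean creating the FFT specification and work buffers before the loop rather than inside doSomething().

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

typedef std::complex<double> cplx;

// Hypothetical stand-in for doSomething()'s FFT: a naive DFT that writes
// into caller-provided memory instead of allocating its own.
static void naiveDft(const std::vector<cplx>& in, std::vector<cplx>& out)
{
    const double pi = 3.14159265358979323846;
    const std::size_t n = in.size();
    for (std::size_t k = 0; k < n; ++k) {
        cplx sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t)
            sum += in[t] * std::polar(1.0, -2.0 * pi * double(k * t) / double(n));
        out[k] = sum;
    }
}

// Runs numLoops independent transforms of length n in parallel and returns
// the DC bin (the sum of the inputs) of each one.
std::vector<double> runAllLoops(int numLoops, std::size_t n)
{
#ifdef _OPENMP
    const int numThreads = omp_get_max_threads();
#else
    const int numThreads = 1;
#endif
    // All work memory is allocated here, once, before the parallel loop:
    // one input and one output buffer per thread.
    std::vector<std::vector<cplx> > inBuf(numThreads, std::vector<cplx>(n));
    std::vector<std::vector<cplx> > outBuf(numThreads, std::vector<cplx>(n));
    std::vector<double> dc(numLoops, 0.0);

    #pragma omp parallel for
    for (int i = 0; i < numLoops; i++) {
#ifdef _OPENMP
        const int tid = omp_get_thread_num();
#else
        const int tid = 0;
#endif
        assert(tid < numThreads); // each thread owns one preallocated slot
        for (std::size_t t = 0; t < n; ++t)
            inBuf[tid][t] = cplx(double(i), 0.0);
        naiveDft(inBuf[tid], outBuf[tid]); // no allocation inside the loop
        dc[i] = outBuf[tid][0].real();
    }
    return dc;
}
```

Indexing the buffers by omp_get_thread_num() avoids both per-iteration allocation and sharing one scratch buffer between threads; for example, runAllLoops(16, 64) returns 64 * i for iteration i.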

Thanks,
Chao

gauss2561
Beginner
Thanks very much, Chao. We are indeed allocating memory inside the loop. I'll move it out and see if it helps.
Bruce