Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Not seeing linear speedup

gauss2561
Beginner
976 Views

We have an application that uses the IPP FFT functions. There are several calls to a function using the FFT that we want to thread. We use OpenMP at the application level and link to the single-threaded version of the IPP libraries. We are using Visual Studio C++ 2008. A skeleton of the code looks like this:

#pragma omp parallel for
for (int i = 0; i < numLoops; i++) {
    doSomething(); // this function calls FFT
}

On a four-core i7 processor with Hyper-Threading, we see a speedup of about 2.5 over the non-threaded version. When we run the same test using FFTW instead of IPP, we see a speedup of 4, that is, a linear speedup proportional to the number of real (non-hyperthreaded) cores.

My question is, if we see a linear speedup with FFTW, wouldn't we expect to see the same with IPP?

Bruce

4 Replies
Naveen_G_Intel
Employee
Hi Bruce,

What is the value of OMP_NUM_THREADS when you run the IPP FFT? What is the size of the FFT?

For additional information on improving IPP FFT performance, refer to another reply on the IPP forum here.

Thanks,

Naveen Gv

gauss2561
Beginner
Hi Naveen,

We don't set the environment variable OMP_NUM_THREADS. My understanding is that the number of threads will thus be automatically set to the number of available cores seen by the operating system, which would be 8 in our case.
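One way to confirm that default rather than assume it is to ask the OpenMP runtime directly. The following is only a minimal sketch: the helper names maxOpenMpThreads and reportThreads are made up for illustration, and the code falls back to 1 when it is compiled without OpenMP support.

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: upper bound on the number of threads the next
// parallel region may use (1 when compiled without OpenMP support).
int maxOpenMpThreads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

// Hypothetical helper: print the value once at startup.
void reportThreads() {
    std::printf("OpenMP will use up to %d threads\n", maxOpenMpThreads());
}
```

Calling reportThreads() at startup, with and without OMP_NUM_THREADS set, shows which thread count the timing runs actually used.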

The size of the FFTs is 128K.
Thanks for the link about improving IPP FFT performance. We are following all the recommendations there.
Bruce
Chao_Y_Intel
Moderator


Bruce,

I recall a similar performance issue. Before the FFT computation, the code calls a memory allocation function (ippsFFTInitAlloc or a similar one) to allocate memory for the FFT. Memory allocation is serial, so this part of your OpenMP code cannot be parallelized. To resolve the problem, allocate the memory before the for loop.
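A rough sketch of that restructuring follows. It does not use the IPP API. FftWorkspace and this two-argument doSomething are hypothetical stand-ins for the FFT spec/work buffer and for the function from the original post, and runAll is an invented driver. The idea shown is to allocate one workspace per thread before the parallel loop so that the loop body touches only preallocated memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for an FFT spec plus its work buffer.
struct FftWorkspace {
    std::vector<float> buffer;
    explicit FftWorkspace(std::size_t n) : buffer(n, 0.0f) {}  // the allocation
};

// Hypothetical stand-in for doSomething(): works only on preallocated memory.
double doSomething(FftWorkspace& ws, int i) {
    double sum = 0.0;
    for (std::size_t k = 0; k < ws.buffer.size(); ++k) {
        ws.buffer[k] = static_cast<float>(i);
        sum += ws.buffer[k];
    }
    return sum;
}

std::vector<double> runAll(int numLoops, std::size_t fftSize) {
    int maxThreads = 1;
#ifdef _OPENMP
    maxThreads = omp_get_max_threads();
#endif
    // Allocate one workspace per thread, serially, before the parallel loop.
    std::vector<FftWorkspace> workspaces;
    workspaces.reserve(static_cast<std::size_t>(maxThreads));
    for (int t = 0; t < maxThreads; ++t) {
        workspaces.emplace_back(fftSize);
    }

    std::vector<double> results(static_cast<std::size_t>(numLoops));
    #pragma omp parallel for
    for (int i = 0; i < numLoops; i++) {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        assert(tid < maxThreads);
        // No allocation here: each thread reuses its own workspace.
        results[i] = doSomething(workspaces[tid], i);
    }
    return results;
}
```

A call such as runAll(numLoops, 128 * 1024) would then perform all allocation up front. Whether this recovers the full linear speedup depends on how much of the original loop time went to allocation.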

Thanks,
Chao

gauss2561
Beginner
Thanks very much, Chao. We are indeed allocating memory inside the loop. I'll move it out and see if it helps.
Bruce