Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

IPP FFT time increases are not linear

glewis
Beginner
635 Views
I have an application running on a Linux platform that uses the IPP 5.3 FFT functions. It runs three FFT sizes (512, 4096, and 32768 points), all real-to-complex (Perm format) and out of place. The two smaller sizes run fast enough for the time budget I have, but the large FFT is dog slow.

Here's what I've measured:

4096pt: 0.00031812 s
8192pt: 0.00101566 s (~3x slower than the 4k)
16384pt: 0.05787439 s (~57x slower than the 8k, what happened here?)
32768pt: 0.09263355 s (~91x slower than the 8k, what!?)

This is the same code (not multithreaded, and pinned to a single processor) with only the FFT size changed. I was expecting a roughly linear progression in the time required, but something else is going on once the size grows past 8k.

Any suggestions?

0 Kudos
5 Replies
Vladimir_Dudnik
Employee
635 Views

Hello,

The more data you have, the more frequent the cache misses are, aren't they?

Regards,
Vladimir

0 Kudos
gol
Beginner
635 Views
If you're testing on a P4, you should read about "64k aliasing on P4" in the IPP performance tips & tricks. It looks like a serious "feature", and I hope it only affects that old P4. I'd think it applies even more to FFT code: if you allocate contiguous temp buffers, they're likely to be separated by multiples of 64k.
0 Kudos
glewis
Beginner
635 Views

Sorry I didn't specify this in my original post, but I'm running on an Intel Xeon quad-core processor (Intel Core 2 Quad processor, type 0x22 returned by ippGetCpuType()). More specifically, it's an Intel Xeon E5405 at 2.0 GHz.

I couldn't find the "64k aliasing on P4" note you suggested reading, but it got me thinking: I'm currently running my RFFT (real-to-complex, Perm format) out of place. So I tried copying my data to a temp buffer and running the RFFT in place, with comparable results.

If you can guide me to where I'll find the note "64k aliasing on P4" I'd appreciate it.

0 Kudos
gol
Beginner
635 Views
It's in the PDF named "Intel Integrated Performance Primitives (IPP) - Performance Tips and Tricks". It's probably wherever the other IPP PDFs are (I don't remember where I got it, but it can't be too hidden).

0 Kudos
glewis
Beginner
635 Views

OK, let me add more data to see if I can get some answers to a real problem I'm having.

I am using a genuine Intel E5405 chip. The IPP call ippGetCpuType works as expected. However, ippGetMaxCacheSizeB reports that it cannot determine the cache size, and I believe this is what causes my FFT problem. When I use a larger FFT size (32k real-to-complex), it takes significantly longer than it should, and I believe that's because the data is being swapped in and out of cache ("64k aliasing on P4").

So my big question is: why can't the IPP library determine the cache size, and is there some way to set the cache size manually for the library (since I know what it is) so that the data is not swapped?

0 Kudos