Here's what I've measured:
8192 pt: 0.00101566 s (~3x slower than the 4k)
16384 pt: 0.05787439 s (~56x slower than the 8k; what happened here?)
32768 pt: 0.09263355 s (~91x slower than the 8k, what!?)
This is running the same code (single-threaded, and pinned to a single processor); the only thing changing is the FFT size. I was expecting some sort of linear progression in the time required, but something else is going on once the size goes past 8k.
Sorry, I didn't specify this in my original posting, but I'm running on an Intel Xeon quad-core processor (Intel Core 2 Quad, type 0x22 returned by ippGetCpuType()). More specifically, it's an Intel Xeon E5405 at 2.0 GHz.
I couldn't find the "64k aliasing on P4" note you suggested reading, but it started me thinking: I'm currently running my RFFT (real-to-complex, Perm format) out of place. So I tried copying my data to a temp buffer and running the RFFT in place, with comparable results.
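For what it's worth, the copy-to-temp experiment looked like this in outline. This is a sketch with plain malloc/memcpy; in my actual code the buffers come from IPP's allocators, and process_in_place below is a hypothetical stand-in for the in-place RFFT call, not a real IPP function.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the in-place transform (the real code calls
 * the IPP RFFT here); this one just doubles the samples so the sketch runs. */
static void process_in_place(float *buf, int n) {
    for (int i = 0; i < n; i++) buf[i] *= 2.0f;
}

/* Copy the source into a scratch buffer, transform it there, then copy out:
 * the "run the out-of-place workload as in-place on a temp buffer" pattern. */
static int transform_via_temp(const float *src, float *dst, int n) {
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (!tmp) return -1;
    memcpy(tmp, src, (size_t)n * sizeof *tmp);
    process_in_place(tmp, n);
    memcpy(dst, tmp, (size_t)n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

The extra memcpy cost was negligible next to the transform itself, and the timings at 16k/32k barely moved, which is why I don't think in-place vs. out-of-place is the issue.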
If you can guide me to where I'll find the note "64k aliasing on P4" I'd appreciate it.
OK, let me add more data to see if I can get some answers to a real problem I'm having.
I am using a genuine Intel E5405 chip. The IPP call ippGetCpuType() works as expected. However, the call ippGetMaxCacheSizeB() tells me that it cannot determine the cache size, and I believe this is what's causing my FFT problem. When I use a larger FFT size (32k real-to-complex), it takes significantly longer than it should, and I believe that's because the data is being swapped in and out of cache (the "64k aliasing on P4" issue).
So my big question is: why can't the IPP library determine the cache size, and is there some way to manually set the cache size for the library (since I know what it is) so the data is not swapped?