
Dylan_B_ · ‎06-08-2015

Hi,

I am working on a 3D numerical integrator for a non-linear PDE using the parallel FFT library included in MKL.

My arrays consist of 2^30 data points, which is far larger than the cache. As a result, ~50% of cache references are misses, so a large share of the execution time is spent purely on memory access.
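For scale, here is a rough sketch of the working set implied by 2^30 points. The element size (16-byte double-precision complex) and the 32 MiB last-level cache are assumptions for illustration only; neither is stated in the post.

```python
# Rough working-set estimate for an array of 2^30 points.
# Assumptions (not stated in the post): 16-byte double-precision complex
# elements and a hypothetical 32 MiB last-level cache.
N_POINTS = 2 ** 30
BYTES_PER_POINT = 16          # assumed: complex double
LLC_BYTES = 32 * 2 ** 20      # assumed: hypothetical cache size


def working_set_bytes(n_points, bytes_per_point, out_of_place=False):
    """Bytes held by one transform; out-of-place needs a second array."""
    arrays = 2 if out_of_place else 1
    return arrays * n_points * bytes_per_point


in_place = working_set_bytes(N_POINTS, BYTES_PER_POINT)
out_of_place = working_set_bytes(N_POINTS, BYTES_PER_POINT, out_of_place=True)

print(f"in-place:     {in_place / 2**30:.0f} GiB, {in_place // LLC_BYTES}x the cache")
print(f"out-of-place: {out_of_place / 2**30:.0f} GiB, {out_of_place // LLC_BYTES}x the cache")
```

Under these assumptions the data exceeds the cache by a factor of several hundred, so every pass over the array has to stream from main memory.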

Is there a clever way I can deal with this? Is it expected to have 50% cache misses using an array this large?

Any help would be much appreciated.

Thanks,

Dylan

Evgueni_P_Intel · ‎06-08-2015

Hi Dylan,

A cache-miss rate of 50% is OK for large out-of-place FFTs. Have you tried in-place 3D transforms?

For most data points of large 3D transforms, the miss-hit pattern is MHHHMH for in-place transforms and MMHHMH for out-of-place transforms, i.e. cache-miss rates of 33% and 50% respectively. Although real figures may be higher, switching to in-place transforms may improve performance.
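The two percentages follow directly from counting the M entries in each pattern. A minimal sketch of that arithmetic (the helper name `miss_rate` is hypothetical, chosen only for this illustration):

```python
def miss_rate(pattern):
    """Fraction of accesses that miss, given a string of 'M' (miss) and 'H' (hit)."""
    if not pattern or set(pattern) - {"M", "H"}:
        raise ValueError("pattern must be a non-empty string of 'M' and 'H'")
    return pattern.count("M") / len(pattern)


# Patterns quoted in the post above.
for label, pattern in (("in-place", "MHHHMH"), ("out-of-place", "MMHHMH")):
    print(f"{label:13s} {pattern}  miss rate = {miss_rate(pattern):.0%}")
```

This only models the idealized per-point pattern; as noted above, measured figures may be higher.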

Evgueni.

Dylan_B_ · ‎06-08-2015

Hi Evgueni,

Thanks for the prompt reply.

I tried using in-place transforms and it improved the cache miss rate by approximately 5% compared to out-of-place transforms. I am still finding my performance underwhelming compared to a solver I wrote in the past using FFTW3, and I am completely stumped on how, or whether, I can increase performance further.

I have also noticed that some runs have a cache miss rate as high as 65% without any change to the parameters in my source code.

Thanks for your help,

Dylan

Evgueni_P_Intel · ‎06-08-2015

FFT performance may depend on the layout of the dataset in memory, threading runtime settings, etc.
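To illustrate why layout matters, here is a small sketch of the byte strides of a row-major 3D array. The 1024 x 1024 x 1024 shape (one way to get 2^30 points), the 16-byte element size, and the 64-byte cache line are all assumptions for illustration, not details from the thread.

```python
def row_major_strides(shape, itemsize):
    """Byte distance between neighbouring elements along each axis (C order)."""
    strides = []
    step = itemsize
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


SHAPE = (1024, 1024, 1024)   # assumed cube holding 2^30 points
ITEMSIZE = 16                # assumed: complex double
CACHE_LINE = 64              # assumed cache-line size in bytes

strides = row_major_strides(SHAPE, ITEMSIZE)
for axis, stride in enumerate(strides):
    # Number of consecutive elements along this axis that share one cache line.
    per_line = max(1, CACHE_LINE // stride)
    print(f"axis {axis}: stride {stride} bytes, {per_line} element(s) per cache line")
```

Under these assumptions only the last axis is contiguous; 1D passes along the other two axes touch a new cache line for every element, which is one way the memory layout can dominate the miss rate.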

To speed up the investigation, please post a reproducer here or send it privately.

3D FFT in MKL with data larger than cache