3D FFT in MKL with data larger than cache (Intel® oneAPI Math Kernel Library forum)

Dylan_B_ — Mon, 08 Jun 2015 21:35:45 GMT

Hi,

I am working on a 3D numerical integrator for a non-linear PDE using the parallel FFT library included in MKL.

My arrays consist of 2^30 data points, which is much larger than the cache. As a result, about 50% of cache references are misses, and a large share of the execution time is spent purely on memory access.

Is there a clever way to deal with this? Is a 50% cache-miss rate expected for an array this large?

Any help would be much appreciated.

Thanks,

Dylan


Evgueni_P_Intel — Tue, 09 Jun 2015 04:28:06 GMT

Hi Dylan B.,

A cache-miss rate of 50% is normal for large out-of-place FFTs. Have you tried in-place 3D transforms?

For most data points of a large 3D transform, the miss/hit pattern is MHHHMH for in-place transforms and MMHHMH for out-of-place transforms, i.e. cache-miss rates of 33% and 50%, respectively. Real figures may be higher, but switching to in-place transforms may still improve performance.

Evgueni.


Dylan_B_ — Tue, 09 Jun 2015 05:02:03 GMT

Hi Evgueni,

Thanks for the prompt reply.

I tried in-place transforms, and they reduced the cache-miss rate by approximately 5% compared to out-of-place transforms. However, performance is still underwhelming compared to a solver I previously wrote using FFTW3, and I am stumped on whether and how I can improve it further.

I have also noticed that some runs reach a cache-miss rate as high as 65% without any change to the parameters in my source code.

Thanks for your help,

Dylan


Evgueni_P_Intel — Tue, 09 Jun 2015 05:12:49 GMT

FFT performance may depend on the layout of the dataset in the memory, threading runtime settings, etc.

To speed up the investigation, please post a reproducer here or send it to me privately.