I just realized that the fftwf_plan_guru_split_dft_r2c() function call in Intel's C library simply returns NULL. So instead of writing a wrapper around the original FFTW library, I thought it might be easier to use Intel's DFT functions directly. My problem is that I really want the complex output to be stored in two separate arrays (e.g. rdata and idata for the real and imaginary parts, respectively). A complex forward FT can be configured to use split arrays for both the input and the output data by setting DFTI_COMPLEX_STORAGE = DFTI_REAL_REAL, but I don't see how to do the same thing for the complex output of a DFTI_REAL forward FT. I was hoping that DFTI_CONJUGATE_EVEN_STORAGE would do what I want, but this does not seem to be the case. I am really puzzled as to why this cannot be done easily. Any suggestions on how I can do what fftwf_plan_guru_split_dft_r2c() does in FFTW would be much appreciated!
Intel doesn't have its own runtime C or C++ library; it uses the one provided on the system.
Furthermore, fftwf_plan_guru_split_dft_r2c is not part of the standard C library. It looks like it comes from an open-source library.
I found it on GitHub here:
So I think you are asking this question in the wrong place.
Thanks for your response. Of course, I am aware of the FFTW library. However, there is an FFTW interface to Intel's Math Kernel Library, which allows me to use FFTW's function calls.
I posted the same question in your MKL forum and Fiona has answered me. It is very disappointing that your MKL simply returns NULL for the fftwf_plan_guru_split_dft_r2c() function call, and that the library currently cannot store the complex output of a DFTI_REAL FT in two split arrays. I have made a feature request for your development team to consider this in future versions.
Thanks anyway for your answer.
Load 256 bits of interleaved complex data (8 floats = 4 complex values).
Use _mm256_permutevar8x32_ps to rearrange them into a 128-bit half holding the 4 real parts plus a 128-bit half holding the 4 imaginary parts.
Use _mm256_storeu2_m128 to store the two 128-bit halves into the two buffers.
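
The three steps above could be sketched in C roughly as follows. This is only an illustration, not something from MKL or FFTW: the names split_complex and split_avx2 are made up, and it assumes the real-to-complex transform has already written its output as interleaved (re, im) float pairs. The AVX2 path assumes GCC or Clang on x86 and is selected at run time, with a plain scalar loop as the fallback and for any leftover elements. It uses two _mm_storeu_ps calls where the steps mention _mm256_storeu2_m128, because some older compilers do not ship that combined store; the effect should be the same.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_AVX2_PATH 1
#endif

#ifdef HAVE_AVX2_PATH
/* Hypothetical helper: de-interleaves 4 complex values per iteration.
 * Returns the number of complex values it handled (a multiple of 4). */
__attribute__((target("avx2")))
static size_t split_avx2(const float *cdata, float *rdata, float *idata,
                         size_t n)
{
    /* Even source indices (real parts) go to the low 128 bits,
     * odd source indices (imaginary parts) go to the high 128 bits. */
    const __m256i idx = _mm256_setr_epi32(0, 2, 4, 6, 1, 3, 5, 7);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Step 1: load 8 floats = 4 interleaved complex values. */
        __m256 v = _mm256_loadu_ps(cdata + 2 * i);
        /* Step 2: permute into [re0 re1 re2 re3 | im0 im1 im2 im3]. */
        __m256 p = _mm256_permutevar8x32_ps(v, idx);
        /* Step 3: store each 128-bit half into its own buffer. */
        _mm_storeu_ps(rdata + i, _mm256_castps256_ps128(p));
        _mm_storeu_ps(idata + i, _mm256_extractf128_ps(p, 1));
    }
    return i;
}
#endif

/* Hypothetical entry point: splits n interleaved complex floats in cdata
 * (2*n floats) into rdata[0..n-1] and idata[0..n-1]. */
void split_complex(const float *cdata, float *rdata, float *idata, size_t n)
{
    size_t i = 0;
    assert(cdata != NULL && rdata != NULL && idata != NULL);
#ifdef HAVE_AVX2_PATH
    if (__builtin_cpu_supports("avx2"))
        i = split_avx2(cdata, rdata, idata, n);
#endif
    /* Scalar tail, and full fallback when AVX2 is not available. */
    for (; i < n; ++i) {
        rdata[i] = cdata[2 * i];
        idata[i] = cdata[2 * i + 1];
    }
}
```

For an N-point real-to-complex transform you would presumably call split_complex(out, rdata, idata, N / 2 + 1) on the interleaved result. That costs one extra pass over the output compared with a true split-output transform.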