Dear Sir/Madam,
As I understood from an example on https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-using-openmp-threading-for-fft-computation, you calculate with 4 threads 4 different 2D-FFT (50x100 points). My question is - is it possible to calculate NxN FFT where I divide data between 2 processors where each will get NxM FFT and NXK FFT where M+K = N? I worry about the result, is it will be correct by following your example? Also, how do you do it inside and how I can apply it to fftw?
Thanks,
Sem
链接已复制
Hi Mathieu,
Thanks for your reply, but it is not exactly what I needed. Actually, I want to calculate, for instance, one part of 2D FFT on one socket and another part 2D FFT on another socket, after this to combine them in one that is will be correct. Like - FFT
Thanks,
Sem
Hi Sem,
Yes, the example on https://software.intel.com/en-us/mkl-developer-reference-c-examples-of-u..., you calculate with 4 threads 4 different 2D-FFT (50x100 points).
is it possible to calculate NxN FFT where I divide data between 2 processors where each will get NxM FFT and NXK FFT where M+K = N?
From MKL FFT point of views, we take the problem as below you expected to do NxN FFT on 2 processors. Actually, both MKL FFT and FFTW support internal multi-threads, which means , MKL FFT or FFTW will use the existing processors (2 here) and distribute the compute tasks of NXN FFT to mult-threads to compute and then return the correct result. then developer don't need take care of how to seperate the task.
What is your N size? Please see mkl use guide to check if the 2D FFT are multi-threaded in https://software.intel.com/en-us/mkl/documentation
Best Regards,
Ying
Hi Ying,
Thanks. My N can be 1024 and so on till 48540. I do not get how you combine the result in the end. I have checked and my result is not exactly correct when I calculated following by your method. I devide N*N/2, so one socket with 18 threads has N*N/2 data and another has also N*N/2. When I checked the result, it was incorrect. So, if you are saying that it is correct way to do one FFT by deviding it on different sockets, then I think I did some mistake, I need to check it again.
Thanks,
Sem
Hi Sem,
Sorry for unclear reply. I mean, you can call N (1024, 48540) FFT directly. You don't have to divide the data yourself. MKL will distribute the compute tasks in two socket automatically.
Best Regards,
Ying
Hi Sem,
Get it. then you may want to implement one mult-thread FFT by parallel method actually. then you may refer to the source code of FFTW multhreads or on-line source like https://www.codeproject.com/Questions/706613/How-To-Write-an-FFT-Algorithm-With-Thread.
Best Regards,
Ying
I am not sure to understand well, but you can still write you 2D FFT with two set of 1D FFTs, basically 2D FFT is a FFT along one axis and after the second one. (If you do R2C the first one is R2C and after you have C2C along teh second axis ).
And if you like you can still write, you set of 1D FFT as 2 subset (n/2 first one lines and n/2 last one lines) and the same after with columns. I use this decomposition when I know that a line is only with zeros I don't compute it, that gives me the opportunity to win some small % of computation.
If you do it over two sockets, the (manuel) memory management can be problematic (at least complex), if you look at the algorithm that is a divide and conquer, you will see that at the last step you need memory of both sides.
Hi Sem,
MKL provide Cluster FFT, which support MPI. It allows you to manage data between MPI threads and sockets by yourself. You can refer to MKL user guide and https://software.intel.com/en-us/mkl-linux-developer-guide-linking-with-intel-mkl-cluster-software ; and MKL provide the sample code under <install Dir /examples>
Agree with Mathieu, there are some ways to parallel the FFT. But the manual memory management can be problematic if your purpose here is get higher performance N FFT result, but not how to write a FFT multithread code.
Best Regards,
Ying
Hi Mathieu,
Thanks, yes, actually I am doing this way now. I thought there is a way to do it without transpose but just to sum results somehow from two parts of 2D FFT.
Hi Ying,
Thanks too, I have checked but did not find any example of implementation of mpi. Ok, I can do MPI + Multithreading in fftw, but would be good if the same I can do in mklfft.
Thanks,
Sem
Transpose ? What are you transposing ? Can you not just use the stride option ?
have maybe a look one : http://www.fftw.org/fftw3_doc/Advanced-Complex-DFTs.html
Basically, it looks like:
1) first compute the lines, that I suppose is ok
2) compute column-wise
fftw_plan_many_dft(1, numberOfLines, numberOfColumn,
in, numberOfLines,
numberOfColumn, 1,
out, numberOfLines,
numberOfColumn, 1,
FFTW_FORWARD, FFTW_PLAN_OPTION);
if you do C2C it's ok, if you do R2C you need to transform numberOfColumn in numberOfColumn/2+1, or something like that. I let you find out check that.
3) nothing else.
PS: I hope what I write is right ;)
PPS: in = out is fine too !