I am trying to parallelize the FFT computation over each column of the matrix fu(m,2), but the result is not correct. Here is my code:
type(DFTI_DESCRIPTOR), SAVE,POINTER :: My_FFT_Handle
Status = DftiCreateDescriptor(My_FFT_Handle, DFTI_DOUBLE, DFTI_REAL, 1, m)
Status = DftiSetValue(My_FFT_Handle, DFTI_NUMBER_OF_USER_THREADS, nThreads)
Status = DftiCommitDescriptor(My_FFT_Handle)
!$OMP PARALLEL DO PRIVATE(Status)
do ic = 1, 2
   Status = DftiComputeForward(My_FFT_Handle, fu(:,ic))
end do
!$OMP END PARALLEL DO
The result in fu(:,2) is sometimes incorrect. What's wrong with the code?
The in-place real FFT configured by DftiCreateDescriptor(..., DFTI_REAL, 1, M) takes M real values as input but needs room for M+2 real elements of output, because the result is stored as floor(M/2)+1 complex values (M+2 reals when M is even). The matrix should therefore be declared or allocated as:
double precision fu(M+2,2)
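Putting the pieces together, a minimal sketch of the whole sequence might look like the following. It assumes Intel MKL is installed and linked, that mkl_dfti.f90 is on the include path, and that the code is compiled with OpenMP enabled. The program name fft_columns, the size m = 1024, the constant ncol, and the random test input are illustrative choices, not taken from the original post. With only M rows, each in-place transform would presumably write past the end of its own column into the start of the next one, which is why the padding matters when the columns are processed concurrently.

```fortran
! Sketch only: assumes Intel MKL (module MKL_DFTI) and OpenMP are available.
include 'mkl_dfti.f90'

program fft_columns
   use MKL_DFTI
   implicit none
   integer, parameter :: m = 1024, ncol = 2   ! illustrative sizes
   type(DFTI_DESCRIPTOR), pointer :: My_FFT_Handle
   double precision, allocatable :: fu(:,:)
   integer :: Status, ic

   ! M+2 rows per column: room for the in-place output of a length-M real FFT.
   allocate(fu(m+2, ncol))
   fu = 0.0d0
   call random_number(fu(1:m, :))   ! placeholder input data

   ! One descriptor shared by all threads; a real program should check
   ! each Status (0 means success) instead of ignoring it.
   Status = DftiCreateDescriptor(My_FFT_Handle, DFTI_DOUBLE, DFTI_REAL, 1, m)
   Status = DftiSetValue(My_FFT_Handle, DFTI_NUMBER_OF_USER_THREADS, ncol)
   Status = DftiCommitDescriptor(My_FFT_Handle)

   ! Each iteration transforms one column in place.
   !$OMP PARALLEL DO PRIVATE(Status)
   do ic = 1, ncol
      Status = DftiComputeForward(My_FFT_Handle, fu(:, ic))
   end do
   !$OMP END PARALLEL DO

   Status = DftiFreeDescriptor(My_FFT_Handle)
end program fft_columns
```

Status is made private so that the threads do not all write to the same variable.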
Allocating fu(M+2,2) seems to make no difference in the sequential code, but in the parallel code it gives the correct result.
By the way, the additional data in fu(M+1:M+2,:) are meaningless and are there just to make the routine work, aren't they?