- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I'm having trouble with 2d fft using intel 2018 compiler. I tracked it down and reproduce the behavior in this simple code. The sample code applies forward and backward 2d fft, and is supposed to bring back the original value. But if I do
setenv MKL_DYNAMIC false
setenv OMP_NUM_THREADS 10
and run the code, the result is wrong.
Here is my sample code. Any insight is appreciated. Thank you.
ifort version: 18.0.2 20180210
MKL version: ics-2018.update.2/compilers_and_libraries_2018.2.199
Xin
Program fft2dcc
! to compile the code, do the following:
! ifort -c -DMKLI8 -qopenmp -I${MKLROOT}/include/intel64/ilp64 -O3 -o fft2dcc.o fft2dcc.F90
! ifort -DMKLI8 -static-intel -qopenmp -O3 -o fft2dcc fft2dcc.o -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
use MKL_DFTI
use omp_lib
implicit none
integer :: nx, ny, nw, nfreq, ier, iw, iy, ix
complex, allocatable, dimension(:,:,:) :: wave
integer :: nthreads
type(DFTI_DESCRIPTOR), pointer :: plan
real :: fscale, bscale
nx = 640
ny = 640
nw = 224
nthreads = omp_get_num_threads()
fscale = 1.0
bscale = 1.0/float(nx*ny)
ier = ccfft2d_plan(nx,ny,nx,fscale,bscale,plan,nthreads)
allocate(wave(nx,ny,nw))
!$omp parallel
!$omp master
print *, omp_get_num_threads()
!$omp end master
!$omp end parallel
! first touch memory
!$omp parallel do default(shared), private(ix,iy,iw)
do iw = 1, nw
do iy = 1, ny
do ix = 1, nx
wave(ix,iy,iw) = cmplx(0.0,0.0)
enddo
enddo
! inject source
wave(nx/2,ny/2,iw) = cmplx(1.0,0.0)
enddo
print *
print *, "Before", wave(318:322,320,1)
print *
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)
do iw=1,nw
call ccfft2d_exe(1,wave(1,1,iw),plan)
call ccfft2d_exe(-1,wave(1,1,iw),plan)
enddo
print *, "After", wave(318:322,320,1)
deallocate(wave)
contains
integer function ccfft2d_plan &
( n1, n2, ldn1, fscale, bscale, plan, nthreads )
use MKL_DFTI
implicit none
integer, intent(in), optional :: nthreads
integer, intent(in) :: n1, n2, ldn1
real, intent(in) :: fscale, bscale
type(DFTI_DESCRIPTOR), pointer :: plan
integer :: ier
#ifdef MKLI8
integer(kind=8) :: dim, nsize(2), strides(3), nthreads_in, status
#else
integer(kind=4) :: dim, nsize(2), strides(3), nthreads_in, status
#endif
ier = 0
!--- add some error checking for sizes
dim = 2
nsize(1) = n1
nsize(2) = n2
strides(1) = 0
strides(2) = 1
strides(3) = ldn1
status = DftiCreateDescriptor(plan,DFTI_SINGLE,DFTI_COMPLEX,dim,nsize(1:dim))
if (.NOT. DftiErrorClass(Status, DFTI_NO_ERROR)) then
print *, 'error 1'
call DftiStatusPrint(Status)
ier = -1
return
endif
nthreads_in = 1
if (present(nthreads)) nthreads_in = nthreads
status = DftiSetValue(plan,DFTI_NUMBER_OF_USER_THREADS,nthreads_in)
status = DftiSetValue(plan,DFTI_INPUT_STRIDES,strides)
!--- this corresponding to isign < 0 usage at BP
! bscale= 1.0/float(n1*n2)
status = DftiSetValue(plan,DFTI_FORWARD_SCALE,bscale)
!--- this corresponding to isign > 0 usage at BP
! fscale= 1.0
status = DftiSetValue(plan,DFTI_BACKWARD_SCALE,fscale)
status = DftiCommitDescriptor(plan)
if (status /= 0) print *, 'DftiCommitDescriptor returned ', status
ccfft2d_plan = ier
end function ccfft2d_plan
subroutine ccfft2d_exe( isign, cdata, plan )
use MKL_DFTI
implicit none
integer, intent(in) :: isign
complex, intent(inout) :: cdata(*)
type(DFTI_DESCRIPTOR), pointer :: plan
#ifdef MKLI8
integer(kind=8) :: status
#else
integer(kind=4) :: status
#endif
if ( isign < 0 ) then
status = DftiComputeForward( plan, cdata )
else
status = DftiComputeBackward( plan, cdata )
endif
end subroutine ccfft2d_exe
subroutine DftiStatusPrint(status)
use MKL_DFTI
implicit none
#ifdef MKLI8
integer(kind=8) :: status
#else
integer(kind=4) :: status
#endif
character(DFTI_MAX_MESSAGE_LENGTH) :: error_message
error_message = DftiErrorMessage(status)
print *, 'Error message: ', error_message
print *, 'Error status = ', status
end subroutine DftiStatusPrint
end Program
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried to compile your code but I received compilation errors with the first line of your proposed compilation. Other people may be able to help you. Here is your organized code for better interpretation:
Program fft2dcc ! to compile the code, do the following: ! ifort -c -DMKLI8 -qopenmp -I${MKLROOT}/include/intel64/ilp64 -O3 -o fft2dcc.o fft2dcc.F90 ! ifort -DMKLI8 -static-intel -qopenmp -O3 -o fft2dcc fft2dcc.o -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group use MKL_DFTI use omp_lib implicit none integer :: nx, ny, nw, nfreq, ier, iw, iy, ix complex, allocatable, dimension(:,:,:) :: wave integer :: nthreads type(DFTI_DESCRIPTOR), pointer :: plan real :: fscale, bscale nx = 640 ny = 640 nw = 224 nthreads = omp_get_num_threads() fscale = 1.0 bscale = 1.0/float(nx*ny) ier = ccfft2d_plan(nx,ny,nx,fscale,bscale,plan,nthreads) allocate(wave(nx,ny,nw)) !$omp parallel !$omp master print *, omp_get_num_threads() !$omp end master !$omp end parallel ! first touch memory !$omp parallel do default(shared), private(ix,iy,iw) do iw = 1, nw do iy = 1, ny do ix = 1, nx wave(ix,iy,iw) = cmplx(0.0,0.0) enddo enddo ! inject source wave(nx/2,ny/2,iw) = cmplx(1.0,0.0) enddo print * print *, "Before", wave(318:322,320,1) print * !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw) do iw=1,nw call ccfft2d_exe(1,wave(1,1,iw),plan) call ccfft2d_exe(-1,wave(1,1,iw),plan) enddo print *, "After", wave(318:322,320,1) deallocate(wave) contains integer function ccfft2d_plan & ( n1, n2, ldn1, fscale, bscale, plan, nthreads ) use MKL_DFTI implicit none integer, intent(in), optional :: nthreads integer, intent(in) :: n1, n2, ldn1 real, intent(in) :: fscale, bscale type(DFTI_DESCRIPTOR), pointer :: plan integer :: ier #ifdef MKLI8 integer(kind=8) :: dim, nsize(2), strides(3), nthreads_in, status #else integer(kind=4) :: dim, nsize(2), strides(3), nthreads_in, status #endif ier = 0 !--- add some error checking for sizes dim = 2 nsize(1) = n1 nsize(2) = n2 strides(1) = 0 strides(2) = 1 strides(3) = ldn1 status = DftiCreateDescriptor(plan,DFTI_SINGLE,DFTI_COMPLEX,dim,nsize(1:dim)) if (.NOT. DftiErrorClass(Status, DFTI_NO_ERROR)) then print *, 'error 1' call DftiStatusPrint(Status) ier = -1 return endif nthreads_in = 1 if (present(nthreads)) nthreads_in = nthreads status = DftiSetValue(plan,DFTI_NUMBER_OF_USER_THREADS,nthreads_in) status = DftiSetValue(plan,DFTI_INPUT_STRIDES,strides) !--- this corresponding to isign < 0 usage at BP ! bscale= 1.0/float(n1*n2) status = DftiSetValue(plan,DFTI_FORWARD_SCALE,bscale) !--- this corresponding to isign > 0 usage at BP ! fscale= 1.0 status = DftiSetValue(plan,DFTI_BACKWARD_SCALE,fscale) status = DftiCommitDescriptor(plan) if (status /= 0) print *, 'DftiCommitDescriptor returned ', status ccfft2d_plan = ier end function ccfft2d_plan subroutine ccfft2d_exe( isign, cdata, plan ) use MKL_DFTI implicit none integer, intent(in) :: isign complex, intent(inout) :: cdata(*) type(DFTI_DESCRIPTOR), pointer :: plan #ifdef MKLI8 integer(kind=8) :: status #else integer(kind=4) :: status #endif if ( isign < 0 ) then status = DftiComputeForward( plan, cdata ) else status = DftiComputeBackward( plan, cdata ) endif end subroutine ccfft2d_exe subroutine DftiStatusPrint(status) use MKL_DFTI implicit none #ifdef MKLI8 integer(kind=8) :: status #else integer(kind=4) :: status #endif character(DFTI_MAX_MESSAGE_LENGTH) :: error_message error_message = DftiErrorMessage(status) print *, 'Error message: ', error_message print *, 'Error status = ', status end subroutine DftiStatusPrint end Program
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Xin, Vahid,
Thank you for the reports.
I try to build the code and attach some trick at the end of message. Basically i can reproduce the problem Xin reported. .
It looks MKL_DYNAMIC=false or true problems,
Some Intel MKL routines may use fewer OpenMP threads than suggested by the threading controls if
either the underlying algorithms do not support the suggested number of OpenMP threads or the
routines perform better with fewer OpenMP threads because of lower OpenMP overhead and/or better
data locality. Set the MKL_DYNAMIC environment variable to FALSE or call mkl_set_dynamic(0) to
use the suggested number of OpenMP threads whenever the algorithms permit and regardless of
OpenMP overhead and data locality.
But as we discussed in https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/759099
The plan is not thead safe, it may need private for each thread (include openMP thread).
Best Regards,
Ying
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
FYI, Here is my compile command:
source /opt/intel/compilers_and_libraries_2018.2.199/linux/bin/compilervars.sh intel64
ifort -module ./ -i8 -fpp -warn all -warn errors -qopenmp -c ${MKLROOT}/include/mkl_dfti.f90 -o mkl_dfti.o
ifort -DMKLI8 -static-intel -qopenmp -i8 -O3 -o fft2dcc FFT2dcc2.f90 -module ./ -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Ying, thanks for the reply.
The MKL manual illustrates four ways to parallelize FFT application. The sample code I provided mimics one of the techniques. You can find the example in the manual by searching 'Using Parallel Mode with a Common Descriptor'
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Xin,
Right, the example 'Using Parallel Mode with a Common Descriptor' are supposed for that purpose. originally , i had supposed possible out-sync in forward and Backforward.
do iw=1,nw
call ccfft2d_exe(1,wave(1,1,iw),plan)
call ccfft2d_exe(-1,wave(1,1,iw),plan)
enddo
further test to separate them
!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)
do iw=1,nw
call ccfft2d_exe(1,wave(1,1,iw),plan)
enddo
! !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)
do iw=1,nw
call ccfft2d_exe(-1,wave(1,1,iw),plan)
enddo
But the issue is still existing., if wit MKL_DYNAMIC=true, everything looks fine, if with MKL_DYNAMIC=false, then wrong result.
So the problem should not here , we will look into details about the implementation.
Best Regards,
Ying
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Xin,
Our developer team should fix the problem in MKL 2019 version. which is available in IRC or YUM, APT now.
Could you please to try it and let us know if any feedback.
Thanks
Ying
P.S How to get Intel MKL etc:
https://software.intel.com/en-us/articles/how-to-get-intel-mklippdaal

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page