Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

issue of 2d fft with openmp

Wang__Xin
Beginner
657 Views

Hi, I'm having trouble with 2d fft using intel 2018 compiler. I tracked it down and reproduce the behavior in this simple code. The sample code applies forward and backward 2d fft, and is supposed to bring back the original value. But if I do

setenv MKL_DYNAMIC false 

setenv OMP_NUM_THREADS 10

and run the code, the result is wrong.

Here is my sample code. Any insight is appreciated. Thank you.

ifort version: 18.0.2 20180210

MKL version: ics-2018.update.2/compilers_and_libraries_2018.2.199

Xin

 

 

 

Program fft2dcc

 

! to compile the code, do the following:

! ifort -c -DMKLI8 -qopenmp -I${MKLROOT}/include/intel64/ilp64 -O3 -o fft2dcc.o fft2dcc.F90

! ifort -DMKLI8 -static-intel -qopenmp -O3 -o fft2dcc fft2dcc.o  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group

 

use MKL_DFTI

use omp_lib

 

implicit none

 

integer :: nx, ny, nw, nfreq, ier, iw, iy, ix

complex, allocatable, dimension(:,:,:) :: wave

 

integer :: nthreads

type(DFTI_DESCRIPTOR), pointer :: plan

real :: fscale, bscale

 

nx = 640

ny = 640

nw = 224

 

nthreads = omp_get_num_threads()

fscale = 1.0

bscale = 1.0/float(nx*ny)

ier = ccfft2d_plan(nx,ny,nx,fscale,bscale,plan,nthreads)

 

allocate(wave(nx,ny,nw))

!$omp parallel

!$omp master

 print *, omp_get_num_threads()

!$omp end master

!$omp end parallel

 

! first touch memory

!$omp parallel do default(shared), private(ix,iy,iw)

do iw = 1, nw

   do iy = 1, ny

   do ix = 1, nx

      wave(ix,iy,iw) = cmplx(0.0,0.0)

   enddo

   enddo

         ! inject source

   wave(nx/2,ny/2,iw) = cmplx(1.0,0.0)

enddo

 

print *

print *, "Before", wave(318:322,320,1)

print *

 

!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)

do iw=1,nw

   call ccfft2d_exe(1,wave(1,1,iw),plan)

   call ccfft2d_exe(-1,wave(1,1,iw),plan)

enddo

 

print *, "After", wave(318:322,320,1)

 

deallocate(wave)

 

contains

 

integer function ccfft2d_plan &

   ( n1, n2, ldn1, fscale, bscale, plan, nthreads )

 

use MKL_DFTI

implicit none

 

integer, intent(in), optional :: nthreads

integer, intent(in)  :: n1, n2, ldn1

real,    intent(in)  :: fscale, bscale

type(DFTI_DESCRIPTOR), pointer :: plan

 

integer :: ier

#ifdef MKLI8

integer(kind=8) :: dim, nsize(2), strides(3), nthreads_in, status

#else

integer(kind=4) :: dim, nsize(2), strides(3), nthreads_in, status

#endif

 

ier = 0

!--- add some error checking for sizes

 

dim = 2

 

nsize(1) = n1

nsize(2) = n2

 

strides(1) = 0

strides(2) = 1

strides(3) = ldn1

 

status = DftiCreateDescriptor(plan,DFTI_SINGLE,DFTI_COMPLEX,dim,nsize(1:dim))

if (.NOT. DftiErrorClass(Status, DFTI_NO_ERROR)) then

print *, 'error 1'

   call DftiStatusPrint(Status)

   ier = -1

   return

endif

 

nthreads_in = 1

if (present(nthreads)) nthreads_in = nthreads

status = DftiSetValue(plan,DFTI_NUMBER_OF_USER_THREADS,nthreads_in)

 

status = DftiSetValue(plan,DFTI_INPUT_STRIDES,strides)

 

!--- this corresponding to isign < 0 usage at BP

! bscale= 1.0/float(n1*n2)

status = DftiSetValue(plan,DFTI_FORWARD_SCALE,bscale)

 

!--- this corresponding to isign > 0 usage at BP

! fscale= 1.0

status = DftiSetValue(plan,DFTI_BACKWARD_SCALE,fscale)

 

status = DftiCommitDescriptor(plan)

if (status /= 0) print *, 'DftiCommitDescriptor returned ', status

 

ccfft2d_plan = ier

 

end function ccfft2d_plan

 

subroutine ccfft2d_exe( isign, cdata, plan )

 

use MKL_DFTI

implicit none

 

integer, intent(in)  :: isign

complex, intent(inout) :: cdata(*)

type(DFTI_DESCRIPTOR), pointer :: plan

 

#ifdef MKLI8

integer(kind=8) :: status

#else

integer(kind=4) :: status

#endif

 

if ( isign < 0 ) then

   status = DftiComputeForward( plan, cdata )

else

   status = DftiComputeBackward( plan, cdata )

endif

 

end subroutine ccfft2d_exe

 

subroutine DftiStatusPrint(status)

 

use MKL_DFTI

implicit none

 

#ifdef MKLI8

integer(kind=8) :: status

#else

integer(kind=4) :: status

#endif

character(DFTI_MAX_MESSAGE_LENGTH) :: error_message

 

error_message = DftiErrorMessage(status)

print *, 'Error message: ', error_message

print *, 'Error status = ', status

 

end subroutine DftiStatusPrint

 

end Program

 

 

 

0 Kudos
6 Replies
vahid_a_
Novice
657 Views

I tried to compile your code but I received compilation errors with the first line of your proposed compilation. Other people may be able to help you. Here is your organized code for better interpretation:

Program fft2dcc

    ! to compile the code, do the following:
    ! ifort -c -DMKLI8 -qopenmp -I${MKLROOT}/include/intel64/ilp64 -O3 -o fft2dcc.o fft2dcc.F90
    ! ifort -DMKLI8 -static-intel -qopenmp -O3 -o fft2dcc fft2dcc.o  -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group

    use MKL_DFTI
    use omp_lib

    implicit none

    integer :: nx, ny, nw, nfreq, ier, iw, iy, ix
    complex, allocatable, dimension(:,:,:) :: wave
    integer :: nthreads
    type(DFTI_DESCRIPTOR), pointer :: plan
    real :: fscale, bscale

    nx = 640
    ny = 640
    nw = 224

    nthreads = omp_get_num_threads()
    fscale = 1.0
    bscale = 1.0/float(nx*ny)
    ier = ccfft2d_plan(nx,ny,nx,fscale,bscale,plan,nthreads)

    allocate(wave(nx,ny,nw))

    !$omp parallel
    !$omp master

    print *, omp_get_num_threads()

    !$omp end master
    !$omp end parallel

    ! first touch memory
    !$omp parallel do default(shared), private(ix,iy,iw)

    do iw = 1, nw
        do iy = 1, ny
            do ix = 1, nx
                wave(ix,iy,iw) = cmplx(0.0,0.0)
            enddo
        enddo
              ! inject source
        wave(nx/2,ny/2,iw) = cmplx(1.0,0.0)
    enddo

 

    print *
    print *, "Before", wave(318:322,320,1)
    print *

    !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)

    do iw=1,nw
        call ccfft2d_exe(1,wave(1,1,iw),plan)
        call ccfft2d_exe(-1,wave(1,1,iw),plan)
    enddo

    print *, "After", wave(318:322,320,1)

    deallocate(wave)

contains

    integer function ccfft2d_plan &
    ( n1, n2, ldn1, fscale, bscale, plan, nthreads )

        use MKL_DFTI
        implicit none

        integer, intent(in), optional :: nthreads
        integer, intent(in)  :: n1, n2, ldn1
        real,    intent(in)  :: fscale, bscale
        type(DFTI_DESCRIPTOR), pointer :: plan

        integer :: ier

#ifdef MKLI8

        integer(kind=8) :: dim, nsize(2), strides(3), nthreads_in, status

#else

        integer(kind=4) :: dim, nsize(2), strides(3), nthreads_in, status

#endif

 

        ier = 0
        !--- add some error checking for sizes
        dim = 2

        nsize(1) = n1
        nsize(2) = n2
        strides(1) = 0
        strides(2) = 1
        strides(3) = ldn1

        status = DftiCreateDescriptor(plan,DFTI_SINGLE,DFTI_COMPLEX,dim,nsize(1:dim))

        if (.NOT. DftiErrorClass(Status, DFTI_NO_ERROR)) then

            print *, 'error 1'
            call DftiStatusPrint(Status)
            ier = -1
            return

        endif


        nthreads_in = 1
        if (present(nthreads)) nthreads_in = nthreads
        status = DftiSetValue(plan,DFTI_NUMBER_OF_USER_THREADS,nthreads_in)
        status = DftiSetValue(plan,DFTI_INPUT_STRIDES,strides)

        !--- this corresponding to isign < 0 usage at BP
        ! bscale= 1.0/float(n1*n2)
        status = DftiSetValue(plan,DFTI_FORWARD_SCALE,bscale)
        !--- this corresponding to isign > 0 usage at BP
        ! fscale= 1.0
        status = DftiSetValue(plan,DFTI_BACKWARD_SCALE,fscale)
        status = DftiCommitDescriptor(plan)

        if (status /= 0) print *, 'DftiCommitDescriptor returned ', status

        ccfft2d_plan = ier

    end function ccfft2d_plan
 
    subroutine ccfft2d_exe( isign, cdata, plan )

        use MKL_DFTI
        implicit none
        integer, intent(in)  :: isign
        complex, intent(inout) :: cdata(*)
        type(DFTI_DESCRIPTOR), pointer :: plan


#ifdef MKLI8

        integer(kind=8) :: status

#else

        integer(kind=4) :: status

#endif

        if ( isign < 0 ) then
            status = DftiComputeForward( plan, cdata )
        else
            status = DftiComputeBackward( plan, cdata )
        endif

    end subroutine ccfft2d_exe

    subroutine DftiStatusPrint(status)

        use MKL_DFTI
        implicit none

#ifdef MKLI8

        integer(kind=8) :: status

#else

        integer(kind=4) :: status

#endif

        character(DFTI_MAX_MESSAGE_LENGTH) :: error_message

 

        error_message = DftiErrorMessage(status)
        print *, 'Error message: ', error_message
        print *, 'Error status = ', status

    end subroutine DftiStatusPrint

end Program

 

0 Kudos
Ying_H_Intel
Employee
657 Views

Hi Xin, Vahid,
​Thank you for the reports. 

I try to build the code and attach some trick at the end of message.  Basically i can reproduce the  problem Xin reported. .
​It looks MKL_DYNAMIC=false or true problems,

Some Intel MKL routines may use fewer OpenMP threads than suggested by the threading controls if
either the underlying algorithms do not support the suggested number of OpenMP threads or the
routines perform better with fewer OpenMP threads because of lower OpenMP overhead and/or better
data locality. Set the
MKL_DYNAMIC environment variable to FALSE or call mkl_set_dynamic(0) to
use the suggested number of OpenMP threads whenever the algorithms permit and regardless of
OpenMP overhead and data locality.

 

But as we discussed in  https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/759099

​The plan is not thead safe, it may need private for each thread (include openMP thread).

Best Regards,
​Ying

0 Kudos
Ying_H_Intel
Employee
657 Views

FYI, Here is my compile command:   

source /opt/intel/compilers_and_libraries_2018.2.199/linux/bin/compilervars.sh intel64

ifort -module ./ -i8 -fpp -warn all -warn errors  -qopenmp -c ${MKLROOT}/include/mkl_dfti.f90 -o mkl_dfti.o

ifort -DMKLI8 -static-intel -qopenmp -i8 -O3 -o fft2dcc FFT2dcc2.f90 -module ./   -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
 

 


 

 

0 Kudos
Wang__Xin
Beginner
657 Views

Ying, thanks for the reply. 

The MKL manual illustrates four ways to parallelize FFT application. The sample code I provided mimics one of the techniques. You can find the example in the manual by searching 'Using Parallel Mode with a Common Descriptor'  

0 Kudos
Ying_H_Intel
Employee
657 Views

 

Hi Xin,

​Right,  the example  'Using Parallel Mode with a Common Descriptor' are supposed for that purpose.   originally ,  i  had supposed possible out-sync in forward and Backforward.   

  do iw=1,nw
        call ccfft2d_exe(1,wave(1,1,iw),plan)
        call ccfft2d_exe(-1,wave(1,1,iw),plan)
    enddo
​further test to separate them

!$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)
    do iw=1,nw
        call ccfft2d_exe(1,wave(1,1,iw),plan)
       
    enddo
 
 !   !$OMP PARALLEL DO DEFAULT(SHARED), PRIVATE(iw)
    do iw=1,nw
 call ccfft2d_exe(-1,wave(1,1,iw),plan)
     enddo​
But the issue is still existing., if wit MKL_DYNAMIC=true, everything looks fine, if with MKL_DYNAMIC=false, then wrong result.

So the problem should  not here , we will look into details about the implementation.

​Best Regards,
Ying

 

 














 

 

 

 

 

 

0 Kudos
Ying_H_Intel
Employee
657 Views

Hi Xin, 

​Our developer team should  fix the problem in MKL 2019 version. which is available in IRC or YUM, APT now.

Could you please to try it and let us know if any feedback.

Thanks

Ying

P.S How to get Intel MKL etc:  

https://software.intel.com/en-us/articles/how-to-get-intel-mklippdaal

0 Kudos
Reply