Intel® Fortran Compiler

DftiCommitDescriptor fails under Intel-2024

roryjohnston
Beginner

Hello,

I've come across a strange error: DftiCommitDescriptor fails under Intel-2024 when the program is linked in a particular way.

The same build setup worked with Intel-2021, but the resulting program now fails at run time when built with the Intel-2024 software stack.

Please see the following reproducer:

```fortran90

! fft_reproducer.f90

program fft_reproducer

use MKL_DFTI
implicit none

type(DFTI_DESCRIPTOR), pointer :: plan
integer ier
#ifdef MKLI8
integer(kind=8) :: dim, len, num, dist_in, dist_out, strides_in(2), strides_out(2), status
#else
integer(kind=4) :: dim, len, num, dist_in, dist_out, strides_in(2), strides_out(2), status
#endif

! Initialize parameters
num = 10
dim = 1
len = 256
dist_in = 258 ! 2 * (len / 2 + 1)
dist_out = dist_in ! Same distance for input and output (for in-place transform)
strides_in = (/0, 1/)
strides_out = (/0, 1/)

print *, "len=", len
print *, "num=", num
print *, "dist_in=", dist_in
print *, "dist_out=", dist_out
print *, "strides_in=", strides_in
print *, "strides_out=", strides_out

! Create descriptor for 1D real-to-complex FFT
status = DftiCreateDescriptor(plan, DFTI_SINGLE, DFTI_REAL, dim, len)
if (status /= 0) then
print *, "Error in DftiCreateDescriptor:", status
call exit(status)
endif

! Set FFT options
status = DftiSetValue(plan, DFTI_PLACEMENT, DFTI_INPLACE)
status = DftiSetValue(plan, DFTI_NUMBER_OF_TRANSFORMS, num)

status = DftiSetValue(plan, DFTI_INPUT_DISTANCE, dist_in)
status = DftiSetValue(plan, DFTI_OUTPUT_DISTANCE, dist_out)

status = DftiSetValue(plan, DFTI_INPUT_STRIDES, strides_in)
status = DftiSetValue(plan, DFTI_OUTPUT_STRIDES, strides_out)

! Commit the descriptor
status = DftiCommitDescriptor(plan)
if (status /= 0) then
print *, "Error in DftiCommitDescriptor:", status
call exit(status)
endif

! Deallocate resources and exit
status = DftiFreeDescriptor(plan)

print *, "My FFT setup complete!"

end program fft_reproducer
```

Compiled with the following Makefile:

```make
TARGET = fft_reproducer
SRC = fft_reproducer.f90
 
INCLUDE_FLAGS = -I${MKLROOT}/include/intel64/ilp64

ifdef OLD
    ##### OLD STYLE #####
    FCFLAGS = -qopenmp -Wl,--start-group \
    ${MKLROOT}/lib/intel64/libmkl_blas95_ilp64.a \
    ${MKLROOT}/lib/intel64/libmkl_lapack95_ilp64.a \
    ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a \
    ${MKLROOT}/lib/intel64/libmkl_core.a \
    ${MKLROOT}/lib/intel64/libmkl_intel_thread.a \
    -Wl,--end-group
else ifdef NEW
    ##### NEW STYLE #####
    FCFLAGS = -qmkl
endif

$(TARGET): $(SRC)
	${FC} -fpp -DMKLI8 $(INCLUDE_FLAGS) -o $(TARGET) $(SRC) $(FCFLAGS)

clean:
	rm -f $(TARGET)
```

If I do:

```
module load intel/intel-2021
export FC=ifort
make clean; make OLD=x; ./fft_reproducer      # compiles, runs successfully
make clean; make NEW=x; ./fft_reproducer      # compiles, runs successfully
```

However, if I do:
```
module load intel/intel-2024
export FC=ifx
make clean; make OLD=x; ./fft_reproducer      # compiles, fails at DftiCommitDescriptor with status 3 (invalid configuration)
make clean; make NEW=x; ./fft_reproducer      # compiles, runs successfully
```

It's also worth mentioning that this is a small section of my code (which relies heavily on FFTs). Although it passes DftiCommitDescriptor with the "NEW" style of compilation using Intel-2024, the final result is incorrect (it looks like indexing errors).

However, when I revert to Intel-2021 with either the "OLD" or "NEW" style of compilation, DftiCommitDescriptor passes and the final result is correct. I'm confident that the MKL FFT routine is at fault, because when I replace MKL FFT with FFTW the final result is also correct.

If someone can please explain this behaviour, I'd appreciate it.

Edit:
`module load intel/intel-2021` will point MKLROOT to `.../oneapi-2021.update.4/mkl/2021.4.0`
`module load intel/intel-2024` will point MKLROOT to `.../oneapi-2024.update.1/mkl/2024.1`

gqchenATintel
Employee

@roryjohnston

Please update your reproducer. Since oneAPI 2023.0, the default output storage for real transforms has changed from complex_real to complex_complex, so the output distance is counted in complex elements and is half the input distance:

    ! dist_out = dist_in
    dist_out = dist_in / 2

In your Makefile, force the ILP64 (64-bit integer) MKL libraries:

    # FCFLAGS = -qmkl                # old
    FCFLAGS = -qmkl-ilp64=parallel   # force it to use the 64-bit integer MKL libraries

Now it should work with Intel-2024. Please report back with your updates.
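
To illustrate the arithmetic behind this change, here is a minimal sketch that does not call MKL at all (the program and variable names are made up for this example). It assumes an in-place real transform of length `len`: the buffer holds `2*(len/2+1)` reals per transform, which is the same memory as `len/2+1` complex values, so a distance counted in complex elements is half the one counted in reals.

```fortran
program cce_distance_sketch
    implicit none
    integer, parameter :: len = 256
    integer :: n_complex, dist_in, dist_out

    n_complex = len/2 + 1        ! complex outputs of a real-to-complex FFT
    dist_in   = 2 * n_complex    ! input distance, counted in real elements
    dist_out  = n_complex        ! output distance, counted in complex elements

    print *, "dist_in  (reals)   =", dist_in
    print *, "dist_out (complex) =", dist_out
end program cce_distance_sketch
```

For `len = 256` this gives 258 and 129, the same values as `dist_in` and `dist_in/2` in the reproducer.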

roryjohnston
Beginner

@gqchenATintel, thank you very much. This fixed my issue:

dist_out = dist_in / 2 ! half the input distance (for in-place transform)

```
$ module load intel/intel-2024; export FC=ifx; make clean; make OLD=x; ./fft_reproducer;
rm -f fft_reproducer
ifx -fpp -DMKLI8 -I.../oneapi-2024.update.1/mkl/2024.1/include/intel64/ilp64 -o fft_reproducer fft_reproducer.f90 -qopenmp -Wl,--start-group .../oneapi-2024.update.1/mkl/2024.1/lib/intel64/libmkl_blas95_ilp64.a .../oneapi-2024.update.1/mkl/2024.1/lib/intel64/libmkl_lapack95_ilp64.a .../oneapi-2024.update.1/mkl/2024.1/lib/intel64/libmkl_intel_ilp64.a .../oneapi-2024.update.1/mkl/2024.1/lib/intel64/libmkl_core.a .../oneapi-2024.update.1/mkl/2024.1/lib/intel64/libmkl_intel_thread.a -Wl,--end-group
len= 256
num= 10
dist_in= 258
dist_out= 129
strides_in= 0 1
strides_out= 0 1

My FFT setup complete!
```

And:

```
$ module load intel/intel-2024; export FC=ifx; make clean; make NEW=x; ./fft_reproducer;
rm -f fft_reproducer
ifx -fpp -DMKLI8 -I.../oneapi-2024.update.1/mkl/2024.1/include/intel64/ilp64 -o fft_reproducer fft_reproducer.f90 -qmkl-ilp64=parallel
len= 256
num= 10
dist_in= 258
dist_out= 129
strides_in= 0 1
strides_out= 0 1
My FFT setup complete!
```

roryjohnston
Beginner

@gqchenATintel is there any way to keep the original behaviour? That would presumably mean I'd have to change my indexing for all of the functions that use the output complex-valued array, which I'd prefer to avoid.

gqchenATintel
Employee

Unfortunately, the answer is no, because of the output type change from complex_real to complex_complex (since oneAPI 2023.0).

roryjohnston
Beginner

@gqchenATintel: I managed to get the same behaviour with both versions by adding the following settings:

    status = DftiSetValue(plan, DFTI_PLACEMENT, DFTI_INPLACE)
    status = DftiSetValue(plan, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL)
    status = DftiSetValue(plan, DFTI_PACKED_FORMAT, DFTI_CCS_FORMAT)

Here is a complete example, compiled using your suggested `-qmkl-ilp64=parallel`:

program fft_reproducer
    use MKL_DFTI
    implicit none

    type(DFTI_DESCRIPTOR), pointer :: plan
    integer ier
#ifdef MKLI8
    integer(kind=8) :: dim, len, num, dist_in, dist_out, strides_in(2), strides_out(2), status
    integer(kind=8) :: i, j
#else
    integer(kind=4) :: dim, len, num, dist_in, dist_out, strides_in(2), strides_out(2), status
    integer(kind=4) :: i, j
#endif
    real(kind=4), allocatable :: x(:), y(:)
    real(kind=4) :: max_diff, threshold
    
    ! Initialize parameters
    num = 2
    dim = 1
    len = 16
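    dist_in = 2*(len/2+1)  ! reals per transform, padded to hold len/2+1 complex values in place
    dist_out = dist_in     ! unchanged: with DFTI_COMPLEX_REAL storage the original distance works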
    strides_in = (/0, 1/)
    strides_out = (/0, 1/)

    ! Allocate and initialize test x
    allocate(x(dist_in*num))
    allocate(y(dist_in*num))
    
    ! Initialize with a simple sinusoidal pattern
    do j = 1, num
        do i = 1, len
            x((j-1)*dist_in + i) = sin(2.0 * 3.14159 * real(i-1) / real(len))
        end do
        do i = len+1, dist_in
            x((j-1)*dist_in + i) = 0.0
        end do
    end do
    
    ! Save original x for comparison
    y(:) = x(:)

    print *, "Parameters:"
    print *, "len=", len
    print *, "num=", num
    print *, "dist_in=", dist_in
    print *, "dist_out=", dist_out
    print *, "strides_in=", strides_in
    print *, "strides_out=", strides_out

    ! Create descriptor for 1D real-to-complex FFT
    status = DftiCreateDescriptor(plan, DFTI_SINGLE, DFTI_REAL, dim, len)
    if (status /= 0) then
        print *, "Error in DftiCreateDescriptor:", status
        call exit(status)
    endif

    ! Set FFT options
    status = DftiSetValue(plan, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_REAL)
    status = DftiSetValue(plan, DFTI_PACKED_FORMAT, DFTI_CCS_FORMAT)

    status = DftiSetValue(plan, DFTI_PLACEMENT, DFTI_INPLACE)
    status = DftiSetValue(plan, DFTI_NUMBER_OF_TRANSFORMS, num)

    status = DftiSetValue(plan, DFTI_INPUT_DISTANCE, dist_in)
    status = DftiSetValue(plan, DFTI_OUTPUT_DISTANCE, dist_out)
    status = DftiSetValue(plan, DFTI_INPUT_STRIDES, strides_in)
    status = DftiSetValue(plan, DFTI_OUTPUT_STRIDES, strides_out)

    ! Scale factor for backward transform
    status = DftiSetValue(plan,DFTI_FORWARD_SCALE,real(1.0))
    status = DftiSetValue(plan, DFTI_BACKWARD_SCALE, 1.0/real(len))

    ! Commit the descriptor
    status = DftiCommitDescriptor(plan)
    if (status /= 0) then
        print *, "Error in DftiCommitDescriptor:", status
        call exit(status)
    endif

    ! Perform forward transform
    print *, "Performing forward FFT..."
    status = DftiComputeForward(plan, x)
    if (status /= 0) then
        print *, "Error in forward transform:", status
        call exit(status)
    endif

    ! Perform backward transform
    print *, "Performing backward FFT..."
    status = DftiComputeBackward(plan, x)
    if (status /= 0) then
        print *, "Error in backward transform:", status
        call exit(status)
    endif

    ! Verify results
    max_diff = 0.0
    threshold = 1.0e-5  ! Adjust based on your precision requirements
    
    print *, "=================================="
    do j = 1, num
        do i = 1, len
            print *, x((j-1)*dist_in + i), y((j-1)*dist_in + i)
            ! Index with dist_in (not len) so the padded layout of each transform is respected
            max_diff = max(max_diff, abs(x((j-1)*dist_in + i) - y((j-1)*dist_in + i)))
        end do
        print *, "=================================="
    end do

    print *, "Maximum difference between original and reconstructed x:", max_diff
    if (max_diff < threshold) then
        print *, "FFT validation PASSED!"
    else
        print *, "FFT validation FAILED! Difference exceeds threshold of", threshold
    end if

    ! Deallocate resources
    status = DftiFreeDescriptor(plan)
    deallocate(x)
    deallocate(y)

end program fft_reproducer
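
On the indexing concern raised earlier, here is a minimal sketch of how one packed buffer can be read. It assumes that the 1D CCS layout stores bin k (0-based) as the pair of reals `x(2k+1), x(2k+2)`; that is my reading of the oneMKL documentation and is not verified against MKL here. The program does not call MKL, fills the buffer with placeholder values rather than FFT output, and uses made-up names. It only shows that the same memory can be viewed either as `2*(len/2+1)` reals or as `len/2+1` complex values.

```fortran
program ccs_view_sketch
    implicit none
    integer, parameter :: len = 16
    integer, parameter :: nc = len/2 + 1   ! number of complex bins
    real    :: x(2*nc)                     ! stand-in for one transformed buffer
    complex :: z(nc)
    integer :: i, k

    ! Fill with recognisable values instead of real FFT output.
    do i = 1, 2*nc
        x(i) = real(i)
    end do

    ! Reinterpret the same bytes as complex numbers (real part first).
    z = transfer(x, z)

    ! Bin k (0-based) sits at x(2k+1), x(2k+2), i.e. z(k+1).
    do k = 0, len/2
        print *, k, x(2*k+1), x(2*k+2), z(k+1)
    end do
end program ccs_view_sketch
```

If that assumption holds, code that indexes the output as interleaved real pairs and code that indexes it as a complex array are reading the same values, which may reduce how much downstream indexing needs to change.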

 

 

gqchenATintel
Employee

@roryjohnston Excellent, glad it works for you.

Kinmasterproapk
Beginner

It seems like you've pinpointed an interesting issue with the Intel-2024 stack and MKL, especially given that the code works fine with Intel-2021 and also with FFTW as an alternative. It may be worth double-checking MKL's updated documentation to see whether Intel-2024 has changed how it handles older DFTI descriptor configurations. You might also consider reaching out to Intel's support to ask whether there is a compatibility issue or any known issues exist in the newer stack.
