Using DFTI with larger numbers of processors

Justin_D_1 · ‎11-18-2009

I've written an MPI code that uses the DFTI interface to compute FFTs. It's a domain-decomposition type of problem
where each processor solves its own group of FFTs. Everything works fine for NP=1,2,32,64,128 but fails
when NP=256 with an error that looks like:

DFTI_MKL_INTERNAL_ERROR

The code I'm using is the same regardless of the number of processors (each processor just calls the FFT function less often).
The line that fails is the call to DftiCommitDescriptor, and it fails the first time it is called:

type(DFTI_DESCRIPTOR), POINTER :: DFTI_HANDLE
...
STATUS = DftiCreateDescriptor( DFTI_HANDLE, DFTI_DOUBLE, DFTI_COMPLEX, 1, 192)
STATUS = DftiCommitDescriptor( DFTI_HANDLE )
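
A minimal sketch (not part of the original post) of how the status returned by each DFTI call could be checked and turned into MKL's own message text, which helps pin down which call fails. It assumes the MKL_DFTI module from mkl_dfti.f90 in the MKL include directory is compiled with the program; the subroutine name report_status is hypothetical, and an MPI code would more likely call MPI_Abort than STOP.

```fortran
! Sketch only: create, commit and free a descriptor, checking every status.
! Assumes the MKL_DFTI module (mkl_dfti.f90) is available to the compiler.
program dfti_status_check
  use MKL_DFTI
  implicit none

  type(DFTI_DESCRIPTOR), pointer :: dfti_handle
  integer :: status

  ! Same descriptor as in the post: 1-D, double-precision complex, 192 points.
  status = DftiCreateDescriptor(dfti_handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, 192)
  call report_status('DftiCreateDescriptor', status)

  status = DftiCommitDescriptor(dfti_handle)
  call report_status('DftiCommitDescriptor', status)

  status = DftiFreeDescriptor(dfti_handle)
  call report_status('DftiFreeDescriptor', status)

contains

  ! Hypothetical helper: print MKL's message and stop if the call failed.
  subroutine report_status(call_name, status)
    character(len=*), intent(in) :: call_name
    integer, intent(in) :: status

    if (.not. DftiErrorClass(status, DFTI_NO_ERROR)) then
      print '(a,a,i0,a,a)', call_name, ' failed, status = ', status, &
            ': ', trim(DftiErrorMessage(status))
      stop 1
    end if
  end subroutine report_status

end program dfti_status_check
```

Checking the status of DftiCreateDescriptor as well as DftiCommitDescriptor rules out the case where the commit fails only because the create call had already failed.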

I've tried both static and dynamic linking; neither helps. I'm using the sequential (num threads = 1) version.

Dynamic
-i_dynamic -lmkl_core -lmkl_sequential -lmkl_intel_lp64

Static
#$MKLPATH/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -i_dynamic

Also, things mostly work OK for a smaller number of FFT points, e.g. 32, but not for 192 or 256.

I've compiled with "-check all" and nothing is found, so I think the code is OK.

Does this problem sound familiar to anyone?

thx

jrd

Gennady_F_Intel · ‎11-18-2009

Davis, what MKL and MPI versions are you using?
--Gennady

Justin_D_1 · ‎11-19-2009

ifort Intel Fortran Compiler for applications running on Intel 64, Version 10.1 Build 20080312 Package ID: l_fc_p_10.1.015

MKL 10.0.2.018

MPI mvapich_intel10-0.9.9 currently, but also tried openmpi_intel-1.2.7

Also, I stripped out everything in my program so that all it does is commit and then free the descriptor. This does work.
So it looks like it could be a stack size limit problem. Within the shell my stack is unlimited, but perhaps there is
some stack-related environment variable that needs to be set. I tried setting KMP_STACKSIZE to a large value, following a previous post I saw:

KMP_STACKSIZE=10000000000
export KMP_STACKSIZE

but that did not help either.

Vladimir_Petrov__Int · ‎11-20-2009

Davis,

My question may seem strange to you but...
Are you sure all your MPI processes are actually running on their respective nodes?
To see the nodes on which you are actually running, you can replace the name of your executable file with "uname -n".

Best regards,
-Vladimir

Justin_D_1 · ‎11-20-2009

I am already requesting that MPI provide the machine name, so I can check this fairly easily. For a 256-process run, I am using
120 unique physical machines (4 cores per machine). Of those 120 machines:

Cores used per machine   Number of machines
1                        38
2                        28
3                        54

Is that what you were looking for?
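
For reference, a minimal sketch (not the poster's actual code) of how each rank can report the machine it runs on, using only standard MPI calls. The program name report_hosts is hypothetical, and include 'mpif.h' is used on the assumption that the MPI installation may not provide the Fortran 90 mpi module.

```fortran
! Sketch only: every rank prints its rank number and host name.
program report_hosts
  implicit none
  include 'mpif.h'

  character(len=MPI_MAX_PROCESSOR_NAME) :: host
  integer :: rank, nprocs, namelen, ierr

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! namelen receives the number of significant characters in host.
  call MPI_Get_processor_name(host, namelen, ierr)

  print '(a,i0,a,i0,a,a)', 'rank ', rank, ' of ', nprocs, ' on ', host(1:namelen)

  call MPI_Finalize(ierr)
end program report_hosts
```

Counting how often each host name appears in the output gives the number of processes per machine, as in the table above.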

Dmitry_B_Intel · ‎11-20-2009

Davis,

If you don't link with libiomp5 then perhaps setting KMP_STACKSIZE has no effect.

The version of MKL that you use has two memory leak problems in DFTI that are fixed in later releases. These could, hypothetically, cause DftiCommitDescriptor to return DFTI_MKL_INTERNAL_ERROR in long runs or when memory is tight. The leaks can only accumulate if DftiCreate/Commit/Compute/Free are called in a loop. If the descriptor is created only a few times, this is unlikely to be the cause.

Thanks
Dima

Justin_D_1 · ‎11-20-2009

The program is crashing on its first call to the Intel MKL libraries...so there is no loop to accumulate memory.

OK, I'll try upgrading MKL.

Gennady_F_Intel · ‎11-21-2009

Davis, please let us know whether the problem persists with the new version.
--Gennady