I've written an MPI code that uses the DFTI interface to compute FFTs. It's a domain-decomposition type of problem
where each processor solves its own group of FFTs. Everything works fine for NP=1, 2, 32, 64, and 128, but fails
when NP=256 with the following error:
DFTI_MKL_INTERNAL_ERROR
The code I'm using is the same regardless of the number of processors (the FFT function itself is just called less often).
The call that fails is the DftiCommitDescriptor line, and it fails the first time it is called:
type(DFTI_DESCRIPTOR), POINTER :: DFTI_HANDLE
...
STATUS = DftiCreateDescriptor( DFTI_HANDLE, DFTI_DOUBLE, DFTI_COMPLEX, 1, 192)
STATUS = DftiCommitDescriptor( DFTI_HANDLE )
I've tried both static and dynamic linking; neither helps. I'm using the sequential (num threads = 1) version.
Dynamic
-i_dynamic -lmkl_core -lmkl_sequential -lmkl_intel_lp64
Static
#$MKLPATH/libmkl_solver_lp64_sequential.a -Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -i_dynamic
Also, things mostly work OK for a smaller number of FFT points, e.g. 32, but they fail for 192 or 256.
I've compiled with "-check all" and nothing is reported, so I think the code is OK.
Does this problem sound familiar to anyone?
thx
jrd
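A minimal sketch for narrowing down which call fails and why, assuming MKL's Fortran DFTI module source `mkl_dfti.f90` is on the compiler's include path (as in the examples shipped with MKL). It runs the same create/commit sequence as the snippet above for length 192, but checks every status and prints the text that `DftiErrorMessage` returns for a failure. The `check` helper, the all-ones input, and the program name are illustrative, not from the post.

```fortran
! Sketch: create, commit, compute, and free one DFTI descriptor,
! checking each status. Assumes mkl_dfti.f90 (shipped with MKL)
! can be found on the include path.
include 'mkl_dfti.f90'

program dfti_status_check
  use MKL_DFTI
  implicit none
  integer, parameter :: n = 192              ! FFT length from the post
  type(DFTI_DESCRIPTOR), pointer :: handle
  complex(kind(0d0)) :: x(n)
  integer :: status

  status = DftiCreateDescriptor(handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  call check(status, 'DftiCreateDescriptor')

  status = DftiCommitDescriptor(handle)
  call check(status, 'DftiCommitDescriptor')

  ! In-place forward transform (in-place is the default placement).
  x = (1.0d0, 0.0d0)
  status = DftiComputeForward(handle, x)
  call check(status, 'DftiComputeForward')

  status = DftiFreeDescriptor(handle)
  call check(status, 'DftiFreeDescriptor')

  ! The unscaled DFT of an all-ones vector has x(1) = n.
  if (abs(x(1) - cmplx(n, 0, kind(0d0))) > 1.0d-9) stop 2
  print *, 'all DFTI calls succeeded'

contains

  ! Print MKL's error text for a non-zero status, then stop.
  subroutine check(st, what)
    integer, intent(in) :: st
    character(len=*), intent(in) :: what
    if (st /= 0) then
      print *, what, ' failed: ', trim(DftiErrorMessage(st))
      stop 1
    end if
  end subroutine check

end program dfti_status_check
```

Run under mpirun at NP=256, a stand-alone program like this keeps the DFTI calls separate from the rest of the application, and the printed message is usually more specific than the bare status code.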
Davis, which MKL and MPI versions are you using?
--Gennady
Quoting - Gennady Fedorov (Intel)
Davis, which MKL and MPI versions are you using?
--Gennady
ifort Intel Fortran Compiler for applications running on Intel 64, Version 10.1 Build 20080312 Package ID: l_fc_p_10.1.015
MKL 10.0.2.018
MPI: mvapich_intel10-0.9.9 currently, but I have also tried openmpi_intel-1.2.7
Also, I stripped out everything in my program so that all it does is commit and then free the descriptor. This does work.
So it looks somewhat like a stack size limit problem. Within the shell my stack is unlimited, but perhaps there is
some environment variable for the stack that needs to be set. Following a previous post I saw, I tried setting KMP_STACKSIZE to a large value:
KMP_STACKSIZE=10000000000
export KMP_STACKSIZE
but that did not help either.
Davis,
My question may seem strange to you, but...
are you sure all your MPI processes are actually running on their respective nodes?
To see which nodes you are actually running on, you can replace the name of your executable file with "uname -n".
Best regards,
-Vladimir
Quoting - Vladimir Petrov (Intel)
Davis,
My question may seem strange to you, but...
are you sure all your MPI processes are actually running on their respective nodes?
To see which nodes you are actually running on, you can replace the name of your executable file with "uname -n".
Best regards,
-Vladimir
I am already requesting that MPI provide the machine name, so I can check this fairly easily. For a 256-process simulation, I am using
120 unique physical machines (4 cores per machine). Of those 120 machines:
- 38 machines use 1 core each
- 28 machines use 2 cores each
- 54 machines use 3 cores each
Is that what you were looking for?
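Along the lines of Vladimir's check, here is a minimal sketch (not the poster's code) that gathers each rank's `MPI_Get_processor_name` result on rank 0 and prints how many hosts run 1, 2, 3, ... ranks, which is the same kind of breakdown as the one above. It assumes the classic `mpif.h` Fortran bindings (available in both MVAPICH and Open MPI); the program name and output wording are illustrative.

```fortran
! Sketch: report how many MPI ranks share each physical host.
! Uses only standard MPI calls through mpif.h.
program rank_host_tally
  implicit none
  include 'mpif.h'
  character(len=MPI_MAX_PROCESSOR_NAME) :: myname
  character(len=MPI_MAX_PROCESSOR_NAME), allocatable :: names(:)
  integer, allocatable :: per_host(:)
  integer :: ierr, rank, nprocs, namelen, i, j, k

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  myname = ' '
  call MPI_Get_processor_name(myname, namelen, ierr)

  ! Collect every rank's host name on rank 0. The receive buffer is
  ! allocated on all ranks so it is always a valid actual argument.
  allocate(names(nprocs))
  call MPI_Gather(myname, MPI_MAX_PROCESSOR_NAME, MPI_CHARACTER, &
                  names, MPI_MAX_PROCESSOR_NAME, MPI_CHARACTER, &
                  0, MPI_COMM_WORLD, ierr)

  if (rank == 0) then
    ! per_host(i) holds the rank count of host names(i) for the first
    ! rank seen on that host, and 0 for later ranks on the same host.
    allocate(per_host(nprocs))
    per_host = 0
    do i = 1, nprocs
      do j = 1, i - 1
        if (names(j) == names(i)) exit
      end do
      ! j == i here only if no earlier rank reported the same host.
      if (j == i) per_host(i) = count(names == names(i))
    end do

    print '(a,i0)', 'unique hosts: ', count(per_host > 0)
    do k = 1, maxval(per_host)
      if (count(per_host == k) > 0) then
        print '(i0,a,i0,a)', count(per_host == k), ' host(s) running ', k, ' rank(s)'
      end if
    end do
  end if

  call MPI_Finalize(ierr)
end program rank_host_tally
```

The quadratic name comparison is deliberate: at a few hundred ranks it is cheap and avoids any sorting or hashing code.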
Davis,
If you don't link with libiomp5, setting KMP_STACKSIZE most likely has no effect.
The version of MKL that you use has two memory leak problems in DFTI that are fixed in later releases. In principle, these problems could cause DftiCommitDescriptor to return DFTI_MKL_INTERNAL_ERROR in a long run or when memory is tight. The leaked memory only accumulates if DftiCreate/Commit/Compute/Free are called in a loop. If the descriptor is created only a few times, this is likely not the cause.
Thanks
Dima
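Dima's point about the loop can be illustrated with a minimal sketch, assuming the FFTs a rank solves all share the same length and precision: the descriptor is created and committed once, reused for every transform, and freed once at the end, so Create/Commit/Free are not repeated per FFT. As before, it assumes `mkl_dfti.f90` is on the include path; `nffts` and the placeholder data are hypothetical.

```fortran
! Sketch: commit one descriptor and reuse it for a whole group of
! same-length transforms, instead of Create/Commit/Compute/Free per FFT.
! Assumes mkl_dfti.f90 (shipped with MKL) is on the include path.
include 'mkl_dfti.f90'

program dfti_reuse
  use MKL_DFTI
  implicit none
  integer, parameter :: n = 192          ! transform length
  integer, parameter :: nffts = 1000     ! hypothetical FFT count per rank
  type(DFTI_DESCRIPTOR), pointer :: handle
  complex(kind(0d0)) :: x(n)
  integer :: status, i

  ! Create and commit once, outside the loop.
  status = DftiCreateDescriptor(handle, DFTI_DOUBLE, DFTI_COMPLEX, 1, n)
  if (status /= 0) stop 1
  status = DftiCommitDescriptor(handle)
  if (status /= 0) stop 1

  ! Only the compute call is repeated for each transform.
  do i = 1, nffts
    x = cmplx(i, 0, kind(0d0))           ! placeholder data for FFT i
    status = DftiComputeForward(handle, x)
    if (status /= 0) stop 1
  end do

  ! Free once, after all transforms are done.
  status = DftiFreeDescriptor(handle)
  if (status /= 0) stop 1
  print *, 'completed', nffts, 'transforms with one descriptor'
end program dfti_reuse
```

With this structure, a per-descriptor leak of the kind Dima describes would be incurred once per rank rather than once per FFT.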
Quoting - Dmitry Baksheev (Intel)
Davis,
If you don't link with libiomp5, setting KMP_STACKSIZE most likely has no effect.
The version of MKL that you use has two memory leak problems in DFTI that are fixed in later releases. In principle, these problems could cause DftiCommitDescriptor to return DFTI_MKL_INTERNAL_ERROR in a long run or when memory is tight. The leaked memory only accumulates if DftiCreate/Commit/Compute/Free are called in a loop. If the descriptor is created only a few times, this is likely not the cause.
Thanks
Dima
The program crashes on its first call to the Intel MKL libraries, so there is no loop in which memory could accumulate.
OK, I'll try upgrading MKL.
Quoting - davis@coastal.ufl.edu
The program crashes on its first call to the Intel MKL libraries, so there is no loop in which memory could accumulate.
OK, I'll try upgrading MKL.
Davis, please let us know whether the problem persists with the new version.
--Gennady