SIGSEGV when calling MKL FFT from Python

itakatz · ‎09-22-2010

Hi all

I am using the FFTW wrapper of the MKL library in a static library (let's call it mylib.lib), which I link into a shared object (let's call it app.so). It works fine when I link to app.so from within a C++ executable, but when I load the same app.so from Python, I get a segmentation violation (SIGSEGV) from libmkl_def.so when calling fftwf_execute(..).

Any ideas?

I am using the 10.5.6 version of MKL, on Unix (Ubuntu 9.1).

Thank you,

Ita.

Dmitry_B_Intel · ‎09-22-2010

Hi Ita

Intel MKL has a layered structure represented by several shared objects, one of which is libmkl_def.so. The layered structure requires appropriate linking. If it is not linked properly, an application may show issues like the SEGV that you have described. It would be nice if you posted more details about how you link your application. Ideally, a self-contained example would help a lot.

Another point is that MKL version 10.2 and later have the FFTW3 wrappers integrated, so that one may call FFTW functions directly.

Could you also check which MKL version you actually use? Perhaps it is 10.2.6, because 10.5.6 is a strange number.

Thanks
Dima

itakatz · ‎09-26-2010

Hi Dima and thank you for your answer.

The version number is 10.2.6.038. Sorry for the mistake.
Regarding the linkage details:
- From python I link to a shared object using:
```
_libraries['libstaapp.so'] = CDLL('libstaapp.so')
```
- From this shared object I link to a static library called 'stalib.a', in which the MKL functions are called (and the MKL h files are included), and to the following MKL shared objects:
  - mkl_lapack
  - mkl_intel_ilp64
  - mkl_core
  - mkl_sequential
  - mkl_mc3
  - mkl_def
I cannot at the moment supply a self-contained example, because I cannot share the source code. I will try to create such an example soon, but it will take me a few days to get into it.
I do not completely understand your remark about the FFTW3 wrapper, but anyway, I am using the FFTW3 wrapper.

As I mentioned, if I link to the 'libstaapp' shared object from a C++ executable instead of from Python, everything runs OK.

Thanks again,

Ita.


Gennady_F_Intel · ‎09-27-2010

Ita, could you please try to relink with mkl_intel_lp64 instead of mkl_intel_ilp64?

itakatz · ‎09-28-2010

The same error occurs (at the same point), when linking with mkl_intel_lp64

itakatz · ‎10-03-2010

Some more information/thoughts:

The problem occurs on a computer where the python code is called locally. On two other computers (on which the MKL version is 10.2.5.035) where the python code is called from a web server, it works ok. I suspect there might be a version issue on the computer with the problem, maybe the user mixed .so objects from different versions.
I suspect the problem might be connected to a 16-byte memory alignment issue, which behaves differently when called from Python.

Anyway, I am trying to reproduce the problem with a small project that I will be able to post here for diagnostics.
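An alignment suspicion like this can be tested directly before the FFT call. The following is a minimal sketch, not part of the original code: `is_aligned` is a hypothetical helper name, and 16 bytes is assumed because that is the boundary SSE loads and stores expect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when `p` sits on an `alignment`-byte boundary.
// `alignment` must be a power of two (16 is the SSE boundary assumed here).
inline bool is_aligned(const void *p, std::size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    // For a power of two, the low bits of the address are zero exactly when aligned.
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Calling it on the input and output buffers right before `fftwf_execute` in both the C++ and the Python-driven run would show whether the two cases really differ in alignment.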

Dmitry_B_Intel · ‎10-04-2010

Hi Ita,

Suspecting a SIMD alignment issue is a good idea! Could you tell us which FFT problem the failure occurs on?
Namely, what are precision, kind, dimension, sizes, placement of the transform that fails?
Getting this information may be easier than making a small project.

Thanks
Dima

itakatz · ‎10-06-2010

Hi Dima,

Actually I have some more information:

The problem seems to occur not only when calling from python (it was the case on a specific computer, but I reproduced it without any python code involved, only c++).
If an instance of the Unix machine is created from an image we have, then on some instances it happens and on some it does not, which makes me think it is a memory leak (could it still be the SIMD alignment...?)
It turns out that on this image (created by a coworker) the regular MKL installer was not run; instead, the relevant .h and .so files were copied 'by hand'.

Anyway, to clarify how I use the library, I attach the code of my FFT wrapper (class StaFFT), which calls the MKL functions. The wrapper allows using the float or double version just by changing the template parameter; otherwise it is a 1D FFT and IFFT. The specialized implementation is class StaFFT_fn:

[cpp]#include "fftw3.h" //--- fftw wrapper for the mkl lib

//--- Complex data types
template <typename T_Float>
struct ComplexT2;

template <>
struct ComplexT2<float> {
	typedef fftwf_complex Type;
};

template <>
struct ComplexT2<double> {
	typedef fftw_complex Type;
};

template <typename T_Float>
class StaFFT_fn;

template <>
class StaFFT_fn<float> {
public:
	typedef ComplexT2<float>::Type ComplexT;
	typedef fftwf_plan plan;
	static void * malloc (size_t n) {
		return fftwf_malloc (n * sizeof(float));
	}
	static fftwf_plan fft_plan_r2c (int nfft, float *in, ComplexT * spec, unsigned int flag) { 
		return fftwf_plan_dft_r2c_1d (nfft, in, spec, flag);
	}
	static fftwf_plan fft_plan_c2r (int nfft, ComplexT * spec, float *in, unsigned int flag) {
		return fftwf_plan_dft_c2r_1d (nfft, spec, in, flag);
	}
	static void execute (const fftwf_plan p) {
		fftwf_execute (p);
	}
	static void free (void *p) {
		fftwf_free (p);	
	}
	static void destroy_plan (fftwf_plan p) {
		fftwf_destroy_plan (p);
	}
};

template <>
class StaFFT_fn<double> {
public:
	typedef ComplexT2<double>::Type ComplexT;
	typedef fftw_plan plan;
	static void * malloc (size_t n) {
		return fftw_malloc (n * sizeof(double));
	}
	static fftw_plan fft_plan_r2c (int nfft, double *in, ComplexT * spec, unsigned int flag) { 
		return fftw_plan_dft_r2c_1d (nfft, in, spec, flag);
	}
	static fftw_plan fft_plan_c2r (int nfft, ComplexT * spec, double *in, unsigned int flag) {
		return fftw_plan_dft_c2r_1d (nfft, spec, in, flag);
	}
	static void execute (const fftw_plan p) {
		fftw_execute (p);
	}
	static void free (void *p) {
		fftw_free (p);	
	}
	static void destroy_plan (fftw_plan p) {
		fftw_destroy_plan (p);
	}
};

template <typename T_Float> class StaFFT {
public:
	StaFFT () {
		tdata = 0;
		spec = 0;
		_plan_c2r = 0;
		_plan_r2c = 0;
	}
	~StaFFT () {
		free ();
	}
	typedef typename ComplexT2<T_Float>::Type ComplexT;				//--- complex data type
	typedef StaFFT_fn<T_Float> FFT;
	void alloc (size_t nfft) {		
		_nfft = nfft;
		int specLen = _nfft/2 + 1;
		free ();
		tdata = (T_Float *)FFT::malloc (_nfft);
		spec  = (ComplexT *)FFT::malloc (2*specLen);
		_plan_r2c = FFT::fft_plan_r2c (_nfft, tdata, spec, FFTW_ESTIMATE); // NOTE: mkl's fftw-wrapper ignores the FFTW_ flag
		_plan_c2r = FFT::fft_plan_c2r (_nfft, spec, tdata, FFTW_ESTIMATE);
	}
	void free (void) {
		FFT::free (tdata);
		tdata = 0;
		FFT::free (spec);
		spec = 0;
		if (_plan_c2r != 0) {
			FFT::destroy_plan (_plan_c2r);
			_plan_c2r = 0;
		}
		if (_plan_r2c != 0) {
			FFT::destroy_plan (_plan_r2c);
			_plan_r2c = 0;
		}
	}
	void execute_r2c () {FFT::execute (_plan_r2c);}
	void execute_c2r () {FFT::execute (_plan_c2r);}
	//--- public data members
	T_Float * tdata;			//--- time-domain data
	ComplexT * spec;		//--- spec-domain data
private:
	size_t _nfft;
	typename FFT::plan _plan_r2c;	//--- forward plan	(fft)
	typename FFT::plan _plan_c2r;	//--- backward plan (ifft)
};
[/cpp]

A typical use of the wrapper will look like:

[cpp]StaFFT<float> fft;
float *x;
StaFFT<float>::ComplexT *X;

fft.alloc (nFft);
x = fft.tdata;
X = fft.spec;

// [ copy some data into vector x ]
fft.execute_r2c ();
// [ some operations on the spectrum X]
fft.execute_c2r ();

fft.free ();[/cpp]
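For readers checking the buffer sizes in `alloc`, the arithmetic can be written out as a small sketch. The helper names below are hypothetical. The sizes follow the FFTW convention that a length-n real-to-complex transform yields n/2 + 1 complex bins, and the 1/n factor reflects FFTW's unnormalized transforms, which the MKL wrappers are assumed to reproduce.

```cpp
#include <cstddef>

// Number of complex bins produced by a 1D real-to-complex FFT of length nfft.
inline std::size_t spectrum_bins(std::size_t nfft) { return nfft / 2 + 1; }

// Number of real scalars needed to store those bins (real and imaginary part each).
// This matches the `2*specLen` passed to FFT::malloc in the wrapper.
inline std::size_t spectrum_scalars(std::size_t nfft) { return 2 * spectrum_bins(nfft); }

// Factor to apply after an r2c followed by a c2r to recover the original samples,
// assuming unnormalized transforms as in FFTW.
inline double roundtrip_scale(std::size_t nfft) { return 1.0 / static_cast<double>(nfft); }
```

For example, with `nFft = 8` the spectrum holds 5 complex values, so the wrapper requests room for 10 floats.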

I hope it helps.

Dmitry_B_Intel · ‎10-06-2010

Hi Ita,

Thank you for the test. I played with it a lot using MKL 10.2.6 on an x86_64 system. I tried both the float and double specializations of StaFFT, misaligned memory, a variety of sizes, and linking with libmkl_intel_lp64/ilp64, and yet I could not reproduce the problem. Neither SEGV nor memory leaks.

Additionally, Intel MKL contains memory management software that speeds up memory allocations for the library. This memory manager is on by default and can be disabled by setting environment variable MKL_DISABLE_FAST_MM=1. You can find details in the MKL User's Guide. I have tried this control too, and still could not reproduce your issue.
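When changing the launch environment is inconvenient, the same control can be set from inside the program. This is only a sketch: `disable_mkl_fast_mm` is a hypothetical helper, and it assumes MKL reads the variable when it first needs memory, so it should run before any MKL routine is called.

```cpp
#include <cstdlib>

// Hypothetical helper: disable MKL's fast memory manager for this process.
// Returns true on success. Assumed to be called before the first MKL routine.
inline bool disable_mkl_fast_mm() {
    // setenv is POSIX; the last argument (1) overwrites any existing value.
    return ::setenv("MKL_DISABLE_FAST_MM", "1", 1) == 0;
}
```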

If you manage to reproduce the SEGV with a C++ program, could you look at the backtrace of the failure? ('gdb a.out', then 'run', then 'bt')

Thanks
Dima.

itakatz · ‎10-06-2010

Hi Dima,
I am pasting here the backtrace from gdb:

```
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff4155b0d in var000C () from /usr/lib/libmkl_def.so
(gdb) bt
#0 0x00007ffff4155b0d in var000C () from /usr/lib/libmkl_def.so
#1 0x0000000000947300 in ?? ()
#2 0x000000000093a1c0 in ?? ()
#3 0x000000000093e8e0 in ?? ()
#4 0x00007ffff7df35f5 in ?? () from /lib64/ld-linux-x86-64.so.2
#5 0x00007ffff4166907 in W6_ippsFFTFwd_RToPerm_32f () from /usr/lib/libmkl_def.so
#6 0x00007ffff4130769 in W6_ippsDFTFwd_RToCCS_32f () from /usr/lib/libmkl_def.so
#7 0x00007ffff53220ad in mkl_dft_xipps_fwd_rtocomplex_32f () from /usr/lib/libmkl_mc3.so
#8 0x00007ffff53099d4 in mkl_dft_compute_fwd_s_r2c_1d_o () from /usr/lib/libmkl_mc3.so
#9 0x00007ffff65e9763 in DftiComputeForward_1 () from /usr/lib/libmkl_intel_ilp64.so
#10 0x00007ffff65f15c7 in execute_fo () from /usr/lib/libmkl_intel_ilp64.so
#11 0x00007ffff65efd35 in fftwf_execute () from /usr/lib/libmkl_intel_ilp64.so
#12 0x00007ffff6932f24 in CenterCut::process(InStreamBuf&, InStreamBuf&, OutStreamBuf&) () from /usr/lib/libstaapp.so
#13 0x00007ffff6933863 in StereoSplit::process(InStreamBuf&, InStreamBuf&, OutStreamBuf&, OutStreamBuf&) () from /usr/lib/libstaapp.so
#14 0x00007ffff68f65fd in StereoSplitTransformer::process(InStreamBuf*) () from /usr/lib/libstaapp.so
#15 0x00007ffff68f9b8d in StreamProcNode::process(InStreamBuf*, unsigned long) () from /usr/lib/libstaapp.so
#16 0x00007ffff6901edd in StreamRootsProcessor::process(ChannelBuffer*, unsigned long) () from /usr/lib/libstaapp.so
#17 0x00007ffff68fee7b in StreamProcessor::process(float const*, unsigned long) () from /usr/lib/libstaapp.so
#18 0x0000000000406ef9 in main ()
```

Thanks,

Ita.

P.S. I've installed the 10.2.6 version using the install script, so it is not an issue of mis-installed files or the like.

Dmitry_B_Intel · ‎10-06-2010

Hi Ita,

It looks like the problem comes from libstaapp.so. Frames #6 and #7 come from different .so files (libmkl_def.so and libmkl_mc3.so), which should not happen. When you link libstaapp.so, do you really need to link in libmkl_def.so and libmkl_mc3.so? I suggest you drop them from the build of libstaapp.so.

Thanks
Dima

itakatz · ‎10-07-2010

Hi again,

Seems that removing the shared objects you suggested solved the problem for the executable.
The call from the Python code still failed, though not with a segmentation violation: some symbols weren't found at run time. Linking to mkl_mc.so solved that as well, so it seems all is working well right now (I still have to verify it on a few instances etc.)

Btw, how come it worked on some machines/instances and failed on others?

Thank you very much for the support.

Ita.

Dmitry_B_Intel · ‎10-07-2010

Hi again,

If you link libstaapp.so with MKL's .so libraries, you will likely get the issue anyway, depending on the use case.

If you don't link libstaapp.so with CPU-specific core libraries (libmkl_mc.so and such), the application linked with libstaapp.so still works on all platforms, because the loader adds all necessary symbols from the MKL libraries (-lmkl_intel_ilp64, -lmkl_sequential, and -lmkl_core) into the application's global namespace, and libmkl_core manages to pick the appropriate function.

However, using this libstaapp.so from Python fails because Python loads libstaapp.so in such a way that the symbols are not added to the global namespace. You partially solve the issue by additionally linking libstaapp.so to a CPU-specific library and thus fetching some dependent symbols on load, but this fails on a platform where that CPU-specific library is not supported. Adding more CPU-specific libraries confuses the loader, resulting in the strange SEGVs that you've observed.

There are currently two ways to solve the issue (see also the discussion in the thread "dlopen woes").

1. Instead of linking libstaapp.so to the MKL .so libraries, you create your own mkl_custom.so, using tools/builder in the MKL distribution.

2. (Should work.) You add initialization code to libstaapp.so that brings the symbols of MKL's .so libraries into the global namespace, something like this:

[bash]#include <dlfcn.h>

#if defined(__cplusplus)
struct OnOpen
{
    void *dlh1, *dlh2;
    OnOpen() : dlh1(0), dlh2(0)
    {
        int flags = RTLD_GLOBAL|RTLD_LAZY;
        dlh1 = dlopen("libmkl_sequential.so",flags);  /* assign the members, not new locals */
        dlh2 = dlopen("libmkl_core.so",flags);
    }
    ~OnOpen()
    {
        if (dlh1) dlclose(dlh1);
        if (dlh2) dlclose(dlh2);
    }
} ______onOpen;
#else
/* use __attribute__((constructor)) 
 * and  __attribute__((destructor))
 */
#endif[/bash]
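The `#else` branch above is left as a comment. A minimal sketch of what it could look like with the GCC/Clang `constructor` and `destructor` attributes follows. The function and variable names are hypothetical, the library names are the ones used in the snippet above, and the same attributes also work in plain C. On older glibc versions the program must be linked with `-ldl`.

```cpp
#include <dlfcn.h>

namespace {
void *g_mkl_handles[2] = {0, 0};   // handles kept so they can be closed later
bool g_preload_ran = false;        // set once the load-time hook has executed

// Runs when the shared object is loaded, before any of its functions are called.
__attribute__((constructor)) void preload_mkl() {
    const int flags = RTLD_GLOBAL | RTLD_LAZY;
    // dlopen returns NULL on failure; dlerror() would describe the reason.
    g_mkl_handles[0] = dlopen("libmkl_sequential.so", flags);
    g_mkl_handles[1] = dlopen("libmkl_core.so", flags);
    g_preload_ran = true;
}

// Runs when the shared object is unloaded or the process exits.
__attribute__((destructor)) void release_mkl() {
    for (int i = 0; i < 2; ++i) {
        if (g_mkl_handles[i]) dlclose(g_mkl_handles[i]);
        g_mkl_handles[i] = 0;
    }
}
}  // namespace

// Hypothetical accessor, useful only as a sanity check that the hook ran.
bool mkl_preload_ran() { return g_preload_ran; }
```

The sketch deliberately tolerates a failed `dlopen`, leaving the handle null, so loading the shared object does not abort on a machine where MKL is missing. Real code would probably want to log `dlerror()` in that case.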

Thanks
Dima

itakatz · ‎10-10-2010

Hi

You are completely right, it still fails in some use-cases when called from python, and after reading your explanation I start to understand why.

I will try the solutions you suggested and report.

Thank you again,

Ita.

itakatz · ‎10-18-2010

Hi again

After a week or so of tests, it seems everything is working OK now. I used the second method you suggested, i.e. adding initialization code.

Thanks again for the effort.

Ita.

AndrewC · ‎10-21-2010

I had the same problem with a similar resolution

http://software.intel.com/en-us/forums/showthread.php?t=78131&p=1#132389

Except that -z initfirst was also needed in my case.