Hi,
I'm trying to use the Intel MKL library in an Octave MEX function, but some MKL functions such as cblas_cgemm run about 4.6 times slower when called from Octave than from a compiled C executable. I use the same compilation flags for both the C code and the MEX function in my tests, where I compare the speed of a very simple C matrix multiplication program with the same program wrapped in a MEX function (the short example is attached).
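The comparison described above can be sketched as a small timing harness. This is not the attached code: `naive_cgemm` is a hypothetical, portable stand-in for `cblas_cgemm` so the sketch compiles without MKL, and `ms_per_multiplication` is likewise a hypothetical name. The idea is that one timing function is called both from a standalone `main` and from the `mexFunction`, so both paths measure identical work.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L /* for clock_gettime / CLOCK_MONOTONIC */
#endif
#include <complex.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for cblas_cgemm: naive row-major C = A * B for
 * single-precision complex matrices, A is m x k, B is k x n, C is m x n. */
static void naive_cgemm(size_t m, size_t n, size_t k,
                        const float complex *A, const float complex *B,
                        float complex *C)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float complex acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += A[i * k + p] * B[p * n + j];
            C[i * n + j] = acc;
        }
    }
}

/* Monotonic wall-clock time in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Average elapsed time per multiplication in milliseconds over `reps` calls
 * (reps must be > 0). One untimed warm-up call is made first so one-off
 * costs such as page faults or a BLAS thread pool starting up are excluded. */
double ms_per_multiplication(size_t m, size_t n, size_t k, int reps,
                             const float complex *A, const float complex *B,
                             float complex *C)
{
    naive_cgemm(m, n, k, A, B, C); /* warm-up, not timed */
    double t0 = now_seconds();
    for (int r = 0; r < reps; ++r)
        naive_cgemm(m, n, k, A, B, C);
    double t1 = now_seconds();
    return (t1 - t0) * 1e3 / reps;
}
```

In an MKL build, the call inside the loop would be replaced with `cblas_cgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, &alpha, A, k, B, n, &beta, C, n)`, with `alpha` = 1 and `beta` = 0 passed as `MKL_Complex8` values.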
This is how I compile the C code:
gcc -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -Wall -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -o cgemm_test_c matmult_c.c -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl
This is how I compile the MEX function:
mex -v -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -Wall -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -o cgemm_test_mex matmult_c.c matmult_mex.c -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl
And this is what the mex command is really doing:
gcc -c -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -I. -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -DMEX_DEBUG matmult_c.c -o matmult_c.o
gcc -c -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security -Wall -I. -I/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/include -DMEX_DEBUG matmult_mex.c -o matmult_mex.o
g++ -I/usr/include/octave-4.2.2/octave/.. -I/usr/include/octave-4.2.2/octave -I/usr/include/hdf5/serial -I/usr/include/mpi -pthread -fopenmp -g -O2 -fdebug-prefix-map=/build/octave-DtqyIg/octave-4.2.2=. -fstack-protector-strong -Wformat -Werror=format-security -shared -Wl,-Bsymbolic -Wall -o cgemm_test_mex.mex matmult_c.o matmult_mex.o -L/opt/intel/compilers_and_libraries_2019.0.117/linux/mkl/lib/intel64 -lmkl_gnu_thread -lmkl_rt -lmkl_core -lmkl_intel_ilp64 -lgomp -lpthread -lm -ldl -L/usr/lib/x86_64-linux-gnu/octave/4.2.2 -L/usr/lib/x86_64-linux-gnu -loctinterp -loctave -Wl,-Bsymbolic-functions -Wl,-z,relro
Test results:
C code: Elapsed time per multiplication: ~1.86 ms
MEX code: Elapsed time per multiplication: ~8.55 ms
I have tried different optimisation flags, but the results are virtually the same. I have tested this on two Intel machines, running Ubuntu 18.04 and Ubuntu 14.04, with very similar results in all cases. The MKL environment variables are set with "source /opt/intel/mkl/bin/mklvars.sh intel64".
Many thanks in advance,
Juan.
That's interesting. Could you check whether the performance gap is the same for square matrices, for example m = n = k = 8000?
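When comparing runs of different sizes, such as the square m = n = k = 8000 case, converting the time per call into GFLOP/s makes the numbers comparable. A minimal sketch, assuming the usual count of 8 real floating-point operations per complex multiply-add (6 for the multiply, 2 for the add); the function names `cgemm_flops` and `cgemm_gflops` are hypothetical. For m = n = k = 8000 this gives 8 × 8000³ = 4.096 × 10¹² flops per call.

```c
/* Real floating-point operations in one complex GEMM C = A * B with
 * A m x k and B k x n: m*n*k complex multiply-adds at 8 real flops each. */
double cgemm_flops(double m, double n, double k)
{
    return 8.0 * m * n * k;
}

/* Throughput in GFLOP/s, given the elapsed time per multiplication in ms. */
double cgemm_gflops(double m, double n, double k, double ms)
{
    return cgemm_flops(m, n, k) / (ms * 1e-3) / 1e9;
}
```

Feeding the measured milliseconds per call from the C executable and from the MEX function into `cgemm_gflops` shows whether the gap stays at the same ratio as the matrices grow.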
And which version of Octave do you use?