Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

BLAS 1 Multithread

sebydocky
Beginner
767 Views
Hello,

Is it possible to have multithreaded BLAS 1 functions (such as ddot) with MKL? I tried MKL 10.2 in order to multithread "liblinear", but without success. Do I need the /Qopenmp flag? (I am working on win32 and win64 platforms with a C2D or a mobile i7.)

If someone has a small example...

Regards,

Sébastien
0 Kudos
11 Replies
Gennady_F_Intel
Moderator
Sébastien,
- Yes, this routine is threaded. Please see chapter 6 of the User's Guide for details.
- Yes, examples for this routine are available: see \blas\source\ddotx.f or \examples\cblas\source\cblas_ddotx.c
- You don't need the /Qopenmp flag to compile. Please link the threading libraries. Please look here to find out the recommended libraries.

--Gennady

sebydocky
Beginner
Hello,

In cblas_ddotx.c, the dot product is computed through cblas_ddot, whereas liblinear calls the ddot function directly. Should there be any difference for multithreading?

I link against mkl_core.lib, mkl_intel_c.lib and mkl_intel_thread.lib.

Sébastien
sebydocky
Beginner

Well, as indicated, I have to add the /Qopenmp flag, and now both cores of my C2D are used. However, the difference between the sequential and multithreaded runs is small.

With multithreading:

mex -D_DENSE_REP -DBLAS -f mexopts_intel10.bat -output train_dense.dll train_dense.c linear_model_matlab.c linear.cpp tron.cpp "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_core.lib" "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_intel_c.lib" "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_intel_thread.lib"

tic, model{t} = train_dense(ytopic' , X , options , 'col');,toc

Elapsed time is 5.696018 seconds.


And without:

mex -D_DENSE_REP -DBLAS -f mexopts_intel10.bat -output train_dense.dll train_dense.c linear_model_matlab.c linear.cpp tron.cpp "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_core.lib" "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_intel_c.lib" "C:\Program Files\Intel\Compiler\11.1\065\mkl\ia32\lib\mkl_sequential.lib"

tic, model{t} = train_dense(ytopic' , X , options , 'col');,toc

Elapsed time is 5.643861 seconds.


where X is a (10240 x 3588) double precision matrix.

Regards,

Sébastien

Gennady_F_Intel
Moderator
Regarding your question about multithreading: there is no difference between ddot and cblas_ddot from a multithreading point of view.
They are simply different APIs for the same functionality.
--Gennady
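
To illustrate the point about the two interfaces, here is a minimal sketch in plain C. It does not call MKL: ref_ddot and ref_cblas_ddot are hypothetical stand-ins that only mimic the Fortran-style calling convention (every argument passed by pointer) and the CBLAS-style one (scalars passed by value). Both forward to the same kernel, which is the sense in which the API differs while the functionality does not. How MKL is organized internally is not shown here.

```c
#include <stddef.h>

/* Shared kernel: strided dot product, the operation both interfaces expose.
   Assumes non-negative increments to keep the sketch short. */
static double dot_kernel(int n, const double *x, int incx,
                         const double *y, int incy)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i)
        sum += x[(size_t)i * (size_t)incx] * y[(size_t)i * (size_t)incy];
    return sum;
}

/* Hypothetical Fortran-style interface: every argument is a pointer. */
double ref_ddot(const int *n, const double *x, const int *incx,
                const double *y, const int *incy)
{
    return dot_kernel(*n, x, *incx, y, *incy);
}

/* Hypothetical CBLAS-style interface: scalars are passed by value. */
double ref_cblas_ddot(int n, const double *x, int incx,
                      const double *y, int incy)
{
    return dot_kernel(n, x, incx, y, incy);
}
```

With n = 3 and unit strides, ref_ddot(&n, x, &inc, y, &inc) and ref_cblas_ddot(3, x, 1, y, 1) return the same value for the same input vectors.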
Gennady_F_Intel
Moderator
Sébastien,
I don't see the threading library (libiomp5md.lib) in the link line you are using.
I don't understand: you mentioned that X is a (10240 x 3588) double precision matrix.
What is the actual number of elements in the vectors in your experiments?
Could you provide more details about the CPU you are working on?
--Gennady
sebydocky
Beginner
Gennady,

Right, I don't get any error during linking without libiomp5md.lib. Is it important?

liblinear mainly uses BLAS 1 functions, with a vector dimension of 10240 in my example.
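
For a sense of scale, here is a rough back-of-the-envelope sketch in plain C of how much work one ddot call on vectors of that length involves. The helper names are hypothetical and nothing here calls MKL. For n = 10240 it gives about 20 thousand floating-point operations over 160 KiB of input, that is 0.125 flops per byte on platforms with 8-byte doubles. Calls this short and this memory-bound may leave little room for a threading gain, which could be one reason the two timings above are so close. That is an assumption, not a measurement.

```c
#include <stddef.h>

/* Floating-point operations in one ddot call: n multiplies and n adds
   (counting the first add into a zero accumulator). */
size_t ddot_flops(size_t n)
{
    return 2 * n;
}

/* Bytes read by one ddot call: two input vectors of n doubles each. */
size_t ddot_bytes(size_t n)
{
    return 2 * n * sizeof(double);
}

/* Arithmetic intensity in flops per byte of input. */
double ddot_intensity(size_t n)
{
    return (double)ddot_flops(n) / (double)ddot_bytes(n);
}
```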


First test system
C2D T7500, XP SP3 32, Intel compiler 10.1.13

Second system

I7m 720, W7 64, Intel compiler 11.1.065.

On the latter, multithreading is not working, even after setting the MKL_NUM_THREADS and OMP_NUM_THREADS variables.
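
When these variables seem to have no effect, one quick check is to confirm what the process actually sees. The sketch below is plain C with hypothetical helper names and does not call MKL. It assumes, as I understand the MKL documentation, that MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS for MKL routines, and it treats unset or malformed values as "no explicit request".

```c
#include <limits.h>
#include <stdlib.h>

/* Parse a positive thread count from a string such as "4".
   Returns 0 if the string is NULL or is not a positive integer. */
static int parse_thread_count(const char *value)
{
    if (value == NULL)
        return 0;
    char *end = NULL;
    long n = strtol(value, &end, 10);
    if (end == value || *end != '\0' || n <= 0 || n > INT_MAX)
        return 0;
    return (int)n;
}

/* Pick the requested thread count: the MKL_NUM_THREADS value first,
   then the OMP_NUM_THREADS value, else 0 (no explicit request). */
int pick_thread_count(const char *mkl_value, const char *omp_value)
{
    int n = parse_thread_count(mkl_value);
    if (n > 0)
        return n;
    return parse_thread_count(omp_value);
}
```

Inside the program or MEX file, it could be called as pick_thread_count(getenv("MKL_NUM_THREADS"), getenv("OMP_NUM_THREADS")) and the result printed, to see whether the settings reach the process at all.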
Zhanghong_T_
Novice
Dear Gennady,

Your link to the guide for setting up the link libraries is very useful. However, I am confused about the item "select cluster library". What is the difference between BLACS and ScaLAPACK? To give the BLAS functions the best speedup on multiple CPUs and multiple cores, which functions should I select?

The environment of my program is:
Win7 x64 + VS2008 + IVF 11.1.065 + Intel MPI

Thanks,
Zhanghong Tang
eliosh
Beginner
I also tried to enable multithreading in Matlab's MEX files; however, it does not seem to work.

In fact, those MEX files are regular shared libraries (.DLL or .SO) with a different name.
Hence, I am not sure that the usual ways of enabling multiple threads will work in a shared library. I suppose it is related to the threads of the main program or something like that.

Can anyone clarify the situation or give a link to appropriate reading?


Thank you.
Gennady_F_Intel
Moderator
Sébastien,
I don't understand which MKL version you are using.
The fact is that we release MKL either standalone or bundled with Intel Fortran Compiler versions.
- If you are using the standalone version, look at the mklsupport.txt file (/doc/mklsupport.txt), where you will see something like: Package ID: w_mkl_p_10.2.5.035.
- If you are using the version bundled with IVF, please see this KB, where you can find which version of MKL is bundled with your version of the compiler.
--Gennady
Gennady_F_Intel
Moderator
Hi Zhanghong Tang,
I think it would be better if you first read the BLACS routine descriptions in the MKL Reference Manual. If you have further questions after that, we will try to help.
>> To give the BLAS functions the best speedup on multiple CPUs and multiple cores, which functions should I select?
Most of the level 1, 2 and 3 BLAS routines (dense storage scheme) are threaded and show very good scalability.
Which MKL routines are you going to use?
--Gennady
Gennady_F_Intel
Moderator

It may depend on factors such as:
- the MKL version used in your version of Matlab (ask MathWorks about that)
- the functionality
- the input size