Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Parallel computing with MKL

junuylia
Beginner
452 Views
Hi there,

I have a complete code for statistical sampling. My sampling scheme is based on object-oriented programming. Is there any way I could use multiple threads in MKL with my current code without making many changes? My code runs on Linux, and the Intel Fortran version is 11.1. Thank you very much.

June
7 Replies
Chao_Y_Intel
Moderator

Hi June,

Could you provide some details so that more users can help? For example, which kinds of functions are used in the application? Intel MKL provides some summary statistics functions; you can check whether these functions can replace your code.

http://software.intel.com/en-us/articles/overview-of-summary-statistics-ss-in-intel-mkl-v103/

Also, if you use common interfaces such as BLAS and LAPACK, you do not need to change the code; you can simply relink it with Intel MKL.

Thanks,
Chao

junuylia
Beginner
Thank you.

I already use the MKL library in my code. I use both BLAS and LAPACK functions for the matrix computations, eigenvalues and inversion. I also use MKL to sample random variables, and VSL to do simple statistics.

Do you mean I could just change the compile options to use multiple threads? My current build is divided into three parts:

[bash]# 1. compile my own functions

ifort -g -c link.f myfun.f90  -I$SSL_INC -L$SSLLIB -I$MKL_INC -L$MKLLIB  -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5

# 2. put my functions in my own library
ar rcvf libmy_mod.a link.o myfun.o

# 3. compile the main program

ifort reg.f90 -L.  -L$SSLLIB  -L$MKLLIB -lmy_mod -Wl,--start-group $SSLLIB/libss_interface.a $SSLLIB/libss_kernel.a -Wl,--end-group  -lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -o reg.o[/bash]



Thank you very much.

Andrey_N_Intel
Employee
Hello June,

According to your build line, you use the parallel version of MKL, which should help improve the performance of your application on a multi-core CPU. Parallelization of MKL routines is done by the library for you. On the other hand, MKL provides a set of threading control functions which give you finer control over the number of threads used in MKL. For example, you can specify the number of threads to be used in the BLAS domain of MKL in the following way:

ierr = mkl_domain_set_num_threads( num, MKL_BLAS)

The settings made by means of those functions take precedence over OpenMP* settings.
Additional information about the threading control functions can be found in the MKL Manual, chapter "Support Functions".
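
The thread count can also be set from the shell, without touching the code. The lines below are a minimal sketch, assuming the threaded MKL and Intel OpenMP runtime from the build line above; the values 16 and 4 are illustrative only. For MKL routines, MKL_NUM_THREADS takes precedence over OMP_NUM_THREADS, and a call to a threading control function inside the program overrides both.

```shell
# Illustrative values only; pick counts that match your machine.
export OMP_NUM_THREADS=16   # used by OpenMP regions in your own code
export MKL_NUM_THREADS=4    # used by MKL routines; overrides OMP_NUM_THREADS inside MKL
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
# Then start the program from the same shell, e.g. ./reg.o
```

Setting the variables this way makes it easy to compare timings for different thread counts without recompiling.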

I noticed that you use the standalone (whatif) version of the Summary Stats package in your application. Summary Statistics functions were integrated into MKL 10.3. I suggest that you use the Summary Statistics functions as part of MKL for several reasons:
1. The standalone version of the Intel Summary Statistics Library is not supported
2. As part of MKL, those functions follow all conventions and specs of the Math Kernel Library
3. Additional performance tuning and bug fixes in the Summary Stats functions are available in MKL (this includes support for the latest CPUs)
4. Your build/link line would be simpler, as you need to link against MKL libs only
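
To illustrate point 4, here is a rough sketch of what the final link step might look like once Summary Statistics comes from MKL. It assumes MKL 10.3 or later is installed and that $MKLLIB points to it; the flags are copied from the build line posted earlier in this thread, with the standalone SSL path and archives dropped. The variable names MKL_FLAGS and LINK_CMD are only for illustration.

```shell
# MKL link flags copied from the build line posted earlier in this thread.
MKL_FLAGS="-lmkl_blas95_lp64 -lmkl_lapack95_lp64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5"

# No $SSLLIB path and no libss_*.a archives: Summary Statistics is part of MKL.
# \$MKLLIB is kept literal here so the printed command can be reviewed first.
LINK_CMD="ifort reg.f90 -L. -L\$MKLLIB -lmy_mod $MKL_FLAGS -o reg.o"

# Print the command; run it by hand where ifort and $MKLLIB are available.
echo "$LINK_CMD"
```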

Please, let us know if this addresses your questions.
Please, feel free to ask more questions about threading and functionality in MKL if any.

Andrey
junuylia
Beginner
Thank you Andrey, that explains a lot. I'll try this.

As for the SSL: I'm using Intel Fortran 11.1 on a Linux server, and the MKL is Intel Math Kernel Library 10.2 Update 2. I couldn't find the Summary Statistics functions in the manual, which is why I used the standalone version. Could you give me a hint about that?

June
Andrey_N_Intel
Employee
Hello June,

The VSL Summary Statistics feature is available starting with MKL 10.3; for this reason you do not find those functions in MKL 10.2.x.
You might want to evaluate Summary Stats in Update 3 of MKL 10.3. Additional information about the algorithms and interfaces of Summary Statistics is available in the Intel MKL Manual and in the Application Notes at

http://software.intel.com/sites/products/documentation/hpc/mkl/ssl/sslnotes.pdf

Besides this new statistical functionality, you would be able to find more features and optimizations across MKL, including BLAS, LAPACK and VSL Random Number Generators. Additional details about the features of MKL 10.3.x are available at

http://software.intel.com/en-us/articles/intel-mkl-103-release-notes/

Please, let me know if you have more questions.

Thanks,
Andrey
junuylia
Beginner
Some follow-up questions.

I used the function MKL_SET_NUM_THREADS(16), where 16 is the number of my processors, but the performance didn't get better. Since I assume MKL was already doing the parallel computation without this call, I can live with it.

But when I compiled the file with the added options -O3 -parallel -par-report1, the performance got much worse, about 5 times slower. What happened?

Also, I learned that the 'do concurrent' statement runs a loop in parallel, but it looks like the MKL on my server doesn't have it. So what is the oldest version on Linux that has this feature? Thank you very much.

June
TimP
Honored Contributor III
You must have found some of the documentation, if you discovered mkl_set_num_threads(). We can't comment on your 16 processors without some details; for example, if you mean 8 cores with hyperthreading, there is documentation explaining why MKL gets its best performance with 8 threads, and won't use 16 unless you set MKL_DYNAMIC to false.
-parallel shouldn't affect MKL directly. Your report should give you hints about where it attempted parallelism in your source code. It would be unusual for autoparallelization to slow down your application, unless you also used -par-threshold. In the unlikely event that -parallel parallelizes a loop which calls MKL, linking with mkl_sequential might perform better.
Likewise, Fortran do concurrent would have no direct effect on MKL. It could be used to make a parallel construct in your own Fortran code. It was introduced in ifort 12.0 (XE 2011).
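
As a rough sketch of the thread settings discussed above: the 8-core, hyperthreaded layout below is an assumption, since the thread does not say what the 16 processors actually are. MKL_NUM_THREADS and MKL_DYNAMIC are MKL's environment variables; the values shown are illustrative.

```shell
# Assumed machine: 8 physical cores, 16 logical processors (hypothetical).
export MKL_NUM_THREADS=8    # one MKL thread per physical core
echo "MKL_NUM_THREADS=$MKL_NUM_THREADS"

# To let MKL use all 16 logical processors instead, dynamic adjustment
# of the thread count has to be switched off as well:
# export MKL_NUM_THREADS=16
# export MKL_DYNAMIC=FALSE
```

Timing the program under both settings is the simplest way to see which one suits a given machine.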