Intel® oneAPI Math Kernel Library

Intel MKL 10.3 update 4 is now available

Todd_R_Intel
Employee

Intel Math Kernel Library (Intel MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance. The Intel MKL 10.3 Update 4 packages are now ready for download. Intel MKL is available as a stand-alone product and as a part of the Intel Parallel Studio XE 2011, Intel C++ Studio XE 2011, Intel Composer XE 2011, Intel Fortran Composer XE 2011, Intel C++ Composer XE 2011, and Intel Cluster Studio 2011. Please visit the Intel Software Evaluation Center to evaluate this product.

Check out the latest Intel MKL benchmarks using the new Intel AVX instruction set!

The new Intel MKL 10.3 Update 4 release provides the following:

  • BLAS: Improved DTRMM performance on Intel Xeon processors 5400 and later
  • BLAS: Improved DTRSM performance on all 64-bit enabled processors, especially processors with Intel Advanced Vector Extensions (Intel AVX)
  • LAPACK: Incorporated bug fixes from the LAPACK 3.3.1 release
  • OOC PARDISO: Improved the estimate of the amount of memory needed in out-of-core operation
  • FFT: Improved 1D real FFT scaling through improved threading
  • FFT: Updated C and Fortran FFT examples to use the new single dynamic library linking model
  • VML: Improved performance of the single precision Enhanced Performance version of the real Hypot and complex Abs functions and of the complex Arg, Div, Mul, MulByConj functions for all accuracy modes on Intel Xeon processors 5600 and 7500 series, and the Intel Core i7-2600 processor
  • Service functions: Improvements and additions to the Intel MKL service functions (see the online release notes for more information)
  • Bug fixes

Mistry__Mital
Beginner
Hi

We have MKL version 10.0.1.014. Can we switch to MKL 10.3 without any change to our system architecture and compiler?

Thanks,

Mital Mistry
Todd_R_Intel
Employee

There have been a few changes to the directory structure in 10.3, and, as noted in the release notes, you will need to use an update of 10.2 if you use Itanium.

So the answer depends on your architecture and which compiler you are using. In any case you will need to relink your application.

Regards,
Todd

Mistry__Mital
Beginner
Thanks Todd,

Our architecture is IA-64, and we have both Intel and GNU compilers.

I was asked to update to MKL 10.3 by an Intel person on another forum because of some issues I am having with OOC.

Our system is an SGI machine with 8 processors and 32 GB of memory, and I believe it is Itanium.

What do you mean by an update of 10.2? Is that for MKL or for the compiler?

Regards,
Mital


Todd_R_Intel
Employee

Intel Compilers (or now, Intel Composer XE 2011) include Intel MKL, but the version numbers don't correspond (take a look at this table). Above, I was referring to Intel MKL version numbers. Intel MKL 10.2 update 7 is the latest version of Intel MKL that supports Itanium processors, and it can be found in Intel Compiler 11.1 update 8.

For the OOC issue you refer to, you will need either a workaround or a fixed version of 10.2.

Todd

Mistry__Mital
Beginner
We have Intel compiler version 10.1.012. Should I use MKL 10.2 with this compiler, or should I get an entirely new package that supports MKL 10.3?

Mital
Konstantin_A_Intel
Hi Mital,
If your system is really IA-64, i.e. Itanium-based (please send me the output of the "uname -a" command), then you can use only the 10.2 version of MKL (for example, the latest MKL 10.2.7). You cannot use MKL 10.3 because it does not support the IA-64 architecture.
About compatibility with the compiler: there are no problems, and you can use icc/ifort 10.1 with MKL 10.2. MKL is designed so that it does not depend on the compiler version.
Regards,
Konstantin
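The check Konstantin asks for comes down to looking for the `ia64` machine type in the `uname -a` output. Below is a minimal Python sketch of that decision. The helper names `is_itanium` and `mkl_branch_for` are hypothetical, and the rule (10.2.x on IA-64, 10.3 otherwise) is taken from this thread rather than from an official compatibility matrix.

```python
# Hypothetical helpers: pick the Intel MKL branch for a host, using only the
# advice in this thread (MKL 10.3 dropped IA-64; MKL 10.2 update 7 is the
# latest release that still supports Itanium).
import platform


def is_itanium(uname_a: str) -> bool:
    """Return True if the `uname -a` text reports the ia64 machine type."""
    return "ia64" in uname_a.split()


def mkl_branch_for(uname_a: str) -> str:
    """Return the newest MKL branch usable on this host, per this thread."""
    return "10.2" if is_itanium(uname_a) else "10.3"


if __name__ == "__main__":
    # platform.uname() exposes the same fields that `uname -a` prints.
    u = platform.uname()
    local = f"{u.system} {u.node} {u.release} {u.version} {u.machine}"
    print(mkl_branch_for(local))
```

Splitting on whitespace and testing for the exact token avoids depending on field positions, which vary because the kernel-version field itself contains spaces.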
Mistry__Mital
Beginner
Thanks Konstantin,

Here is the output of "uname -a" command;

"Linux keisgi 2.6.16.54-0.2.12-default #1 SMP Fri Oct 24 02:16:38 UTC 2008 ia64 ia64 ia64 GNU/Linux"

Also, with our current version of MKL (10.0.1.014), my code has been running for the last 10 hours but is still at the following stage:
**************************************************************************************
ooc_max_core_size got by Env = 16000
The file ./pardiso_ooc.cfg was not opened

**************************************************************************************
The matrix is 150,000,000 x 150,000,000 with 60,000,000 non-zeros. If I run it in-core, it reports that LU has approximately 950,000,000 non-zeros and exits with error -2. If I don't provide "MKL_PARDISO_OOC_MAX_CORE_SIZE=16000" and run it out-of-core, it exits with error -9.
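Both failures are reported through PARDISO's `error` output argument. A small Python lookup sketch for reading such codes follows. It covers only a subset of the codes, the messages are paraphrased from the Intel MKL PARDISO reference, and `describe_pardiso_error` is a hypothetical helper, so check the error table in the manual for your MKL version.

```python
# Subset of PARDISO return codes (the `error` output argument), paraphrased
# from the Intel MKL PARDISO reference; verify against your version's manual.
PARDISO_ERRORS = {
    0: "no error",
    -1: "input inconsistent",
    -2: "not enough memory",
    -3: "reordering problem",
    -8: "32-bit integer overflow problem",
    -9: "not enough memory for OOC",
    -10: "problems with opening OOC temporary files",
    -11: "read/write problems with the OOC data file",
}


def describe_pardiso_error(code: int) -> str:
    """Return a short description for a PARDISO error code (hypothetical helper)."""
    return PARDISO_ERRORS.get(
        code, f"error {code}: not in this subset, see the PARDISO error table"
    )


if __name__ == "__main__":
    # The two failures reported in this post.
    for code in (-2, -9):
        print(code, describe_pardiso_error(code))
```

Read this way, -2 is an in-core memory shortage and -9 is a memory shortage in out-of-core mode, which is why raising the OOC memory limit matters here.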

Regards,
Mital
Mistry__Mital
Beginner
Konstantin,

I installed 10.2.7.041 and it is working perfectly with OOC. I am posting my case output in case it helps other users. Thanks for your help.

******************************************************************************************
ooc_max_core_size got by Env = 30000
The file ./pardiso_ooc.cfg was not opened

=== PARDISO is running in Out-Of-Core mode, because iparam(60)=2 ===


================ PARDISO: solving a symmetric indef. system ==========
The local (internal) PARDISO version is : 1020001
PARDISO double precision computation is turned ON
METIS algorithm at reorder step is turned ON
Task fits RAM, OOC NLL factorization algorithm is turned ON
Scaling is turned ON


Summary PARDISO: ( reorder to reorder )
================

Times:
======
Time spent in calculations of symmetric matrix portrait(fulladj): 1.21927
Time spent in reordering of the initial matrix(reorder) : 24.2554
Time spent in symbolic factorization(symbfct) : 22.7608
Time spent in data preparations for factorization(parlist) : 3.19630
Time spent in in allocation of internal data structures(malloc) : 1.17290
Time spent in additional calculations : 17.7799
Total time spent : 70.3847

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1460346
#non-zeros in A: 58022798
non-zeros in A (%): 0.002721

#right-hand sides: 101

< Factors L and U >
#columns for each panel: 192
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 135919
size of largest supernode: 20746
number of nonzeros in L 3449087253
number of nonzeros in U 1
number of nonzeros in L+U 3449087254
Reordering completed ...
Number of nonzeros in factors = -845880042
Number of factorization MFLOPS = 32965576
Percentage of computed non-zeros for LL^T factorization
0 % 1 % 2 % 3 % 4 % 5 % 6 % 7 % 8 % 9 % 10 % 11 % 12 % 13 % 20 % 21 % 22 % 23 % 24 % 25 % 26 % 27 % 28 % 29 % 30 % 31 % 38 % 39 % 40 % 41 % 42 % 43 % 44 % 45 % 46 % 47 % 48 % 49 % 56 % 57 % 58 % 59 % 60 % 61 % 62 % 63 % 64 % 65 % 66 % 67 % 64 % 75 % 76 % 77 % 78 % 79 % 80 % 81 % 82 % 83 % 84 % 85 % 86 % 93 % 94 % 95 % 96 % 97 % 98 % 99 % 100 %

================ PARDISO: solving a symmetric indef. system ================


Summary PARDISO: ( factorize to solve )
================

Times:
======
Time spent in copying matrix to internal data structure(A to LU): 0.000001 s
Factorization: Time for writing to files : 0.000000
Factorization: Time for reading from files : 0.000000
Time spent in factorization step(numfct) : -1980.722245 s
Solution: Time for reading from files : 0.000000
Time spent in direct solver at solve step (solve) : 766.926043 s
Time spent in in allocation of internal data structures(malloc) : 17663.688013 s
Time spent in additional calculations : 4016.110042 s
Total time spent : 20466.001854 s
==============================================================
----------- Out of core time (in percent (%)) --------------
Factorization step (100 (%)):
write to files : 0
read from files: 0
factorization - write&read : 100
Solution step (100 (%)):
read from files: 0
solve - write&read: 100
Total time (100 (%)):
read from files: 0
total - write&read: 100
----------- Out of core Mb --------------
Factorization step:
write to files : 0.000 Mb
read from files: 0.000 Mb
Solution step:
read from files: 0.000 Mb
Total size of data transferred :
write&read : 0.000 Mb
==============================================================

Statistics:
===========
< Parallel Direct Factorization with #processors: > 8
< Numerical Factorization with BLAS3 and O(n) synchronization >

< Linear system Ax = b>
#equations: 1460346
#non-zeros in A: 58022798
non-zeros in A (%): 0.002721

#right-hand sides: 101

< Factors L and U >
#columns for each panel: 192
#independent subgraphs: 0
< Preprocessing with state of the art partitioning metis>
#supernodes: 135919
size of largest supernode: 20746
number of nonzeros in L 3449087253
number of nonzeros in U 1
number of nonzeros in L+U 3449087254
gflop for the numerical factorization: 32965.576421

Solve completed ...

**********************************************************************************************
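Two figures in the log above can be checked by hand. The negative "Number of nonzeros in factors = -845880042" is exactly the L+U count 3,449,087,254 wrapped into a signed 32-bit integer, which suggests that particular counter overflowed in the printout while the solve itself completed. A rough size estimate also shows why a 16000 MB limit was tight and 30000 MB was enough. This Python sketch assumes one 8-byte double per factor entry, ignores index arrays and workspace, and assumes MKL_PARDISO_OOC_MAX_CORE_SIZE is given in megabytes of 1024 x 1024 bytes. PARDISO's own estimate will differ.

```python
# Back-of-the-envelope checks on two numbers from the PARDISO log.
# Assumptions: the printed counter is a signed 32-bit integer; each factor
# entry is one 8-byte double; index arrays and workspace are ignored.

NNZ_LU = 3_449_087_254          # "number of nonzeros in L+U" from the log
BYTES_PER_DOUBLE = 8
MB = 1024 * 1024                 # assumed unit of MKL_PARDISO_OOC_MAX_CORE_SIZE


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def factor_size_mb(nnz: int) -> float:
    """Approximate storage for the numerical values of L+U, in megabytes."""
    return nnz * BYTES_PER_DOUBLE / MB


if __name__ == "__main__":
    print(as_int32(NNZ_LU))          # prints -845880042, matching the log
    size = factor_size_mb(NNZ_LU)
    print(f"{size:.0f} MB")          # prints 26314 MB
    for limit in (16000, 30000):
        print(limit, "MB limit holds the factor values:", size <= limit)
```

Under these assumptions the factor values alone need about 26,300 MB, above the 16000 MB setting but below 30000 MB, which is consistent with the log reporting "Task fits RAM" for the 30000 MB run.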
Thank you very much.

Regards,

Mital