Intel® Math Kernel Library 11.3 Update 4 is now available

Gennady_F_Intel · ‎05-02-2016


Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 11.3 Update 4 packages are now ready for download. Intel MKL is available as part of Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

Intel® MKL 11.3 Update 4 Bug fixes

New Features in MKL 11.3 Update 4

BLAS:
- Introduced new packed matrix multiplication interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) for single and double precision.
- Improved performance over standard S/DGEMM on Intel Xeon processor E5-xxxx v3 and later processors.

LAPACK:
- Improved LU factorization, solve, and inverse (?GETR?) performance for very small sizes (<16).
- Improved General Eigensolver (?GEEV and ?GEEVD) performance for the case when eigenvectors are needed.
- Added TBB parallelism for ?ORGQR/?UNGQR.

Known Limitations:

cblas_?gemm_alloc is not supported on Windows* OS for the IA-32 architectures with single dynamic library linking.
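
The packed interfaces listed above split a matrix multiplication into a one-time packing step and a compute step that can be repeated, which pays off when the same matrix is multiplied by many others. The following is a conceptual sketch of that usage pattern in plain Python, not Intel MKL code: `pack_a` and `gemm_compute_packed` are hypothetical helpers, and the real routines use their own internal packed layout and explicit allocation and free calls.

```python
# Conceptual sketch only: pack_a and gemm_compute_packed are hypothetical
# helpers that mimic the pack-once / compute-many pattern. They are not
# Intel MKL functions.

def pack_a(a, alpha):
    """Pack step: scale A by alpha once and store each row as a tuple."""
    return [tuple(alpha * x for x in row) for row in a]

def gemm_compute_packed(packed_a, b, beta, c):
    """Compute step: C = packed_A * B + beta * C, updating C in place."""
    n = len(b[0])
    # Transpose B once per call so the inner loop walks whole columns.
    b_cols = list(zip(*b))
    for i, a_row in enumerate(packed_a):
        for j in range(n):
            acc = sum(x * y for x, y in zip(a_row, b_cols[j]))
            c[i][j] = acc + beta * c[i][j]
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
packed = pack_a(a, alpha=1.0)  # packing cost is paid once

# The same packed A is reused for several right-hand matrices.
b_list = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0]],
]
results = [
    gemm_compute_packed(packed, b, 0.0, [[0.0, 0.0], [0.0, 0.0]])
    for b in b_list
]
```

In this sketch the packing cost is amortized over every later multiplication; whether the real interfaces beat a plain S/DGEMM call for a given problem depends on matrix sizes and on how often the packed matrix is reused.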

New Features in MKL 11.3 Update 3

Improved Intel Optimized MP LINPACK Benchmark performance for clusters on Intel® Advanced Vector Extensions 512 (Intel® AVX-512) and on second generation of Intel® Xeon Phi™ coprocessor
BLAS:
- Improved small matrix [S,D]GEMM performance on Intel® Advanced Vector Extensions 2 (Intel AVX2), Intel® Xeon® product family, Intel AVX-512 and on second generation of Intel® Xeon Phi™ coprocessor
- Improved threading (OpenMP) performance of xGEMMT, xHEMM, xHERK, xHER2K, xSYMM, xSYRK, xSYR2K on Intel AVX-512, and on second generation of Intel® Xeon Phi™ coprocessor
- Improved [C,Z]GEMV, [C,Z]TRMV, and [C,Z]TRSV performance on Intel AVX2, Intel AVX-512, Intel® Xeon® product family, and on second generation of Intel® Xeon Phi™ coprocessor
- Fixed CBLAS_?GEMMT interfaces to correctly call underlying Fortran interface for row-major storage
LAPACK:
- Updated Intel MKL LAPACK functionality to latest Netlib version 3.6. New features introduced in this version are:
  - SVD by Jacobi ([CZ]GESVJ) and preconditioned Jacobi ([CZ]GEJSV) algorithms
  - SVD via EVD allowing computation of a subset of singular values and vectors (?GESVDX)
  - Level 3 BLAS versions of generalized Schur (?GGES3), generalized EVD (?GGEV3), generalized SVD (?GGSVD3) and reduction to generalized upper Hessenberg form (?GGHD3)
  - Multiplication of general matrix by a unitary/orthogonal matrix possessing 2x2 structure ([DS]ORM22/[CZ]UNM22)
- Improved performance of LU (?GETRF) and QR (?GEQRF) on Intel AVX-512 and on second generation of Intel® Xeon Phi™ coprocessor
- Improved check of parameters for correctness in all LAPACK routines to enhance security
SCALAPACK:
- Improved hybrid (MPI + OpenMP) performance of ScaLAPACK/PBLAS by increasing default block size returned by pilaenv
SparseBlas:
- Added examples that cover spmm and spmmd functionality
- Improved performance of parallel mkl_sparse_d_mv for general BSR matrices on Intel AVX2
Parallel Direct Sparse Solver for Clusters:
- Improved performance of solving step for small matrices (less than 10000 elements)
- Added mkl_progress support in Parallel Direct sparse solver for Clusters and fixed mkl_progress in Intel MKL PARDISO
Vector Mathematical Functions:
- Improved implementation of Thread Local Storage (TLS) allocation/de-allocation, which helps with thread safety for DLLs in Windows when they are custom-made from static libraries
- Improved the automatic threading algorithm leading to more even distribution of vectors across larger numbers of threads and improved the thread creation logic on Intel Xeon Phi, leading to improved performance on average

New Features in MKL 11.3 Update 2

Introduced the mkl_finalize function to facilitate usage models in which Intel MKL dynamic libraries, or third-party dynamic libraries statically linked with Intel MKL, are loaded and unloaded explicitly
Compiler offload mode now allows using Intel MKL dynamic libraries
Added Intel TBB threading for all BLAS level-1 functions
Intel MKL PARDISO:
- Added support for block compressed sparse row (BSR) matrix storage format
- Added optimization for matrices with variable block structure
- Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters
- Added cluster_sparse_solver_64 interface
Introduced sorting algorithm in Summary Statistics
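
For readers unfamiliar with the block compressed sparse row (BSR) format mentioned above: it stores a sparse matrix as a list of dense square blocks, a block-column index for each stored block, and a row pointer for each block row. The sketch below builds those three arrays from a small dense matrix. It is an illustration only: `dense_to_bsr` is a hypothetical helper, and the zero-based indexing and row-major block layout are assumptions of this example, so the exact array conventions PARDISO expects should be taken from the Intel MKL documentation.

```python
def dense_to_bsr(dense, bs):
    """Convert a square dense matrix (list of lists) to BSR arrays.

    Returns (values, columns, row_ptr) with zero-based indices. Each block
    is stored row-major as a flat list of bs*bs entries.
    """
    n = len(dense)
    if n % bs != 0:
        raise ValueError("matrix size must be a multiple of the block size")
    nb = n // bs
    values, columns, row_ptr = [], [], [0]
    for bi in range(nb):
        for bj in range(nb):
            block = [dense[bi * bs + r][bj * bs + c]
                     for r in range(bs) for c in range(bs)]
            if any(v != 0 for v in block):  # keep only nonzero blocks
                values.append(block)
                columns.append(bj)
        # Running count of stored blocks marks the end of this block row.
        row_ptr.append(len(columns))
    return values, columns, row_ptr

dense = [
    [1, 2, 0, 0],
    [3, 4, 0, 0],
    [0, 0, 5, 0],
    [7, 0, 0, 6],
]
values, columns, row_ptr = dense_to_bsr(dense, bs=2)
```

Note that a stored block may contain explicit zeros (the second block here holds a single nonzero entry), which is the trade-off BSR makes in exchange for dense, regular block operations.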

What's New in Intel MKL 11.3:

Batch GEMM Functions
Introduced new 2-stage (inspector-executor) APIs for Level 2 and Level 3 sparse BLAS functions
Introduced MPI wrappers that allow users to build custom BLACS library for most MPI implementations
Cluster components (Cluster Sparse Solver, Cluster FFT, ScaLAPACK) are now available for OS X*
Extended the Intel MKL memory manager to improve scaling on large SMP systems
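
The two-stage (inspector-executor) idea mentioned in the list above can be sketched as follows: an inspection stage examines the sparse matrix structure once, and an execution stage then runs the operation as many times as needed using that cached analysis. This is a minimal plain-Python illustration, not the Intel MKL API; `CsrHandle`, `inspect_matrix`, and `execute_mv` are hypothetical names, and the matrix is assumed to be in zero-based CSR form.

```python
class CsrHandle:
    """Hypothetical stand-in for an opaque sparse-matrix handle."""

    def __init__(self, row_ptr, col_idx, vals):
        self.row_ptr, self.col_idx, self.vals = row_ptr, col_idx, vals
        self.row_slices = None  # filled in by inspect_matrix()

def inspect_matrix(handle):
    """Inspector stage: validate the structure once and cache row ranges."""
    rp = handle.row_ptr
    if (rp[0] != 0 or rp[-1] != len(handle.vals)
            or len(handle.col_idx) != len(handle.vals)):
        raise ValueError("inconsistent CSR arrays")
    if any(rp[i] > rp[i + 1] for i in range(len(rp) - 1)):
        raise ValueError("row_ptr must be non-decreasing")
    handle.row_slices = [slice(rp[i], rp[i + 1]) for i in range(len(rp) - 1)]
    return handle

def execute_mv(handle, x):
    """Executor stage: y = A * x using the cached analysis."""
    if handle.row_slices is None:
        raise RuntimeError("call inspect_matrix() before execute_mv()")
    return [sum(v * x[j] for v, j in zip(handle.vals[s], handle.col_idx[s]))
            for s in handle.row_slices]

# 3x3 matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]] in zero-based CSR form
a = inspect_matrix(
    CsrHandle([0, 2, 3, 5], [0, 2, 1, 0, 2], [4.0, 1.0, 2.0, 3.0, 5.0])
)
y1 = execute_mv(a, [1.0, 1.0, 1.0])  # inspection cost is not paid again
y2 = execute_mv(a, [1.0, 2.0, 3.0])
```

The design point is that any analysis done in the first stage is amortized over repeated calls, which suits iterative solvers that apply the same matrix many times.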

Check out the latest Release Notes for more updates

Yuriy_Shlepnev · ‎10-05-2016

We tried the new PARDISO updates 2017.0.109 and 11.3.4.246 and noticed a substantial slowdown (over 2 times) for large matrices (3-4 million variables) in our EM solver, compared to the release version 2016.1.146. Is this a known issue? Will it be resolved in a future update?

Yuriy Shlepnev

Gennady_F_Intel · ‎10-05-2016

Yuri,

No, this is not a known issue. Is that version for SMP or for distributed computations? Do you see the degradation in all phases (e.g., 11, 22, or 33)?

Could you give us the CPU specifics?

wbr, Gennady