Intel® oneAPI Math Kernel Library
Ask questions and share information with other developers who use Intel® Math Kernel Library.

Announcing Intel MKL 2017 release

Gennady_F_Intel
Moderator

Check out the new Intel® Math Kernel Library (Intel® MKL) 2017 release!

What's new in Intel MKL 2017:

  • Introduced optimizations for the Intel® Xeon Phi™ processor x200 (formerly Knights Landing) self-boot platform for Windows* OS
  • Enabled Automatic Offload (AO) and Compiler Assisted Offload (CAO) modes for the second generation of Intel Xeon Phi coprocessor on Linux* OS
  • Introduced Deep Neural Networks (DNN) primitives including convolution, normalization, activation, and pooling functions intended to accelerate convolutional neural networks (CNNs) and deep neural networks on Intel® Architecture.
    • Optimized for Intel® Xeon® processor E5-xxxx v3 (formerly Haswell), Intel Xeon processor E5-xxxx v4 (formerly Broadwell), and Intel Xeon Phi processor x200 self-boot platform.
    • Introduced inner product primitive to support fully connected layers.
    • Introduced batch normalization, sum, split, and concat primitives to provide full support for GoogLeNet and ResidualNet topologies.
  • BLAS:
    • Introduced new packed matrix multiplication interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) for single and double precisions.
    • Improved performance over standard S/DGEMM on Intel Xeon processor E5-xxxx v3 and later processors.
  • Sparse BLAS:
    • Improved performance of parallel BSRMV functionality for processors supporting the Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set.
    • Improved performance of sparse matrix functionality on the Intel Xeon Phi processor x200.
  • Intel MKL PARDISO:
    • Improved performance of parallel solving step for matrices with fewer than 300000 elements.
    • Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters.
    • Added fully distributed reordering step to Parallel Direct Sparse Solver for Clusters.
  • Fourier Transforms:
    • Improved performance of batched 1D FFT with large batch size on processors supporting the Intel® Advanced Vector Extensions (Intel® AVX), Intel AVX2, Intel® Advanced Vector Extensions 512 (Intel® AVX512), and Intel AVX512_MIC instruction sets.
    • Improved performance for small size batched 2D FFT on the Intel Xeon Phi processor x200 self-boot platform, Intel Xeon processor E5-xxxx v3, and Intel Xeon processor E5-xxxx v4.
    • Improved performance for 3D FFT on the Intel Xeon Phi processor x200 self-boot platform. 
  • LAPACK:
    • Included the latest LAPACK v3.6 enhancements. New features introduced are:
      • SVD by Jacobi ([CZ]GESVJ) and preconditioned Jacobi ([CZ]GEJSV)
      • SVD via EVD allowing computation of a subset of singular values and vectors (?GESVDX)
      • Level-3 BLAS versions of generalized Schur factorization (?GGES3), generalized EVD (?GGEV3), generalized SVD (?GGSVD3), and reduction to generalized upper Hessenberg form (?GGHD3)
      • Multiplication of a general matrix by a unitary or orthogonal matrix that possesses a 2x2 block structure ([DS]ORM22/[CZ]UNM22)
    • Improved performance for large size QR (?GEQRF) on processors supporting the Intel AVX2 instruction set.
    • Improved LU factorization, solve, and inverse (?GETR?) performance for very small sizes (<16).
    • Improved General Eigensolver (?GEEV and ?GEEVD) performance for the case when eigenvectors are needed.
    • Improved ?GETRF, ?POTRF, and ?GEQRF, linear solver (?GETRS), and SMP LINPACK performance on the Intel Xeon Phi processor x200 self-boot platform.
  • ScaLAPACK:
    • Improved performance for hybrid (MPI + OpenMP*) mode of ScaLAPACK and PBLAS.
    • Improved performance of P?GEMM and P?TRSM, resulting in better scalability of the Qbox First-Principles Molecular Dynamics code.
  • Data Fitting:
    • Introduced two new storage formats for interpolation results (DF_MATRIX_STORAGE_SITES_FUNCS_DERS and DF_MATRIX_STORAGE_SITES_DERS_FUNCS).
    • Added Hyman monotonic cubic spline.
    • Improved performance of Data Fitting functionality on the Intel Xeon Phi processor x200.
    • Modified callback APIs to allow users to pass information about integration limits.
  • Vector Mathematics:
    • Introduced optimizations for the Intel Xeon Phi processor x200.
    • Improved performance for Intel Xeon processor E5-xxxx v3 and Intel Xeon processor E5-xxxx v4.
  • Vector Statistics:
    • Introduced additional optimization of SkipAhead method for MT19937 and SFMT19937.
    • Improved performance of Vector Statistics functionality, including Random Number Generators and Summary Statistics, on the Intel Xeon Phi processor x200.
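Of the BLAS changes above, the new packed GEMM interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) are worth a closer look: they let an application pay the cost of reordering a matrix into MKL's internal layout once and reuse that packed buffer across many multiplications. Below is a minimal sketch of the call pattern only, using hypothetical Python stand-ins for the C routines and NumPy for the arithmetic, not the real MKL API:

```python
import numpy as np

# Illustrative stand-ins for MKL's ?gemm_pack / ?gemm_compute interfaces.
# MKL reorders A into an internal blocked layout; here a contiguous copy
# stands in for the packed buffer, and NumPy does the actual math.

def gemm_pack(a):
    """Pack A once, up front (stand-in for ?gemm_pack)."""
    return np.ascontiguousarray(a, dtype=np.float64)

def gemm_compute(packed_a, b):
    """Multiply using the already-packed A (stand-in for ?gemm_compute)."""
    return packed_a @ b

a = np.arange(6.0).reshape(2, 3)
packed = gemm_pack(a)                       # pay the packing cost once...
c1 = gemm_compute(packed, np.ones((3, 2)))  # ...then reuse it repeatedly
c2 = gemm_compute(packed, np.eye(3))
```

The win over plain S/DGEMM shows up when the same matrix is multiplied against many different partners, for example a fixed weight matrix applied to a stream of minibatches in a fully connected layer.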

Check out the online release notes for more information.
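On the SkipAhead optimization mentioned under Vector Statistics: skip-ahead lets each worker jump its MT19937 stream a huge, fixed distance ahead so that parallel streams drawn from one seed never overlap. A small sketch of the idea using NumPy's MT19937 bit generator, whose jumped() method plays the role of MKL's skip-ahead service routine (this is not MKL code):

```python
import numpy as np
from numpy.random import MT19937, Generator

# Skip-ahead for parallel RNG: every worker starts from the same seed, but
# each successive worker's state is jumped 2**128 steps past the previous
# one, guaranteeing non-overlapping substreams.

seed = 2017
streams = []
bitgen = MT19937(seed)
for _ in range(4):                 # four non-overlapping worker streams
    streams.append(Generator(bitgen))
    bitgen = bitgen.jumped()       # advance 2**128 steps for the next worker

draws = [g.random(3) for g in streams]  # each worker draws independently
```

Because the jump distance vastly exceeds any realistic number of draws per worker, the substreams behave as statistically independent generators, which is what makes this approach attractive for multithreaded Monte Carlo workloads.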
