Intel® Math Kernel Library 2017 Update 4 is now available

Gennady_F_Intel · ‎09-30-2017

Intel® Math Kernel Library (Intel® MKL) is a highly optimized, extensively threaded, and thread-safe library of mathematical functions for engineering, scientific, and financial applications that require maximum performance.

Intel MKL 2017 Update 4 packages are now ready for download. Intel MKL is available as part of Intel® Parallel Studio XE and Intel® System Studio. Please visit the Intel® Math Kernel Library Product Page.

What's New in Intel MKL 2017 Update 4

BLAS:
- Addressed an early release buffer issue in *GEMV threaded routines
- Improved Intel® Threading Building Blocks *GEMM performance for small m, n and large k cases
- Fixed irregular division by zero and invalid floating point exceptions in {C/Z}TRSM for Intel® Xeon Phi™ processor x200 (aka KNL) and Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) code path
- Improved {s/d} GEMV threaded performance on Intel64 architecture
- Addressed incorrect SSYRK calculation on Intel® Xeon Phi™ processor x200 with Intel® TBB threading occurring if the value of K is very large
- Addressed a GEMM multithreading issue, which may cause segfaults for large matrices (M, N >= ~30,000, K >= ~5000) on Intel® Xeon Phi™ processor x200 (aka KNL)
Deep Neural Networks:
- Added support for non-square pooling kernels
Sparse BLAS:
- Improved SpMV and SpMM performance for processors supporting the Intel® AVX512 instruction set
- Improved SpMV performance for processors supporting the Intel® AVX2 instruction set
- Added Intel® TBB support for SparseSyrk and SpMM routines
Intel MKL Pardiso:
- Significantly improved factorization and solving steps for “small” matrices
- Introduced low rank approach suitable for solving set of systems with small changes in elements
Parallel Direct Sparse Solver for Cluster:
- Added Iterative support
- Improved performance for number of processes not power of 2
LAPACK:
- Improved performance of ?(OR|UN)GQR, ?GEQR and ?GEMQR routines in Intel(R) TBB threading layer.
- Introduced LAPACKE_set_nancheck routine for disabling/enabling nan checks in LAPACKE functions.
FFT:
- Improved 2D and 3D FFT performance for the processors supporting Intel® AVX512 and Intel® AVX2 Instruction sets.
- Improved FFT performance with and without scaling factor across all domains.
- Introduced MKL_VERBOSE mode support for FFT domain.
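
MKL_VERBOSE mode is switched on through the MKL_VERBOSE environment variable. When MKL is reached through Python, the safest approach is to set the variable before the library is loaded. The sketch below does this by starting a fresh process; the helper name run_with_env and the script name in the comment are hypothetical, and it assumes your script uses a NumPy build linked against Intel MKL.

```python
import os
import subprocess
import sys


def run_with_env(args, **extra_env):
    """Run a Python child process with extra environment variables set.

    A fresh process is used so the variables are in place before any
    MKL library is loaded by the child.
    """
    env = dict(os.environ)
    env.update({name: str(value) for name, value in extra_env.items()})
    return subprocess.run(
        [sys.executable, *args], env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Hypothetical usage with your own script:
    #   result = run_with_env(["my_fft_script.py"], MKL_VERBOSE=1)
    #   print(result.stdout)
    # Self-check that the variable reaches the child process:
    probe = run_with_env(
        ["-c", "import os; print(os.environ['MKL_VERBOSE'])"], MKL_VERBOSE=1
    )
    print(probe.stdout.strip())
```

The same helper can pass other MKL environment variables to the child process.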

What's New in Intel MKL 2017 Update 3

BLAS:
- Optimized SGEMM for Intel® Xeon Phi™ processor x*** (codename Knights Mill)
- Improved performance for ?GEMM for medium problem sizes on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Improved performance for SGEMM/DGEMM for small problem sizes on Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
- Improved performance for ?GEMM_BATCH on all architectures
- Improved performance for SSYMV/DSYMV on Intel® Advanced Vector Extensions 2 (Intel® AVX2) and later architectures
- Improved performance for DGEMM Automatic Offload (AO) for square sizes (3000<M=N=K< 10000) on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Improved performance for general BLAS functions on the 32-bit Intel® Advanced Vector Extensions 512 (Intel® AVX512) architecture
- Fixed ?AXPBY to propagate NaNs in the y vector when beta = 0 on 64-bit Intel® Advanced Vector Extensions 2 (Intel® AVX2) and later architectures
FFT:
- Improved performance of 3D FFT complex-to-real and real-to-complex problems on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
- Improved performance of 2D FFT complex-to-complex problems with scale on Intel® Xeon Phi™ processor 72** (formerly Knights Landing)
High Performance Conjugate Gradients (HPCG):
- Added support for Intel® Xeon® Processor supporting Intel® Advanced Vector Extensions 512 (Intel® AVX-512) (codename Skylake Server)
Deep Neural Networks:
- Added initial convolution and inner product optimizations for the next generation of Intel Xeon Phi processor (code name Knights Mill)
- Improved parallel performance of convolutions on Intel Xeon Phi processor (code name Knights Landing)
- Average pooling has an option to include padding into mean values computation
LAPACK:
- Optimized ?GELQ and ?GEMLQ performance for short-and-wide matrices
- Optimized performance of ?ORCSD2BY1 and ?DORCSD routines
- Fixed LU performance degradation for medium sizes on 6 threads
Vector Statistics:
- Fixed failure of VSL RNG MT19937 on big vector lengths on Intel® Xeon Phi™ Coprocessor x100 Product Family.
- Improved performance of Outlier Detection (BACON) algorithm for single and double precisions for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets

What's New in Intel MKL 2017 Update 2

Library Engineering:
- Intel® AVX-512 code is dispatched by default on Intel® Xeon processors
BLAS:
- Improved performance of dgemv non-transpose when the number of threads is large (typically on Intel® Xeon Phi™ processor x200 (formerly Knights Landing)). For example: factor 2 speedup when M=K=10000 with 68 threads on Intel® Xeon Phi™ processor x200
- Improved performance for dgemm, TN and NN cases, with very small N on Intel® Xeon Phi™ processor x200 and 6th Generation Intel® Core™ processor (also known as Skylake)
- Introduced MKL_NUM_STRIPES environment variable and accompanying Intel MKL support functions to control the 2D partitioning of multithreaded *GEMM on all Intel architectures except Intel® Xeon Phi™ Coprocessor x100 Product Family. Please see the related section in the Intel MKL Developer Guide for details.
- Improved the {s,d}gemm_compute performance on Intel64 architectures supporting Intel® AVX2 instruction set.
- Improved ?gemm_batch performance when N==1.
Sparse BLAS:
- Improved performance of BCSMV functionality with 3-10, 14 and 18 problem sizes for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets
- Improved performance of CSRMV functionality for processors supporting Intel® AVX2 and Intel® AVX512 instruction sets
- Added Intel® Threading Building Blocks (Intel® TBB) threading support for CSRMV functionality with symmetric matrices
Intel MKL Pardiso:
- Added Intel TBB threading support for Intel MKL Pardiso at the solving step
Deep Neural Networks:
- Improved performance on Intel Xeon processors with Intel® AVX2 and Intel® AVX512 instruction set support
- Improved performance on the second generation of Intel® Xeon Phi™ processor x200
- Introduced support for rectangular convolution kernels
- Significantly improved reference convolution code performance
- Added unsymmetric padding support in convolution and pooling
- Introduced extended Batch Normalization API that allows access to mean, variance, scale and shift parameters
LAPACK:
- Added ?GEQR, ?GEMQR and ?GETSLS functions with performance optimized for tall-and-skinny matrices.
- Improved LAPACK performance for very small sizes (N<16) in LP64 layer by reducing internal LP64/ILP64 conversion overhead.
- Improved ?[SY|HE]EVD scalability up to 32 threads and beyond on Intel® Xeon and Intel® Xeon Phi™ processor x200
- Significantly improved ?LANGE (‘Frobenius’ norm) performance
ScaLAPACK:
- Added MKL_PROGRESS() support in P?GETRF
- Improved P?TRSM/P?SYRK performance
- Optimized ?GE(SD|RV|BS|BR)2D routines in BLACS
- Fixed failure in P?GEMM (‘N’, ‘N’ case)
Vector Mathematics:
- Added Intel TBB threading support for all mathematical functions.
Vector Statistics:
- Improved C interfaces of vsl*SSEdit*() functions

Known Limitations :

For Intel® Xeon Phi™ processor x200 leverage boot mode without Hyper Threading, MKL has an oversubscription of threads for versions prior to MPSS 4.3.2 due to COI occupying 4 cores. This affects the performance of MKL substantially. As a workaround, the customer can add ‘norespect’ to the MIC_KMP_AFFINITY environment variable.
?GETRF functionality can give incorrect results for some matrices of 5x5 size when MKL_DIRECT_CALL is enabled. The patch fixing the issue is posted on MKL Forum.
Recently added TS QR functionality (?GEQR and ?GEMQR) may demonstrate very slow performance when the number of threads is less than 30.

On SKX, DGEMM does not scale C by beta when transa == N, transb == N, K==0 and N==2. A workaround is to set transa == T or transb == T, since with K==0 the transpose is not relevant.
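
To see why the transpose workaround is safe: GEMM computes C <- alpha*op(A)*op(B) + beta*C, and with K == 0 the product term is an empty sum, so the expected result is simply beta*C whatever transa and transb are. The following is a plain-Python reference sketch of that definition, not MKL code; the function name gemm_reference is hypothetical and it handles only the non-transposed, row-major case.

```python
def gemm_reference(alpha, a, b, beta, c):
    """Return alpha * A @ B + beta * C for matrices given as lists of rows.

    A is M x K, B is K x N, C is M x N. With K == 0, A has M empty rows
    and B is an empty list, so the inner sum contributes nothing.
    """
    m = len(c)
    n = len(c[0]) if m else 0
    k = len(b)
    result = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * acc + beta * c[i][j])
        result.append(row)
    return result


if __name__ == "__main__":
    # K == 0, N == 2: the case named in the limitation above.
    c = [[1.0, 2.0], [3.0, 4.0]]
    print(gemm_reference(1.0, [[], []], [], 0.5, c))
```

Comparing a DGEMM result against such a reference for K == 0 is one way to check whether a given build is affected.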

New Features in MKL 2017 Update 1:

Added support of Intel® Xeon Phi™ processor x200 leverage boot mode on Windows* OS.
BLAS :
- The Intel Optimized MP LINPACK Benchmark supports various MPI implementations in addition to Intel MPI, and the contents of the mp_linpack directory have changed.
- Improved single thread SGEMM/DGEMM performance on Intel® Advanced Vector Extensions 2 (Intel® AVX2), Intel® Advanced Vector Extensions 512 (Intel® AVX-512), and Intel® Xeon® for Intel® Many Integrated Core Architecture.
Deep Neural Networks (DNN) primitives :
- Introduced additional optimizations for Intel® Xeon® processor E3-xxxx V5 (formerly Skylake).
- Added support for non-square convolution kernels
Sparse BLAS :
- Improved Sparse BLAS matrix vector functionality in block compressed sparse row (BSR) format for block size equal to 6,10,14, or 18 on Intel AVX2.
- Improved Inspector-executor Sparse BLAS matrix-vector and matrix-matrix functionality for symmetric matrices.
LAPACK :
- Improved performance of ?GETRF, ?GETRS and ?GETRI for very small matrices via MKL_DIRECT_CALL.
- Improved performance of ?ORGQR and SVD functionality for tall-and-skinny matrices.
- Parallelized ?ORGQR in Intel® Threading Building Blocks (Intel® TBB) threading layer.
Vector Math :
- Introduced the exponential integral function E1 with three accuracy levels HA, LA, and EP, for single precision and double precision real data types.
ScaLAPACK :
- Improved performance of PZGETRF.

What's new in Intel MKL 2017:

Introduced optimizations for the Intel® Xeon Phi™ processor x200 (formerly Knights Landing) self-boot platform for Windows* OS
Enabled Automatic Offload (AO) and Compiler Assisted Offload (CAO) modes for the second generation of Intel Xeon Phi coprocessor on Linux* OS
Introduced Deep Neural Networks (DNN) primitives including convolution, normalization, activation, and pooling functions intended to accelerate convolutional neural networks (CNNs) and deep neural networks on Intel® Architecture.
- Optimized for Intel® Xeon® processor E5-xxxx v3 (formerly Haswell), Intel Xeon processor E5-xxxx v4 (formerly Broadwell), and Intel Xeon Phi processor x200 self-boot platform.
- Introduced inner product primitive to support fully connected layers.
- Introduced batch normalization, sum, split, and concat primitives to provide full support for GoogLeNet and ResidualNet topologies.
BLAS:
- Introduced new packed matrix multiplication interfaces (?gemm_alloc, ?gemm_pack, ?gemm_compute, ?gemm_free) for single and double precisions.
- Improved performance over standard S/DGEMM on Intel Xeon processor E5-xxxx v3 and later processors.
Sparse BLAS:
- Improved performance of parallel BSRMV functionality for processor supporting Intel® Advanced Vector Extensions 2 (Intel® AVX2) instruction set.
- Improved performance of sparse matrix functionality on the Intel Xeon Phi processor x200.
Intel MKL PARDISO:
- Improved performance of parallel solving step for matrices with fewer than 300000 elements.
- Added support for mkl_progress in Parallel Direct Sparse Solver for Clusters.
- Added fully distributed reordering step to Parallel Direct Sparse Solver for Clusters.
Fourier Transforms:
- Improved performance of batched 1D FFT with large batch size on processors supporting Intel® Advanced Vector Extensions (Intel® AVX), Intel AVX2, Intel® Advanced Vector Extensions 512 (Intel® AVX512) and Intel AVX512_MIC instruction sets
- Improved performance for small size batched 2D FFT on the Intel Xeon Phi processor x200 self-boot platform, Intel Xeon processor E5-xxxx v3, and Intel Xeon processor E5-xxxx v4.
- Improved performance for 3D FFT on the Intel Xeon Phi processor x200 self-boot platform.
LAPACK:
- Included the latest LAPACK v3.6 enhancements. New features introduced are:
  - SVD by Jacobi ([CZ]GESVJ) and preconditioned Jacobi ([CZ]GEJSV)
  - SVD via EVD allowing computation of a subset of singular values and vectors (?GESVDX)
  - In BLAS level 3, generalized Schur (?GGES3), generalized EVD (?GGEV3), generalized SVD (?GGSVD3), and reduction to generalized upper Hessenberg form (?GGHD3)
  - Multiplication of a general matrix by a unitary or orthogonal matrix that possesses a 2x2 block structure ([DS]ORM22/[CZ]UNM22)
- Improved performance for large size QR (?GEQRF) on processors supporting the Intel AVX2 instruction set.
- Improved LU factorization, solve, and inverse (?GETR?) performance for very small sizes (<16).
- Improved General Eigensolver (?GEEV and ?GEEVD) performance for the case when eigenvectors are needed.
- Improved ?GETRF, ?POTRF and ?GEQRF, linear solver (?GETRS) and SMP LINPACK performance on the Intel Xeon Phi processor x200 self-boot platform.
ScaLAPACK :
- Improved performance for hybrid (MPI + OpenMP*) mode of ScaLAPACK and PBLAS.
- Improved performance of P?GEMM and P?TRSM resulted in better scalability of Qbox First-Principles Molecular Dynamics code.
Data Fitting :
- Introduced two new storage formats for interpolation results (DF_MATRIX_STORAGE_SITES_FUNCS_DERS and DF_MATRIX_STORAGE_SITES_DERS_FUNCS).
- Added Hyman monotonic cubic spline.
- Improved performance of Data Fitting functionality on the Intel Xeon Phi processor x200.
- Modified callback APIs to allow users to pass information about integration limits.
Vector Mathematics:
- Introduced optimizations for the Intel Xeon Phi processor x200.
- Improved performance for Intel Xeon processor E5-xxxx v3 and Intel Xeon processor E5-xxxx v4.
Vector Statistics:
- Introduced additional optimization of SkipAhead method for MT19937 and SFMT19937.
- Improved performance of Vector Statistic functionality including Random Number Generators and Summary Statistic on the Intel Xeon Phi processor x200.

Check out the online Release Notes for more information.

Artem_S_ · ‎11-09-2016

Dear developers,

I have a Mac with macOS 10.12 (Sierra), and yesterday I downloaded and installed Intel Parallel Studio 2017.1.040 for Fortran and C++ with the MKL libraries.

But when I checked the linking of the libraries (otool -L libname), I got the following result:

/opt/intel/mkl/lib/libmkl_intel_lp64.dylib:

@rpath/libmkl_intel_lp64.dylib (compatibility version 0.0.0, current version 0.0.0)

/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 169.3.0)

Is it OK that the library has a link to itself?
I ask because I previously used a 2016 version and my binaries had no linkage problems. But now, when I try to run my binary through a shell script or a Python script, I immediately receive an error like this:

dyld: Library not loaded: @rpath/libmkl_intel_lp64.dylib

If I run this binary directly in the terminal, it works correctly without any errors or linking problems.

Thanks in advance

Evarist_F_Intel · ‎11-09-2016

Hi Artem,

It looks like the problem is caused by SIP (System Integrity Protection).
For more information, please refer to https://software.intel.com/en-us/articles/os-x-1011-support-in-intel-parallel-studio-xe-2016, section Dynamic Library Dependencies.
In brief, you need to add an rpath to your executable / DSO when linking with MKL.
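
As a rough sketch of what that means in practice on macOS: an rpath can be passed to the linker at build time, or added to an already built binary with install_name_tool. The helper names below are hypothetical, and /opt/intel/mkl/lib is taken from the otool output earlier in this thread, so adjust it to your installation.

```python
def rpath_link_flag(lib_dir):
    """Linker flag (passed through the compiler driver) that embeds an rpath."""
    return f"-Wl,-rpath,{lib_dir}"


def add_rpath_command(binary, lib_dir):
    """Command that adds an rpath entry to an existing Mach-O binary."""
    return ["install_name_tool", "-add_rpath", lib_dir, binary]


if __name__ == "__main__":
    mkl_lib_dir = "/opt/intel/mkl/lib"  # assumption: default location from the post above
    print(rpath_link_flag(mkl_lib_dir))
    # ./my_binary is a placeholder for your own executable.
    print(" ".join(add_rpath_command("./my_binary", mkl_lib_dir)))
```

With the rpath embedded, @rpath/libmkl_intel_lp64.dylib can be resolved from the binary itself instead of from environment variables set in the launching shell.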

Theo-at-Stillwater · ‎03-07-2017

Where does this update 2 install the libraries? I went through the installation, but I can't find the MKL libraries anywhere, nor did the installer indicate where it was installing stuff. I just need the MKL libraries for a CGO interface, I don't have or need Parallel Studio. The docs state that they get installed under these products, but since I don't have them, where does the installer put them?

Gennady_F_Intel · ‎03-07-2017

The default installation directory for a standalone installation of Intel MKL is: Linux* OS: /opt/intel/parallel_studio_xe_2017.u.xyz/compilers_and_libraries_2017/linux/mkl/ (where u is the update number, 0 for gold releases, and xyz is the package number).

In the text that follows, <arch> refers to the primary processor architecture, such as ia32 or intel64 and <MKLROOT> refers to the Intel MKL installation directory. Additionally, substitute a '/' below for the '\' if your system is a Linux* OS or macOS* system.

The same applies to Windows* OS.
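
If the installer gave no indication of the location, a small search can help. This is a minimal sketch, assuming the libraries live under /opt/intel or under the directory named by the MKLROOT environment variable (set by the MKL environment scripts, if you sourced them); the function name find_mkl_libs is hypothetical.

```python
import os
from pathlib import Path


def find_mkl_libs(roots, pattern="libmkl_rt.*"):
    """Recursively search the given directories for files matching pattern."""
    hits = []
    for root in roots:
        root = Path(root)
        if root.is_dir():
            hits.extend(sorted(root.rglob(pattern)))
    return hits


if __name__ == "__main__":
    # Candidate locations are assumptions; add your own install prefix if needed.
    candidates = ["/opt/intel"]
    if os.environ.get("MKLROOT"):
        candidates.insert(0, os.environ["MKLROOT"])
    for path in find_mkl_libs(candidates):
        print(path)
```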

Renshaw__Daniel · ‎03-10-2017

Have the licensing terms for MKL changed in this version? The license.txt found in the latest version's installation package is quite different to the previous version. The new version appears to not require a named user license? The various Intel web pages (all apparently current) still suggest a named user license is required.

Mikhail_K_ · ‎03-18-2017

Is support for Visual Studio 2017 coming soon?

Gennady_F_Intel · ‎03-18-2017

Very probably it will happen in the next version.

yan_c_ · ‎04-27-2017

Hi, I want to know whether this is a bug or not. Thanks!

Details are at the URL below (the URL is OK):

https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/733207

Gennady_F_Intel · ‎04-27-2017

I couldn't reach this url

yan_c_ · ‎04-27-2017

Gennady F. (Intel) wrote:

I couldn't reach this url

Oops! OK, I know the reason: the moderator hasn't approved it yet. Sorry, I will paste it here.

I compiled numpy with MKL and everything built fine, but I came across a peculiar problem. I have three cases:

case_1(perinfoMKL1): I add only `mkl_rt` lib in site.cfg file
case_2(perinfoMKL2): I add `mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5, mkl_rt` lib in site.cfg file
case_3(perinfoMKL3): I add `mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5, mkl_rt, mkl_avx2` in site.cfg file

The build and install steps complete without problems in every case. But when I use case_2, an error occurs: Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so. LD_LIBRARY_PATH is set. When I add the mkl_avx2 library and compile again (case_3), the test passes.

Following the LD_DEBUG approach, I ran LD_DEBUG="files libs" LD_DEBUG_OUTPUT=log ./test.py to log how the loader processes the libraries. I see "relocation dependency" messages, so I guess the problem is related to them. I am not familiar with how this works at a low level; could someone explain it? Thanks. I am a little confused by it.

My understanding: judging by the relocation dependencies, I guess that when group_1 (mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5) and group_2 (mkl_rt) are linked together, the executable resolves symbols through a new symbol table rather than through the specific libraries (such as group_1), and some libraries are no longer associated with the symbol table they need. As a result the executable cannot find a symbol, because the libraries have been relocated again. When I add the new library and compile again, the missing symbols resolve successfully.

But according to a thread on the Intel forums, someone says that it is a bug, so I am confused again :-( [https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/300857]

Environments:
HP computer, CPU: Intel Core i5 (4 cores), memory: 8 GB, OS: CentOS 7, miniconda3 (Python 3.6), numpy 1.13 compiled from source with MKL (MKL settings configured in the site.cfg file).

My process and points to note are below:

I set LD_LIBRARY_PATH="/tmp/mkl-nfs/lib" (to keep my environment clean and tidy).
The same test code, test.py, is used for every case; only the Python interpreter path in the first line is changed to match the environment.
Compiling the source code succeeds in every case.

$ cat test.py
#!/home/yancy/miniconda3/envs/perinfoMKL1/bin/python
# -*- coding: utf-8 -*-

import numpy as np
import time

start_time = time.time()
a = 10 ** 4
A = np.random.random((a, a))
B = np.random.random((a, a))
C = A.dot(B)
print("Time: ", time.time() - start_time)

(perinfoMKL1)$ tail numpy/site.cfg
#[fftw]
#libraries = fftw3
#
# For djbfft, numpy.distutils will look for either djbfft.a or libdjbfft.a . 
#[djbfft]
#include_dirs = /usr/local/djbfft/include
#library_dirs = /usr/local/djbfft/lib
[mkl]
library_dirs = /tmp/mkl-nfs/lib
mkl_libs = mkl_rt


# No error, execute successfully
(perinfoMKL1) $ ./test.py 
Time:  35.454288959503174

(perinfoMKL1)$ grep "libmkl_rt.so" log.27250 
     27250:    file=libmkl_rt.so [0];  needed by /home/yancy/miniconda3/envs/perinfoMKL1/lib/python3.6/site-packages/numpy-1.13.0.dev0+4408f74-py3.6-linux-x86_64.egg/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so [0]
     27250:    find library=libmkl_rt.so [0]; searching
     27250:      trying file=/home/yancy/miniconda3/envs/perinfoMKL1/lib/tls/x86_64/libmkl_rt.so
     27250:      trying file=/home/yancy/miniconda3/envs/perinfoMKL1/lib/tls/libmkl_rt.so
     27250:      trying file=/home/yancy/miniconda3/envs/perinfoMKL1/lib/x86_64/libmkl_rt.so
     27250:      trying file=/home/yancy/miniconda3/envs/perinfoMKL1/lib/libmkl_rt.so
     27250:      trying file=/home/yancy/miniconda3/envs/perinfoMKL1/bin/../lib/libmkl_rt.so
     27250:      trying file=/tmp/mkl-nfs/lib/libmkl_rt.so
     27250:    file=libmkl_rt.so [0];  generating link map
     27250:    calling init: /tmp/mkl-nfs/lib/libmkl_rt.so
     27250:    file=/tmp/mkl-nfs/lib/libmkl_core.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_rt.so [0]
     27250:    file=/tmp/mkl-nfs/lib/libiomp5.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_rt.so [0]
     27250:    file=/tmp/mkl-nfs/lib/libmkl_intel_thread.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_rt.so [0]
     27250:    file=/tmp/mkl-nfs/lib/libmkl_intel_lp64.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_rt.so [0]
     27250:    calling fini: /tmp/mkl-nfs/lib/libmkl_rt.so [0]
(perinfoMKL1)]$ grep "libmkl_avx2.so" log.27250 
     27250:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_core.so [0]
     27250:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  generating link map
     27250:    file=/tmp/mkl-nfs/lib/libmkl_core.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_avx2.so [0] (relocation dependency)
     27250:    calling init: /tmp/mkl-nfs/lib/libmkl_avx2.so
     27250:    opening file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0]; direct_opencount=1
     27250:    file=/tmp/mkl-nfs/lib/libmkl_intel_thread.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_avx2.so [0] (relocation dependency)
     27250:    calling fini: /tmp/mkl-nfs/lib/libmkl_avx2.so [0]

(perinfoMKL2)$ tail numpy/site.cfg
#[fftw]
#libraries = fftw3
#
# For djbfft, numpy.distutils will look for either djbfft.a or libdjbfft.a . 
#[djbfft]
#include_dirs = /usr/local/djbfft/include
#library_dirs = /usr/local/djbfft/lib
[mkl]
library_dirs = /tmp/mkl-nfs/lib
mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5, mkl_rt

# Error occurs even though LD_LIBRARY_PATH is set
(perinfoMKL2)$ ./test.py 
Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
(perinfoMKL2)$ grep "libmkl_rt.so" test.log.26855 
     26855:    file=libmkl_rt.so [0];  needed by /home/yancy/miniconda3/envs/perinfoMKL2/lib/python3.6/site-packages/numpy-1.13.0.dev0+4408f74-py3.6-linux-x86_64.egg/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so [0]
     26855:    find library=libmkl_rt.so [0]; searching
     26855:      trying file=/home/yancy/miniconda3/envs/perinfoMKL2/lib/libmkl_rt.so
     26855:      trying file=/home/yancy/miniconda3/envs/perinfoMKL2/bin/../lib/libmkl_rt.so
     26855:      trying file=/tmp/mkl-nfs/lib/libmkl_rt.so
     26855:    file=libmkl_rt.so [0];  generating link map
     26855:    file=/tmp/mkl-nfs/lib/libmkl_intel_lp64.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_rt.so [0] (relocation dependency)
     26855:    file=/tmp/mkl-nfs/lib/libmkl_core.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_rt.so [0] (relocation dependency)
     26855:    calling init: /tmp/mkl-nfs/lib/libmkl_rt.so
     26855:    calling fini: /tmp/mkl-nfs/lib/libmkl_rt.so [0]
(perinfoMKL2) $ grep "libmkl_avx2.so" test.log.26855 
     26855:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_core.so [0]
     26855:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  generating link map
     26855:    /tmp/mkl-nfs/lib/libmkl_avx2.so: error: symbol lookup error: undefined symbol: mkl_dft_fft_fix_twiddle_table_32f (fatal)
     26855:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  destroying link map
     26855:    file=/home/yancy/miniconda3/envs/perinfoMKL2/bin/libmkl_avx2.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_core.so [0]
     26855:    file=libmkl_avx2.so [0];  dynamically loaded by /tmp/mkl-nfs/lib/libmkl_core.so [0]
     26855:    find library=libmkl_avx2.so [0]; searching
     26855:      trying file=/home/yancy/miniconda3/envs/perinfoMKL2/lib/libmkl_avx2.so
     26855:      trying file=/home/yancy/miniconda3/envs/perinfoMKL2/bin/../lib/libmkl_avx2.so
     26855:      trying file=/tmp/mkl-nfs/lib/libmkl_avx2.so
     26855:    file=libmkl_avx2.so [0];  generating link map
     26855:    /tmp/mkl-nfs/lib/libmkl_avx2.so: error: symbol lookup error: undefined symbol: mkl_dft_fft_fix_twiddle_table_32f (fatal)
     26855:    file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0];  destroying link map

(perinfoMKL3)$ tail numpy/site.cfg

#[fftw]
#libraries = fftw3
#
# For djbfft, numpy.distutils will look for either djbfft.a or libdjbfft.a . 
#[djbfft]
#include_dirs = /usr/local/djbfft/include
#library_dirs = /usr/local/djbfft/lib
[mkl]
library_dirs = /tmp/mkl-nfs/lib
mkl_libs = mkl_intel_lp64, mkl_intel_thread, mkl_core, iomp5, mkl_rt, mkl_avx2

# No error when I add mkl_avx2
(perinfoMKL3)$ ./test.py 
Time:  33.996660232543945

(perinfoMKL3) $ grep "libmkl_rt.so" log.27384 
     27384:    file=libmkl_rt.so [0];  needed by /home/yancy/miniconda3/envs/perinfoMKL3/lib/python3.6/site-packages/numpy-1.13.0.dev0+4408f74-py3.6-linux-x86_64.egg/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so [0]
     27384:    find library=libmkl_rt.so [0]; searching
     27384:      trying file=/home/yancy/miniconda3/envs/perinfoMKL3/lib/libmkl_rt.so
     27384:      trying file=/home/yancy/miniconda3/envs/perinfoMKL3/bin/../lib/libmkl_rt.so
     27384:      trying file=/tmp/mkl-nfs/lib/libmkl_rt.so
     27384:    file=libmkl_rt.so [0];  generating link map
     27384:    file=/tmp/mkl-nfs/lib/libmkl_intel_lp64.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_rt.so [0] (relocation dependency)
     27384:    file=/tmp/mkl-nfs/lib/libmkl_core.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_rt.so [0] (relocation dependency)
     27384:    calling init: /tmp/mkl-nfs/lib/libmkl_rt.so
     27384:    calling fini: /tmp/mkl-nfs/lib/libmkl_rt.so [0]
(perinfoMKL3)$ grep "libmkl.avx2.so" log.27384 
     27384:    file=libmkl_avx2.so [0];  needed by /home/yancy/miniconda3/envs/perinfoMKL3/lib/python3.6/site-packages/numpy-1.13.0.dev0+4408f74-py3.6-linux-x86_64.egg/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so [0]
     27384:    find library=libmkl_avx2.so [0]; searching
     27384:      trying file=/home/yancy/miniconda3/envs/perinfoMKL3/lib/libmkl_avx2.so
     27384:      trying file=/home/yancy/miniconda3/envs/perinfoMKL3/bin/../lib/libmkl_avx2.so
     27384:      trying file=/tmp/mkl-nfs/lib/libmkl_avx2.so
     27384:    file=libmkl_avx2.so [0];  generating link map
     27384:    file=/tmp/mkl-nfs/lib/libmkl_core.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_avx2.so [0] (relocation dependency)
     27384:    file=/tmp/mkl-nfs/lib/libmkl_intel_lp64.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_avx2.so [0] (relocation dependency)
     27384:    file=/tmp/mkl-nfs/lib/libmkl_intel_thread.so [0];  needed by /tmp/mkl-nfs/lib/libmkl_avx2.so [0] (relocation dependency)
     27384:    calling init: /tmp/mkl-nfs/lib/libmkl_avx2.so
     27384:    opening file=/tmp/mkl-nfs/lib/libmkl_avx2.so [0]; direct_opencount=1
     27384:    calling fini: /tmp/mkl-nfs/lib/libmkl_avx2.so [0]
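
For comparing the three logs, a small parser can make the differences easier to see. The sketch below only summarizes LD_DEBUG lines of the shapes shown above (who loaded what, and which symbols failed to resolve); it does not diagnose the cause. The function name summarize_ld_debug and the regular expressions are my own assumptions about the log format, based on the excerpts in this post.

```python
import re

LOADED = re.compile(r"file=(\S+) \[\d+\];\s+dynamically loaded by (\S+)")
NEEDED = re.compile(r"file=(\S+) \[\d+\];\s+needed by (\S+)")
UNDEFINED = re.compile(r"(\S+): error: symbol lookup error: undefined symbol: (\S+)")


def summarize_ld_debug(lines):
    """Collect (library, requester) pairs and undefined-symbol errors."""
    summary = {"loaded": [], "needed": [], "undefined": []}
    for line in lines:
        match = UNDEFINED.search(line)
        if match:
            summary["undefined"].append((match.group(1), match.group(2)))
            continue
        match = LOADED.search(line)
        if match:
            summary["loaded"].append((match.group(1), match.group(2)))
            continue
        match = NEEDED.search(line)
        if match:
            summary["needed"].append((match.group(1), match.group(2)))
    return summary


if __name__ == "__main__":
    import sys

    # Usage: python summarize_log.py test.log.26855
    with open(sys.argv[1]) as handle:
        report = summarize_ld_debug(handle)
    for library, symbol in report["undefined"]:
        print(f"undefined symbol {symbol} while loading {library}")
```

Running it on the case_2 log would list mkl_dft_fft_fix_twiddle_table_32f as the symbol that libmkl_avx2.so could not resolve, which is the line to compare against the case_1 and case_3 logs.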

yan_c_ · ‎04-27-2017

Hi, @Gennady F. (Intel). The URL has been updated. Are there any suggestions or tips? Thanks again.