
may_ka · ‎02-17-2019

Hi

the program below implements the inversion of an autoregressive matrix.

Program Test
  use blas95
  use lapack95
  USE IFPORT
  use mkl_service
  implicit none
  integer(kind=8) :: istat, n, c1, c2, ise
  integer(kind=4) :: dy
  character(len=200) :: msg
  Real(kind=8), allocatable :: A(:,:)
  real(kind=8) :: r1=0.0D0, r2=0.0D0
  outer:block
    dy=1
    write(*,*) "dynamic: ", dy
    call mkl_set_dynamic(dy)
    call mkl_set_num_threads(mkl_get_max_threads())
    n=10000
    write(*,"(*(g0"",""))") n
    r1=dclock()
    !!start building the matrix
    allocate(&
      &A(n,n),&
      &stat=istat,errmsg=msg)
    if(istat/=0) Then
      write(*,*) msg;exit outer
    end if
    !$OMP PARALLEL DO PRIVATE(c1)
    Do c1=1,size(A,2)
      Do c2=c1,size(A,1)
        ! double-precision literal, so the power is evaluated in the same kind as A
        A(c2,c1)=0.5D0**(c2-c1)
      end Do
    end Do
    !$OMP END PARALLEL DO
    ise=size(A,1)
    !$OMP PARALLEL DO PRIVATE(c1) FIRSTPRIVATE(ise)
    Do c1=1,ise-1
      A(c1,(c1+1):ise)=A((c1+1):ise,c1)
    End Do
    !$OMP END PARALLEL DO
    r2=Dclock()
    write(*,*) "alloc: ", r2-r1
    !!end building matrix
    r2=Dclock()
    call potrf(A=A,UPLO="U",INFO=istat)
    r1=dclock()
    write(*,*) "potrf: ",r1-r2
    call POTRI(A=A,Info=istat)
    r2=dclock()
    write(*,*) "potri: ",r2-r1
  End block outer
End Program Test

With mkl 17.08, I noticed hardly any difference in processing time between mkl_dynamic=0 and mkl_dynamic=1.

mkl_dynamic=0:

potrf: 0.88 seconds, potri: 2.12 seconds

mkl_dynamic=1

potrf: 0.58 seconds, potri: 2.09 seconds

However, with mkl 19.02 the differences are such that mkl_dynamic=0 makes the program unusable.

mkl_dynamic=0:

potrf: 0.37 seconds, potri: 110.69 seconds

mkl_dynamic=1

potrf: 0.37 seconds, potri: 1.11 seconds
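
To put the numbers above in proportion, here is a minimal sketch that computes the slowdown factor from two timings. The helper name slowdown is hypothetical; it only divides the two values quoted in this post.

```shell
# slowdown SLOW FAST: print how many times slower SLOW is than FAST,
# rounded to one decimal place.
slowdown() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

# potri with mkl 19.02: mkl_dynamic=0 versus mkl_dynamic=1
slowdown 110.69 1.11
# potri with mkl 17.08: mkl_dynamic=0 versus mkl_dynamic=1
slowdown 2.12 2.09
```

With the timings reported here, the first call gives a factor of roughly 100, while the second stays close to 1.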

Times were obtained on Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz.

Environment variables were:

MKL_NUM_THREADS=36

KMP_AFFINITY=granularity=core,scatter

I noticed that, for much of its run time, potri in mkl 19.02 uses all 72 threads (including hyperthreading).

Is this a newly introduced bug, or am I doing something wrong?

Thanks

Gennady_F_Intel · ‎02-17-2019

Is that on Windows or Linux? We don't expect to see such huge performance differences between these modes.

may_ka · ‎02-18-2019

Hi,

It is the Linux version. It runs on an Arch Linux system, kernel version 4.20.8.

Cheers

Gennady_F_Intel · ‎02-19-2019

OK, could you check the behavior once more and give us the MKL verbose output?
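
A minimal sketch of how such output can be captured, assuming a POSIX shell. MKL_VERBOSE=1 is the environment switch that also appears in the shell session later in this thread; the helper name run_mkl_verbose and the log file name are hypothetical.

```shell
# run_mkl_verbose LOGFILE COMMAND [ARGS...]
# Run COMMAND with MKL verbose logging switched on, show its output
# and keep a copy in LOGFILE.
run_mkl_verbose() {
    log_file=$1
    shift
    # env sets MKL_VERBOSE only for this one command;
    # 2>&1 also keeps OpenMP runtime warnings in the log.
    env MKL_VERBOSE=1 "$@" 2>&1 | tee "$log_file"
}
```

For example, `run_mkl_verbose potri_dyn0.log ./a_dyn0.out` would write every MKL_VERBOSE line of the run to potri_dyn0.log.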

may_ka · ‎03-04-2019

There you go:

dynamic:            0
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined

OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
  [host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
  [host] OMP_ALLOCATOR='omp_default_mem_alloc'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEBUG='disabled'
  [host] OMP_DEFAULT_DEVICE='-10'
  [host] OMP_DISPLAY_AFFINITY='FALSE'
  [host] OMP_DISPLAY_ENV='TRUE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='2147483647'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED='TRUE'
  [host] OMP_NUM_THREADS='72'
  [host] OMP_PLACES: value is not defined
  [host] OMP_PROC_BIND='intel'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='2000M'
   OMP_TARGET_OFFLOAD=DEFAULT
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_TOOL='enabled'
  [host] OMP_TOOL_LIBRARIES: value is not defined
  [host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END


10000,
 alloc:   0.187290906906128     
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.30GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x14c678cd8240,10000,0) 337.92ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:36
 potrf:   0.444939851760864     
MKL_VERBOSE DPOTRI(U,10000,0x14c678cd8240,10000,0) 106.01s CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:36
 potri:    106.012171030045
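
To compare runs like the one above without reading every line, the per-call timings can be pulled out of the log. This is a sketch, assuming the line layout shown here (routine call in the second field, elapsed time with a us, ms or s suffix in the third); the helper name mkl_verbose_times is hypothetical.

```shell
# mkl_verbose_times [FILE...]
# Print "ROUTINE SECONDS" for every MKL_VERBOSE call line in the given
# files (or on standard input). The header line is skipped because its
# third field is not a time.
mkl_verbose_times() {
    awk '
    $1 == "MKL_VERBOSE" && $3 ~ /^[0-9.]+(us|ms|s)$/ {
        name = $2
        sub(/\(.*/, "", name)          # DPOTRI(U,10000,...) -> DPOTRI
        unit = $3
        sub(/^[0-9.]+/, "", unit)      # keep only the unit suffix
        t = $3
        sub(/[a-z]+$/, "", t)          # keep only the number
        if (unit == "ms") t = t / 1000
        else if (unit == "us") t = t / 1000000
        printf "%s %.5f\n", name, t
    }' "$@"
}
```

Fed with the log above, it would report DPOTRF at about 0.34 seconds and DPOTRI at about 106 seconds.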

Gennady_F_Intel · ‎03-07-2019

I could reproduce such behavior.

$ export MKL_NUM_THREADS=36
$ export KMP_AFFINITY=granularity=core,scatter
$ export MKL_VERBOSE=1
$ ./a_dyn0.out
dynamic: 0
10000,
alloc: 2.26017498970032
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2aeb8fc4b280,10000,0) 517.64ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potrf: 1.22820401191711
MKL_VERBOSE DPOTRI(U,10000,0x2aeb8fc4b280,10000,0) 1.66s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potri: 1.66010999679565
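
When two logs disagree, it can help to compare their header lines first, since they record the MKL build, the interface layer (lp64 or ilp64) and the threading layer. A sketch, assuming the header layout shown in this thread; the helper name mkl_verbose_build is hypothetical, and nothing here claims that any of these fields causes the difference.

```shell
# mkl_verbose_build [FILE...]
# Summarise the first MKL_VERBOSE header line: version, update number,
# interface layer and threading layer (the last two header fields).
mkl_verbose_build() {
    awk '
    !seen && $1 == "MKL_VERBOSE" && $3 == "MKL" {
        update = 0
        for (i = 4; i < NF; i++)
            if ($i == "Update") update = $(i + 1)
        printf "version=%s update=%s interface=%s threading=%s\n",
               $4, update, $(NF - 1), $NF
        seen = 1
    }' "$@"
}
```

For the log above this would print interface=lp64, whereas the log posted on 03-04-2019 has ilp64 in the same position.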

Gennady_F_Intel · ‎03-07-2019

I forgot to add the lscpu output:

$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 2824.250
BogoMIPS: 4395.90
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 28160K
NUMA node0 CPU(s): 0-10,44-54
NUMA node1 CPU(s): 11-21,55-65
NUMA node2 CPU(s): 22-32,66-76
NUMA node3 CPU(s): 33-43,77-87

may_ka · ‎03-07-2019

Sorry, I am getting lost.

Do you mean "I could not reproduce the behavior"??

In your example potrf ran 1.22 seconds, potri 1.66 seconds, whereas in mine it ran 0.44 and 106 seconds, respectively!

Gennady_F_Intel · ‎03-07-2019

You are right, I mistyped. I could not reproduce the behavior you reported.

may_ka · ‎03-11-2019

Hi,

Do you have any suggestions on where to look further? I tried 19.03, but the problem persists. I also tried other CPUs:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       46 bits physical, 48 bits virtual
CPU(s):              72
On-line CPU(s) list: 0-71
Thread(s) per core:  2
Core(s) per socket:  18
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping:            1
CPU MHz:             2793.359
CPU max MHz:         3600.0000
CPU min MHz:         1200.0000
BogoMIPS:            4590.30
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            46080K
NUMA node0 CPU(s):   0-17,36-53
NUMA node1 CPU(s):   18-35,54-71
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                112
On-line CPU(s) list:   0-111
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
Stepping:              4
CPU MHz:               2000.000
BogoMIPS:              4000.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108
NUMA node1 CPU(s):     1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109
NUMA node2 CPU(s):     2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110
NUMA node3 CPU(s):     3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp

and

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               3591.250
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5199.93
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

but the problem persists on all of them. The first CPU runs an Arch Linux system, kernel version 5.0; the latter two CPUs run CentOS 7 systems, kernel version 3.10.

Only this cpu

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               94
Model name:          Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz
Stepping:            3
CPU MHz:             800.322
CPU max MHz:         3600.0000
CPU min MHz:         800.0000
BogoMIPS:            5426.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp

is not affected. Note that the last (unaffected) CPU and the first (affected) CPU run exactly the same operating system (Arch) and MKL versions.

may_ka · ‎03-14-2019

Hi Intel team,

I could narrow the problem down to the combination of "OMP_NESTED=TRUE" and "MKL_DYNAMIC=0". With both set, potri gets stuck. With "OMP_NESTED=FALSE" it works:

 dynamic:            0
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined

OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
  [host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
  [host] OMP_ALLOCATOR='omp_default_mem_alloc'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEBUG='disabled'
  [host] OMP_DEFAULT_DEVICE='0'
  [host] OMP_DISPLAY_AFFINITY='FALSE'
  [host] OMP_DISPLAY_ENV='TRUE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='2147483647'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED='FALSE'
  [host] OMP_NUM_THREADS='56'
  [host] OMP_PLACES: value is not defined
  [host] OMP_PROC_BIND='intel'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='2000M'
  [host] OMP_TARGET_OFFLOAD=DEFAULT
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_TOOL='enabled'
  [host] OMP_TOOL_LIBRARIES: value is not defined
  [host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END


10000,
 alloc:   0.287423133850098     
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2b9248d69200,10000,0) 470.92ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:56
 potrf:    1.38600492477417     
MKL_VERBOSE DPOTRI(U,10000,0x2b9248d69200,10000,0) 1.78s CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:56
 potri:    1.77534294128418

and

 dynamic:            0
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined

OPENMP DISPLAY ENVIRONMENT BEGIN
   _OPENMP='201611'
  [host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
  [host] OMP_ALLOCATOR='omp_default_mem_alloc'
  [host] OMP_CANCELLATION='FALSE'
  [host] OMP_DEBUG='disabled'
  [host] OMP_DEFAULT_DEVICE='0'
  [host] OMP_DISPLAY_AFFINITY='FALSE'
  [host] OMP_DISPLAY_ENV='TRUE'
  [host] OMP_DYNAMIC='FALSE'
  [host] OMP_MAX_ACTIVE_LEVELS='2147483647'
  [host] OMP_MAX_TASK_PRIORITY='0'
  [host] OMP_NESTED='TRUE'
  [host] OMP_NUM_THREADS='56'
  [host] OMP_PLACES: value is not defined
  [host] OMP_PROC_BIND='intel'
  [host] OMP_SCHEDULE='static'
  [host] OMP_STACKSIZE='2000M'
  [host] OMP_TARGET_OFFLOAD=DEFAULT
  [host] OMP_THREAD_LIMIT='2147483647'
  [host] OMP_TOOL='enabled'
  [host] OMP_TOOL_LIBRARIES: value is not defined
  [host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END


10000,
 alloc:   0.274131059646606     
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2ae3af9c6200,10000,0) 462.94ms CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:56
 potrf:   0.920413017272949     
MKL_VERBOSE DPOTRI(U,10000,0x2ae3af9c6200,10000,0) 203.79s CNR:OFF Dyn:0 FastMM:1 TID:0  NThr:56
 potri:    203.793864011765
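
The two runs above can be reproduced back to back with a small wrapper. This is a sketch, assuming the program itself selects the dynamic mode through mkl_set_dynamic as in the test program, so only OMP_NESTED is varied from the shell; the helper name sweep_omp_nested is hypothetical.

```shell
# sweep_omp_nested COMMAND [ARGS...]
# Run COMMAND once with OMP_NESTED=FALSE and once with OMP_NESTED=TRUE,
# with MKL verbose logging on, and label each run.
sweep_omp_nested() {
    for nested in FALSE TRUE; do
        echo "== OMP_NESTED=$nested =="
        env OMP_NESTED="$nested" MKL_VERBOSE=1 "$@" 2>&1
    done
}
```

For example, `sweep_omp_nested ./Test_OMP_MKLPARA_5.0.0-arch1-1-ARCH | grep -E '^==|DPOTRI'` would leave only the run labels and the DPOTRI timing lines.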

lscpu:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping:              2
CPU MHz:               1511.757
CPU max MHz:           3600.0000
CPU min MHz:           1200.0000
BogoMIPS:              5200.01
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              35840K
NUMA node0 CPU(s):     0-13,28-41
NUMA node1 CPU(s):     14-27,42-55
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts

Compile and link commands for the program:

ifort --version
ifort (IFORT) 19.0.3.199 20190206
Copyright (C) 1985-2019 Intel Corporation.  All rights reserved.

mkdir -p OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH
ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -c -o OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH/Test.o Test.f90 -I /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include
ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -o Test_OMP_MKLPARA_5.0.0-arch1-1-ARCH OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH/Test.o /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_blas95_ilp64.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm -ldl
ld: /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a(mkl_semaphore.o): in function `mkl_serv_inspector_suppress':
mkl_semaphore.c:(.text+0x129): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking

But according to this, the combination of "OMP_NESTED=TRUE" and "MKL_DYNAMIC=0" is exactly what is recommended for a nested application. If I am not mistaken, this seems to be a bug which renders every 19 release unusable.

Gennady_F_Intel · ‎03-18-2019

Yes, we confirm the problem with potri in v.2019 u1, and the case has been escalated.

Gennady_F_Intel · ‎05-27-2019

Please check the latest MKL 2019 u4 and let us know if the problem is still there.

may_ka · ‎06-08-2019

Hi,

thanks for getting back. Seems to work with 19.04.

cheers

bad interaction between mkl_dynamic and potri in mkl 19.02???