Hi
the program below implements the inversion of an autoregressive matrix.
Program Test
  use blas95
  use lapack95
  USE IFPORT
  use mkl_service
  implicit none
  integer(kind=8) :: istat, n, c1, c2, ise
  integer(kind=4) :: dy
  character(len=200) :: msg
  Real(kind=8), allocatable :: A(:,:)
  real(kind=8) :: r1=0.0D0, r2=0.0D0
  outer:block
    dy=1
    write(*,*) "dynamic: ", dy
    call mkl_set_dynamic(dy)
    call mkl_set_num_threads(mkl_get_max_threads())
    n=10000
    write(*,"(*(g0"",""))") n
    r1=dclock()
    !!start building the matrix
    allocate(&
      &A(n,n),&
      &stat=istat,errmsg=msg)
    if(istat/=0) Then
      write(*,*) msg;exit outer
    end if
    !$OMP PARALLEL DO PRIVATE(c1)
    Do c1=1,size(A,2)
      Do c2=c1,size(A,1)
        A(c2,c1)=0.5**(c2-c1)
      end Do
    end Do
    !$OMP END PARALLEL DO
    ise=size(A,1)
    !$OMP PARALLEL DO PRIVATE(c1) FIRSTPRIVATE(ise)
    Do c1=1,ise-1
      A(c1,(c1+1):ise)=A((c1+1):ise,c1)
    End Do
    !$OMP END PARALLEL DO
    r2=Dclock()
    write(*,*) "alloc: ", r2-r1
    !!end building matrix
    r2=Dclock()
    call potrf(A=A,UPLO="U",INFO=istat)
    r1=dclock()
    write(*,*) "potrf: ",r1-r2
    call POTRI(A=A,Info=istat)
    r2=dclock()
    write(*,*) "potri: ",r2-r1
  End block outer
End Program Test
With mkl 17.08, setting mkl_dynamic to 0 or 1 made hardly any difference in processing time.
mkl_dynamic=0:
potrf: 0.88 seconds, potri: 2.12 seconds
mkl_dynamic=1:
potrf: 0.58 seconds, potri: 2.09 seconds
However, with mkl 19.02 the difference is so large that mkl_dynamic=0 makes the program unusable.
mkl_dynamic=0:
potrf: 0.37 seconds, potri: 110.69 seconds
mkl_dynamic=1:
potrf: 0.37 seconds, potri: 1.11 seconds
Times were obtained on Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz.
Environment variables were:
MKL_NUM_THREADS=36
KMP_AFFINITY=granularity=core,scatter
I noticed that potri in mkl 19.02 spends much of that time using all 72 hardware threads (including hyperthreading).
Is this a newly introduced bug, or am I doing something wrong?
Thanks
Is that on Windows or Linux? We don't expect to see such huge performance differences between these modes.
Hi,
it is the Linux version. It runs on an Arch Linux system, kernel version 4.20.8.
Cheers
OK, could you check the behavior once more and give us the MKL verbose output?
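For readers following along, here is a minimal sketch of one way to collect such output. It assumes the MKL_VERBOSE=1 setting that appears later in this thread, plus the standard OpenMP variable OMP_DISPLAY_ENV=TRUE, which makes the OpenMP runtime print its settings at startup. The helper name run_verbose and the executable name are placeholders, not anything provided by MKL.

```shell
#!/bin/sh
# run_verbose is a hypothetical helper name, not part of MKL.
# It runs the given command with MKL call logging and the OpenMP
# environment dump switched on, without changing the calling shell.
run_verbose() {
    env MKL_VERBOSE=1 OMP_DISPLAY_ENV=TRUE "$@"
}

# Example (replace ./test_program with your own executable):
#   run_verbose ./test_program 2>&1 | tee verbose.log
```

Passing the variables through env keeps them scoped to the one run, so a later run in the same shell is not affected by them.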
There you go:
dynamic: 0
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP='201611'
[host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %T thread %n bound to OS proc set {%a}'
[host] OMP_ALLOCATOR='omp_default_mem_alloc'
[host] OMP_CANCELLATION='FALSE'
[host] OMP_DEBUG='disabled'
[host] OMP_DEFAULT_DEVICE='-10'
[host] OMP_DISPLAY_AFFINITY='FALSE'
[host] OMP_DISPLAY_ENV='TRUE'
[host] OMP_DYNAMIC='FALSE'
[host] OMP_MAX_ACTIVE_LEVELS='2147483647'
[host] OMP_MAX_TASK_PRIORITY='0'
[host] OMP_NESTED='TRUE'
[host] OMP_NUM_THREADS='72'
[host] OMP_PLACES: value is not defined
[host] OMP_PROC_BIND='intel'
[host] OMP_SCHEDULE='static'
[host] OMP_STACKSIZE='2000M'
OMP_TARGET_OFFLOAD=DEFAULT
[host] OMP_THREAD_LIMIT='2147483647'
[host] OMP_TOOL='enabled'
[host] OMP_TOOL_LIBRARIES: value is not defined
[host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END
10000,
alloc: 0.187290906906128
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.30GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x14c678cd8240,10000,0) 337.92ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potrf: 0.444939851760864
MKL_VERBOSE DPOTRI(U,10000,0x14c678cd8240,10000,0) 106.01s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potri: 106.012171030045
I could reproduce such behavior.
$ export MKL_NUM_THREADS=36
$ export KMP_AFFINITY=granularity=core,scatter
$ export MKL_VERBOSE=1
$ ./a_dyn0.out
dynamic: 0
10000,
alloc: 2.26017498970032
MKL_VERBOSE Intel(R) MKL 2019.0 Update 2 Product build 20190118 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.20GHz lp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2aeb8fc4b280,10000,0) 517.64ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potrf: 1.22820401191711
MKL_VERBOSE DPOTRI(U,10000,0x2aeb8fc4b280,10000,0) 1.66s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:36
potri: 1.66010999679565
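When comparing several runs, it can help to reduce the log to just the routine names and their elapsed times. The following is a rough sketch, assuming the MKL_VERBOSE line layout shown above (routine call in the second field, elapsed time in the third); mkl_times is a made-up helper name.

```shell
#!/bin/sh
# mkl_times is a hypothetical helper name.
# Reads program output on stdin and prints "<routine> <elapsed>" for each
# MKL_VERBOSE call line. The version banner also starts with MKL_VERBOSE,
# but its third field does not begin with a digit, so it is skipped.
mkl_times() {
    awk '$1 == "MKL_VERBOSE" && $3 ~ /^[0-9]/ {
        sub(/\(.*/, "", $2)   # strip the argument list from the routine name
        print $2, $3
    }'
}

# Example (executable name as used above):
#   ./a_dyn0.out 2>&1 | mkl_times
```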
I forgot to add the lscpu output:
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 88
On-line CPU(s) list: 0-87
Thread(s) per core: 2
Core(s) per socket: 22
Socket(s): 2
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
Stepping: 1
CPU MHz: 2824.250
BogoMIPS: 4395.90
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 28160K
NUMA node0 CPU(s): 0-10,44-54
NUMA node1 CPU(s): 11-21,55-65
NUMA node2 CPU(s): 22-32,66-76
NUMA node3 CPU(s): 33-43,77-87
Sorry, I am getting lost.
Do you mean "I could not reproduce the behavior"??
In your example potrf ran 1.22 seconds, potri 1.66 seconds, whereas in mine it ran 0.44 and 106 seconds, respectively!
You are right, I mistyped. I couldn't reproduce the behavior you reported.
Hi,
do you have any suggestions on where to look further? I tried 19.03 but the problem persists. I also tried other CPUs:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2697 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2793.359
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4590.30
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-17,36-53
NUMA node1 CPU(s): 18-35,54-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 112
On-line CPU(s) list: 0-111
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 4
NUMA node(s): 4
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz
Stepping: 4
CPU MHz: 2000.000
BogoMIPS: 4000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 19712K
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108
NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109
NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110
NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt mba tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local ibpb ibrs stibp dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp
and
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 3591.250
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5199.93
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
but the problem persists on all of them. The first CPU runs an Arch Linux system, kernel version 5.0; the latter two run CentOS 7, kernel version 3.10.
Only this CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 39 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 94
Model name: Intel(R) Core(TM) i7-6820HK CPU @ 2.70GHz
Stepping: 3
CPU MHz: 800.322
CPU max MHz: 3600.0000
CPU min MHz: 800.0000
BogoMIPS: 5426.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp
is not affected. Note that the last (unaffected) CPU and the first (affected) CPU run exactly the same operating system (Arch) and MKL version.
Hi Intel team,
I could narrow the problem down to the combination of "OMP_NESTED=TRUE" and "MKL_DYNAMIC=0". With both set, potri gets stuck. With "OMP_NESTED=FALSE" it works:
dynamic: 0
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP='201611'
[host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
[host] OMP_ALLOCATOR='omp_default_mem_alloc'
[host] OMP_CANCELLATION='FALSE'
[host] OMP_DEBUG='disabled'
[host] OMP_DEFAULT_DEVICE='0'
[host] OMP_DISPLAY_AFFINITY='FALSE'
[host] OMP_DISPLAY_ENV='TRUE'
[host] OMP_DYNAMIC='FALSE'
[host] OMP_MAX_ACTIVE_LEVELS='2147483647'
[host] OMP_MAX_TASK_PRIORITY='0'
[host] OMP_NESTED='FALSE'
[host] OMP_NUM_THREADS='56'
[host] OMP_PLACES: value is not defined
[host] OMP_PROC_BIND='intel'
[host] OMP_SCHEDULE='static'
[host] OMP_STACKSIZE='2000M'
[host] OMP_TARGET_OFFLOAD=DEFAULT
[host] OMP_THREAD_LIMIT='2147483647'
[host] OMP_TOOL='enabled'
[host] OMP_TOOL_LIBRARIES: value is not defined
[host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END
10000,
alloc: 0.287423133850098
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2b9248d69200,10000,0) 470.92ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:56
potrf: 1.38600492477417
MKL_VERBOSE DPOTRI(U,10000,0x2b9248d69200,10000,0) 1.78s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:56
potri: 1.77534294128418
and
dynamic: 0
OMP: Warning #181: OMP_PROC_BIND: ignored because KMP_AFFINITY has been defined
OMP: Warning #181: OMP_PLACES: ignored because KMP_AFFINITY has been defined
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP='201611'
[host] OMP_AFFINITY_FORMAT='OMP: pid %P tid %i thread %n bound to OS proc set {%A}'
[host] OMP_ALLOCATOR='omp_default_mem_alloc'
[host] OMP_CANCELLATION='FALSE'
[host] OMP_DEBUG='disabled'
[host] OMP_DEFAULT_DEVICE='0'
[host] OMP_DISPLAY_AFFINITY='FALSE'
[host] OMP_DISPLAY_ENV='TRUE'
[host] OMP_DYNAMIC='FALSE'
[host] OMP_MAX_ACTIVE_LEVELS='2147483647'
[host] OMP_MAX_TASK_PRIORITY='0'
[host] OMP_NESTED='TRUE'
[host] OMP_NUM_THREADS='56'
[host] OMP_PLACES: value is not defined
[host] OMP_PROC_BIND='intel'
[host] OMP_SCHEDULE='static'
[host] OMP_STACKSIZE='2000M'
[host] OMP_TARGET_OFFLOAD=DEFAULT
[host] OMP_THREAD_LIMIT='2147483647'
[host] OMP_TOOL='enabled'
[host] OMP_TOOL_LIBRARIES: value is not defined
[host] OMP_WAIT_POLICY='PASSIVE'
OPENMP DISPLAY ENVIRONMENT END
10000,
alloc: 0.274131059646606
MKL_VERBOSE Intel(R) MKL 2019.0 Update 3 Product build 20190125 for Intel(R) 64 architecture Intel(R) Advanced Vector Extensions 2 (Intel(R) AVX2) enabled processors, Lnx 2.60GHz ilp64 intel_thread
MKL_VERBOSE DPOTRF(U,10000,0x2ae3af9c6200,10000,0) 462.94ms CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:56
potrf: 0.920413017272949
MKL_VERBOSE DPOTRI(U,10000,0x2ae3af9c6200,10000,0) 203.79s CNR:OFF Dyn:0 FastMM:1 TID:0 NThr:56
potri: 203.793864011765
lscpu:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Stepping: 2
CPU MHz: 1511.757
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 5200.01
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 35840K
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm abm epb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
Compile and link commands for the program:
ifort --version
ifort (IFORT) 19.0.3.199 20190206
Copyright (C) 1985-2019 Intel Corporation. All rights reserved.
mkdir -p OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH
ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -c -o OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH/Test.o Test.f90 -I /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include
ifort -i8 -warn nounused -warn declarations -O3 -static -align array64byte -mkl=parallel -qopenmp -o Test_OMP_MKLPARA_5.0.0-arch1-1-ARCH OMP_MKLPARA_ifort_5.0.0-arch1-1-ARCH/Test.o /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_blas95_ilp64.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_lapack95_ilp64.a -Wl,--start-group /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_intel_ilp64.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_intel_thread.a -Wl,--end-group -lpthread -lm -ldl
ld: /opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/lib/intel64/libmkl_core.a(mkl_semaphore.o): in function `mkl_serv_inspector_suppress':
mkl_semaphore.c:(.text+0x129): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
But according to this, the combination of "OMP_NESTED=TRUE" and "MKL_DYNAMIC=0" is exactly what is recommended for a nested application. Unless I am mistaken, this seems to be a bug that renders every 19 release unusable.
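For anyone who wants to repeat the comparison on their own machine, the two runs can be scripted roughly as follows. This is only a sketch: compare_nested is a made-up helper name, and the executable passed to it is whatever you built from the test program.

```shell
#!/bin/sh
# compare_nested is a hypothetical helper name.
# Runs the given command twice, first with OMP_NESTED=FALSE and then with
# OMP_NESTED=TRUE, keeping MKL_VERBOSE=1 so the DPOTRI timing line is
# printed in both runs.
# Note: the test program in this thread sets the dynamic mode itself via
# mkl_set_dynamic(), so MKL_DYNAMIC is deliberately not varied here.
compare_nested() {
    for nested in FALSE TRUE; do
        echo "=== OMP_NESTED=$nested ==="
        env OMP_NESTED="$nested" MKL_VERBOSE=1 "$@"
    done
}

# Example (replace ./test_program with your own executable):
#   compare_nested ./test_program
```

If the DPOTRI time differs by orders of magnitude between the two sections of the output, you are most likely seeing the same problem.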
Yes, we confirm the problem with potri in v.2019 u1, and this case has been escalated.
Please check the latest MKL 2019 u4 and let us know if the problem is still there.
Hi,
thanks for getting back. Seems to work with 19.04.
cheers